Using Lucene to boost latest articles
(13 posts)

Using Lucene to boost latest articles

How can I boost the search result in a way that the latest articles are more relevant?

I found this page, https://www.progress.com/documentation/sitefinity-cms/for-developers-customize-the-lucene-search-scoring, which could be relevant for the scoring function, but I am not sure how to use it with Webnodes.

I tried to use IndexQuery.CustomQuery (together with BodySearch), but when I did, I did not get any results.

(146 posts)

Re: Using Lucene to boost latest articles

Hi! 

Lucene has 3 different ways to boost scores when doing searches:

Document level boosting 

Document level field boosting

Query level boosting
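
The three levels listed above can be sketched roughly as follows. This is only an illustration, assuming the Lucene.Net 2.9-style API with SetBoost methods (later versions expose a Boost property instead). The field name "Title", the boost values and the class name BoostLevelsDemo are made up for the example and are not part of Webnodes.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class BoostLevelsDemo
{
    // Index-time boosts: applied while the document is being built.
    public static Document BuildDocument()
    {
        var doc = new Document();

        // "Title" is a hypothetical field name used only for this sketch.
        var title = new Field("Title", "Using Lucene to boost latest articles",
            Field.Store.YES, Field.Index.ANALYZED);

        // Document level field boosting: matches in this field weigh more.
        title.SetBoost(2.0f);
        doc.Add(title);

        // Document level boosting: the whole document weighs more.
        doc.SetBoost(1.5f);
        return doc;
    }

    // Query-time boost: applied per query clause when searching.
    public static Query BuildQuery()
    {
        var query = new TermQuery(new Term("Title", "lucene"));
        query.SetBoost(3.0f);
        return query;
    }
}
```

The first two boosts are stored in the index, so changing them requires reindexing. The query level boost can vary from search to search.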

 

Document level boosting is the easiest to achieve. This is done during indexing, so you can override the BuildSearchIndex method, which has the Lucene doc as a parameter.

doc.SetBoost(1.5f);

 

Will that cover your scenario? 

(13 posts)

Re: Using Lucene to boost latest articles

Hi Vidar

Boosting during indexing is not relevant here.

Here is my current status. Some article types already have some boosting, and I have reduced their boosting by a huge amount. The search words are split up, and fuzzy matching and boosting are applied to each word.

After that I want to apply the custom query boosting for newest articles. What is the correct way to do this?

Right now all the boosting logic is hidden in a class (MyCustomScoreQuery) derived from Lucene.Net.Search.Function.CustomScoreQuery, as described in the link from the first post. If I use the CustomQuery property I get no results. What happens if I use both BodySearch and CustomQuery?

To make it even more complex, the query is also used in an AdvancedIndexQuery<T>.

// escape lucene characters and split into words
var terms = Regex.Replace(search, "([+\\-\\(\\)\\{\\}\\[\\]\"\\^\\~\\*\\?\\:\\\\]|\\&\\&|\\|\\|)", "\\$1")
.Split(' ')
.Where(a => !string.IsNullOrWhiteSpace(a));

search = "" + string.Join(" ", terms.Select(a => a + "~0.7^3.0")) + " " + string.Join(" ", terms.Select(a => a + "*^3.0"));

query.BodySearch = search;

// TODO: boost articles newer than X days ( with this active we get no results )
var q1 = new Lucene.Net.Search.BooleanQuery();
var myCustomSearch = new MyCustomScoreQuery(q1);
query.CustomQuery = myCustomSearch;


var advancedQuery = new AdvancedIndexQuery(query);
advancedQuery.SuggestDifferentSpelling = true;

if (!autoCompleteSearch)
{
	advancedQuery.EnableFacetedSearch      = true;
	advancedQuery.FacetFilterOption        = WAF.Engine.Search.Facets.FacetFilterOptions.ShowAllFacetValuesForActiveFacets;
	advancedQuery.CustomLuceneFieldsToFacetOn = new List<string>() { "Class" };

	if (FacetFilters != null)
		advancedQuery.FacetFilters = FacetFilters;
}
SearchResultSet searchResultSet = WAFContext.Session.SearchAdvanced(advancedQuery);
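
To make the BodySearch string concrete, here is a small standalone sketch of the same escaping and term expansion, runnable without Webnodes or Lucene. BuildBodySearch and BodySearchDemo are hypothetical names introduced for this example. The regular expression and the suffixes are copied from the snippet above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BodySearchDemo
{
    // Hypothetical helper that mirrors the string building in the post.
    public static string BuildBodySearch(string search)
    {
        // Escape Lucene query syntax characters, then split into words.
        var terms = Regex.Replace(search, "([+\\-\\(\\)\\{\\}\\[\\]\"\\^\\~\\*\\?\\:\\\\]|\\&\\&|\\|\\|)", "\\$1")
            .Split(' ')
            .Where(a => !string.IsNullOrWhiteSpace(a))
            .ToList();

        // Each word is added twice: once as a fuzzy term, once as a prefix term.
        var fuzzy = string.Join(" ", terms.Select(a => a + "~0.7^3.0"));
        var prefix = string.Join(" ", terms.Select(a => a + "*^3.0"));
        return fuzzy + " " + prefix;
    }

    public static void Main()
    {
        Console.WriteLine(BuildBodySearch("lucene boost"));
        // prints: lucene~0.7^3.0 boost~0.7^3.0 lucene*^3.0 boost*^3.0
    }
}
```

Every word ends up with the same ^3.0 boost in both variants, so the per-word boosts do not change the relative order of results on their own.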






		[Serializable]
		public class MyCustomScoreQuery : Lucene.Net.Search.Function.CustomScoreQuery
		{
			public MyCustomScoreQuery(Lucene.Net.Search.Query subQuery) : base(subQuery)
			{
			}

			// Without this override, Lucene uses its default provider and CustomizedScoreProvider is never called.
			protected override Lucene.Net.Search.Function.CustomScoreProvider GetCustomScoreProvider(Lucene.Net.Index.IndexReader reader)
			{
				return new CustomizedScoreProvider(reader);
			}

			/// <summary>
			/// The custom score provider for <see cref="MyCustomScoreQuery"/>
			/// </summary>
			protected class CustomizedScoreProvider : Lucene.Net.Search.Function.CustomScoreProvider
			{
				/// <summary>
				/// Initializes a new instance of the <see cref="CustomizedScoreProvider"/> class.
				/// </summary>
				/// <param name="reader">The index reader</param>
				public CustomizedScoreProvider(Lucene.Net.Index.IndexReader reader) : base(reader)
				{
				}

				/// <inheritdoc />
				public override float CustomScore(int doc, float subQueryScore, float[] valSrcScores)
				{
					var baseScore = base.CustomScore(doc, subQueryScore, valSrcScores);
					double boost = 1;
					var currentDocument = reader.Document(doc);
					var dateFieldValue = currentDocument.Get("LastModified");
					if (dateFieldValue != null)
					{
						var now = DateTime.UtcNow;
						var date = DateTime.Parse(dateFieldValue);// DateTools.StringToDate(dateFieldValue);
						var age = (int)(now - date).TotalDays;
						boost = CalculateBoost(age);
					}

					var adjustedScore = baseScore * boost;
					return (float)adjustedScore;
				}

				/// <summary>
				/// Calculates the boost, based on the content's age
				/// </summary>
				/// <param name="days">Content's age in days</param>
				/// <returns>The boost</returns>
				protected virtual double CalculateBoost(int days)
				{
					double boostFactor = 100;
					double maxRampFactor = 5;
					double curveAdjustmentFactor = 5;
					return Math.Pow(boostFactor / (maxRampFactor + days), 1 / curveAdjustmentFactor);
				}
			}
		}
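
To see what the CalculateBoost curve above does to scores, the formula can be evaluated on its own. The sketch below copies the constants from the post into a standalone class (AgeBoostDemo is a hypothetical name) and prints the multiplier for a few ages. With these constants the multiplier is (100 / (5 + days)) ^ (1 / 5). That gives about 1.82 for an article modified today, exactly 1 at 95 days and roughly 0.77 at one year, so older articles are slightly demoted rather than left unchanged.

```csharp
using System;

public static class AgeBoostDemo
{
    // Same formula and constants as CalculateBoost in the post.
    public static double CalculateBoost(int days)
    {
        double boostFactor = 100;
        double maxRampFactor = 5;
        double curveAdjustmentFactor = 5;
        return Math.Pow(boostFactor / (maxRampFactor + days), 1 / curveAdjustmentFactor);
    }

    public static void Main()
    {
        // Print the score multiplier for a few sample ages in days.
        foreach (var days in new[] { 0, 7, 30, 95, 365 })
        {
            Console.WriteLine("{0,4} days -> boost {1:F3}", days, CalculateBoost(days));
        }
    }
}
```

One thing to watch for: if LastModified lies more than five days in the future, the base of the power becomes negative and Math.Pow returns NaN, so it may be worth clamping the age to zero before calling the function.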