My Lucene index contains documents with the field "itemName". This field is boosted with a boost factor between 0 and 1.
When I create a BooleanQuery, I would like the results to be ranked by the number of matched clauses and the boost factor, so the formula looks like:
score = (count_of_matching_clauses / count_of_total_clauses + boost_factor) / 2
The score would always be a float between 0 and 1. It is 1 when all clauses match and the boost factor is 1.
For example, suppose the "itemName" field values of three documents with no boost factor are:
document1: "java is an island"
document2: "the secret of monkey island"
document3: "java island adventures"
and the BooleanQuery looks like:
TermQuery query1 = new TermQuery(new Term("itemName", "java"));
TermQuery query2 = new TermQuery(new Term("itemName", "island"));
BooleanQuery query = new BooleanQuery();
query.add(query1, BooleanClause.Occur.SHOULD);
query.add(query2, BooleanClause.Occur.SHOULD);
Then document1 would be retrieved with a score of (2/2 + 0)/2 = 0.5 because:
count_of_matching_clauses = 2 and
count_of_total_clauses = 2
document2 would be retrieved with a score of (1/2 + 0)/2 = 0.25 because:
count_of_matching_clauses = 1 and
count_of_total_clauses = 2
and document3 would be retrieved with a score of (2/2 + 0)/2 = 0.5 because:
count_of_matching_clauses = 2 and
count_of_total_clauses = 2
How can I implement this ranking mechanism in Lucene? How can I tell Lucene to use my custom ranking class for ranking the results?
You can implement your own scoring algorithm by extending the Similarity class and passing it during search. In the Javadoc of this class (follow the link), you can read the details of the scoring algorithm. Some more text on scoring can be found here. An excellent aid to understanding scoring is to look at the explanation of a score as returned by Searcher.explain().
BTW, the ranking you wish to implement is what the default scoring already gives you. The order of results will be as desired, though the actual scores may differ from 0.5 or 0.25.
EDIT:
Updated the links in the original answer, which referred to Lucene v2.4, to v5.3.1.