How do I implement custom search result ranking?

Posted on 2024-07-28 23:27:40

My Lucene index contains documents with the field "itemName". This field is boosted with a boost factor between 0 and 1.
When I create a BooleanQuery, I would like the results to be ranked by the number of matching clauses and the boost factor, so the formula looks like this:

score = (count_of_matching_clauses / count_of_total_clauses + boost_factor) / 2

The score would always be a float between 0 and 1. It is 1 when all clauses match and the boost factor is 1.
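To make the formula concrete, here is a minimal plain-Java sketch of it. The class name RankScoreDemo and the method score are hypothetical helpers made up for illustration; they are not part of Lucene and do not plug into its scoring by themselves.

```java
// Hypothetical helper, not a Lucene class: evaluates the formula from the question.
final class RankScoreDemo {

    /**
     * score = (matchingClauses / totalClauses + boostFactor) / 2
     * The result lies in [0, 1] as long as boostFactor lies in [0, 1].
     */
    static float score(int matchingClauses, int totalClauses, float boostFactor) {
        if (totalClauses <= 0) {
            throw new IllegalArgumentException("totalClauses must be positive");
        }
        if (matchingClauses < 0 || matchingClauses > totalClauses) {
            throw new IllegalArgumentException("matchingClauses must be in [0, totalClauses]");
        }
        // Written as a negated range check so that NaN is rejected too.
        if (!(boostFactor >= 0f && boostFactor <= 1f)) {
            throw new IllegalArgumentException("boostFactor must be in [0, 1]");
        }
        // Cast before dividing: in integer arithmetic 1 / 2 would be 0.
        float matchRatio = (float) matchingClauses / totalClauses;
        return (matchRatio + boostFactor) / 2f;
    }

    public static void main(String[] args) {
        System.out.println(score(2, 2, 0f)); // all clauses match, no boost: prints 0.5
        System.out.println(score(1, 2, 0f)); // one of two clauses matches: prints 0.25
        System.out.println(score(2, 2, 1f)); // all clauses match, full boost: prints 1.0
    }
}
```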

For example, if the values of the "itemName" field for three documents with no boost factor are:

document1: "java is an island"
document2: "the secret of monkey island"
document3: "java island adventures"

and the BooleanQuery would look like:

// Query the same field the documents are indexed under ("itemName").
TermQuery query1 = new TermQuery(new Term("itemName","java"));
TermQuery query2 = new TermQuery(new Term("itemName","island"));

BooleanQuery query = new BooleanQuery();
query.add(query1, BooleanClause.Occur.SHOULD);
query.add(query2, BooleanClause.Occur.SHOULD);

then document1 would be retrieved with a score of (2/2 + 0)/2 = 0.5 because:
count_of_matching_clauses = 2 and
count_of_total_clauses = 2

document2 would be retrieved with a score of (1/2 + 0)/2 = 0.25 because:
count_of_matching_clauses = 1 and
count_of_total_clauses = 2

and document3 would be retrieved with a score of (2/2 + 0)/2 = 0.5 because:
count_of_matching_clauses = 2 and
count_of_total_clauses = 2
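The expected ordering for this example can be checked without Lucene. The sketch below is only a simulation under stated assumptions: it tokenizes on whitespace, lowercases the text, treats each query term as one SHOULD clause, and uses a boost factor of 0 for every document. ClauseRankingDemo and all of its methods are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of the example above; it does not use Lucene.
final class ClauseRankingDemo {

    /** The three example documents, in the order given in the question. */
    static Map<String, String> exampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("document1", "java is an island");
        docs.put("document2", "the secret of monkey island");
        docs.put("document3", "java island adventures");
        return docs;
    }

    /** Counts how many query terms occur in the whitespace-tokenized field value. */
    static int countMatchingClauses(String fieldValue, List<String> queryTerms) {
        Set<String> tokens = new HashSet<>(
                Arrays.asList(fieldValue.toLowerCase(Locale.ROOT).split("\\s+")));
        int matches = 0;
        for (String term : queryTerms) {
            if (tokens.contains(term)) {
                matches++;
            }
        }
        return matches;
    }

    /** Applies the formula from the question with an assumed boost factor of 0. */
    static float score(String fieldValue, List<String> queryTerms) {
        float matchRatio = (float) countMatchingClauses(fieldValue, queryTerms) / queryTerms.size();
        return (matchRatio + 0f) / 2f;
    }

    /** Returns document ids sorted by descending score; ties keep their original order. */
    static List<String> rank(Map<String, String> docs, List<String> queryTerms) {
        List<String> ids = new ArrayList<>(docs.keySet());
        // List.sort is stable, so equal scores preserve insertion order.
        ids.sort((a, b) -> Float.compare(
                score(docs.get(b), queryTerms), score(docs.get(a), queryTerms)));
        return ids;
    }

    public static void main(String[] args) {
        List<String> queryTerms = Arrays.asList("java", "island");
        Map<String, String> docs = exampleDocs();
        for (String id : rank(docs, queryTerms)) {
            System.out.println(id + " " + score(docs.get(id), queryTerms));
        }
    }
}
```

With the query terms "java" and "island", document1 and document3 tie at 0.5 and document2 follows at 0.25, which matches the hand calculation above.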

How can I implement this ranking mechanism in Lucene? How can I tell Lucene to use my custom ranking class for ranking the results?

Comments (1)

盗心人 2024-08-04 23:27:40

You can implement your own scoring algorithm by extending the Similarity class and passing it in during search. The Javadoc of this class describes the details of the scoring algorithm, and the Lucene documentation has more text on scoring. An exceptional aid to understanding scoring is to look at the explanation returned by Searcher.explain().

BTW, the ranking you wish to implement is what the default scoring already gives you. The order of the results will be as desired, though the actual scores can differ from 0.5 or 0.25.

EDIT:
Updated the links in the original answer, which referred to Lucene v2.4, to v5.3.1.
