
Different Lucene search results with different search space sizes

Posted 2024-08-12 04:04:40

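I have an application that uses Lucene for searching. The search space contains thousands of entries. Searching across those thousands, I get only a few results, about 20 (which is normal and expected).

However, when I reduce the search space to just those 20 entries (that is, I index only those 20 entries and ignore everything else, so that development is easier), I get the same 20 results, but in a different order (and with different scores).

I tried disabling the norm factor via Field#setOmitNorms(true), but I still get different results.

What causes the difference in scoring?

Thanks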


A君 2024-08-19 04:04:40

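Please see the scoring documentation in Lucene's Similarity API. My bet is on the difference in idf between the two cases (both numDocs and docFreq are different). To know for sure, use the explain() function to debug the scores.

Edit: A code fragment for getting explanations: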

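// Run the query; searcher, query, searchFilter and max are defined elsewhere.
TopDocs hits = searcher.search(query, searchFilter, max);
ScoreDoc[] scoreDocs = hits.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
  // explain() returns the per-factor score breakdown (tf, idf, norms) for one hit.
  String explanation = searcher.explain(query, scoreDoc.doc).toString();
  Log.debug(explanation);
}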
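
To make the idf point concrete, here is a minimal sketch that does not use Lucene at all. It assumes the classic TF-IDF formula used by Lucene's default similarity in older releases, idf = 1 + ln(numDocs / (docFreq + 1)). The class name IdfDemo, the two terms, and all document counts are hypothetical and chosen only for illustration. Note that omitting norms only removes length normalization and index-time boosts; idf still depends on the index contents.

```java
// Hypothetical illustration: how idf shifts when the index shrinks.
final class IdfDemo {

    // Assumed classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    static double classicIdf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Made-up numbers: term A occurs only in the 20 matching documents,
        // term B occurs in 5 of those 20 but in 3000 documents overall.
        long largeIndex = 5000;
        long smallIndex = 20;

        double largeA = classicIdf(20, largeIndex);
        double largeB = classicIdf(3000, largeIndex);
        double smallA = classicIdf(20, smallIndex);
        double smallB = classicIdf(5, smallIndex);

        // In the large index A is the rarer term; in the small index
        // every document contains A, so B becomes the rarer one.
        System.out.printf("large index: idf(A)=%.3f idf(B)=%.3f%n", largeA, largeB);
        System.out.printf("small index: idf(A)=%.3f idf(B)=%.3f%n", smallA, smallB);
    }
}
```

Under these assumed counts the relative weight of the two terms flips between the two indexes, which is enough to reorder documents that match them with different frequencies. The output of explain() on the real index shows the actual numDocs and docFreq values.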
