How to normalize Lucene scores?

Posted on 2024-10-25 14:34:20

I need to normalize Lucene scores to values between 0 and 1.

For example, a random query returns the following scores...

What is the maximum score? 10.0?

Thanks

旧瑾黎汐 2024-11-01 14:34:20

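You can divide all scores by the maximum score to get scores between 0 and 1.

Note, however, that normalized scores should only be used to compare the results of a single query. Comparing scores (normalized or not) across the results of two different queries is not valid.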
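
A minimal sketch of the divide-by-max approach in plain Java, assuming the scores of one query's hits have already been collected into a `float[]` (in Lucene they would come from `ScoreDoc.score`) and are non-negative. The class and method names are hypothetical. When the hits are sorted by descending score, the maximum is just the first element; the loop below also handles unsorted input.

```java
import java.util.Arrays;

/** Max-normalization of ONE query's hit scores into [0, 1] (hypothetical helper). */
final class ScoreNormalizer {
    private ScoreNormalizer() {}

    /**
     * Divides every score by the largest score in the same result list.
     * Assumes non-negative scores; the output is only comparable within this one query.
     */
    public static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] normalized = new float[scores.length];
        if (max <= 0f) {
            return normalized; // empty list or all-zero scores: nothing to scale
        }
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = scores[i] / max;
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] scores = {4.0f, 2.0f, 1.0f}; // made-up scores for a single query
        System.out.println(Arrays.toString(normalizeByMax(scores)));
        // prints [1.0, 0.5, 0.25]
    }
}
```

Because the divisor comes from the same result list, a normalized score of 1.0 only means "best hit for this query", not "perfect match".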

另类 2024-11-01 14:34:20

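There is no good standard way to normalize scores with Lucene. Read this: ScoresAsPercentages and this explanation.

In your case, if the results are sorted by score, the maximum score is the score of the first result. But that maximum will be different for every other query.

See also how-do-i-normalise-a-solr-lucene-score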

黄昏下泛黄的笔记 2024-11-01 14:34:20

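There is no maximum score in Solr: it depends on too many variables, so it cannot be predicted.

You can implement what is called normalized scores (scores as percentages), but this is not recommended.

For more details, see these related questions:

Is it possible to set a Solr score threshold "reasonably", independent of the results returned? (i.e. are Solr scores normalized in any way?)

How do I normalise a Solr/Lucene score?

Remove results below a certain score threshold in Solr/Lucene?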

余生再见 2024-11-01 14:34:20

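Normalizing in the usual way only helps you compare the score distributions of different queries (and their retrieved lists).
You cannot simply normalize scores to compare performance across queries.
Consider one query where all retrieved documents are highly relevant and receive the same (high) score, and another query whose retrieved list contains only barely relevant documents (again, all with the same score). No matter which per-query normalization you apply, the normalized scores of the two lists will be identical.

You need a cross-query factor that brings all scores to the same level.

For example, you might compute the similarity between the query and the whole index, and somehow combine that score with the document scores.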
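
The pitfall can be made concrete with two made-up result lists (hypothetical scores, not real Lucene output): per-query max normalization maps both lists to the same values, even though the raw scores differ by a factor of 30. The class name is hypothetical.

```java
import java.util.Arrays;

/** Shows why per-query max-normalized scores cannot be compared across queries. */
final class CrossQueryPitfall {
    private CrossQueryPitfall() {}

    /** Divides each score by the list's own maximum (assumes non-negative scores). */
    public static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        if (max <= 0f) {
            return out;
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up raw scores: query A retrieves highly relevant documents,
        // query B retrieves barely relevant ones; each list has equal scores.
        float[] queryA = {9.0f, 9.0f, 9.0f};
        float[] queryB = {0.3f, 0.3f, 0.3f};
        System.out.println(Arrays.toString(normalizeByMax(queryA))); // [1.0, 1.0, 1.0]
        System.out.println(Arrays.toString(normalizeByMax(queryB))); // [1.0, 1.0, 1.0]
        // The normalized lists are identical, so the difference in quality is lost.
    }
}
```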

-黛色若梦 2024-11-01 14:34:20

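If you want to compare two or more queries, I found a workaround.
Use the `LevenshteinDistance` or `LuceneLevenshteinDistance` (Damerau-Levenshtein) class to compare your highest-scored document with your query term. Despite its name, `getDistance` returns a similarity between 0 and 1, where 1 means the strings are identical. Do this for each query you want to compare. You now have a way to compare your queries by the similarity between each query term and its top result: pick the query with the highest similarity and use it for the next steps.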

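    import org.apache.lucene.search.spell.LuceneLevenshteinDistance;

    // Damerau-Levenshtein (optimal string alignment), from Lucene's spell-checker module
    LuceneLevenshteinDistance d = new LuceneLevenshteinDistance();
    // getDistance returns a similarity in [0, 1], where 1 means identical
    float similarity = d.getDistance(queryTerm, yourResult);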
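
The same workaround can be sketched without the Lucene dependency. This is a plain-Java stand-in, not Lucene's implementation: it computes the optimal string alignment (restricted Damerau-Levenshtein) distance and turns it into a similarity by dividing by the longer string's length. That normalization is an assumption; Lucene's classes may normalize differently. `bestQuery` and the map from query text to top-hit text are hypothetical; in practice the top-hit text would be a stored field of the highest-scored document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Picks the query whose top hit is most similar to the query text (hypothetical helper). */
final class QuerySimilarity {
    private QuerySimilarity() {}

    /** OSA distance: insertions, deletions, substitutions and adjacent transpositions. */
    public static int osaDistance(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ne" <-> "en"
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    /** Similarity in [0, 1]; dividing by the longer length is an ASSUMPTION, not Lucene's formula. */
    public static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1f;
        return 1f - (float) osaDistance(a, b) / maxLen;
    }

    /** Returns the query (map key) whose top-hit text (map value) is most similar to it. */
    public static String bestQuery(Map<String, String> topHitByQuery) {
        String best = null;
        float bestSim = -1f;
        for (Map.Entry<String, String> e : topHitByQuery.entrySet()) {
            float sim = similarity(e.getKey(), e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> top = new LinkedHashMap<>();
        top.put("lucene", "lucnee");      // made-up top hit: one transposition away
        top.put("solr", "elasticsearch"); // made-up top hit: unrelated text
        System.out.println(bestQuery(top)); // prints lucene
    }
}
```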