Is it possible to set a "reasonable" Solr score threshold independently of the results returned? (i.e. are Solr scores normalized in any way)

Posted on 2024-12-17 11:11:24

I have a Solr index with many entries. A query returns some subset of them, each with a score. Once the results come back with their scores, I want to keep only the results above some score (i.e. only results of a certain quality). Is that possible when the returned subset could be anything?

I ask because on some queries a score of, say, 0.008 corresponds to a decent match, whereas on other queries a higher score corresponds to a poor match.

Ideally, I'm just looking for a way to take the top x entries as long as they are of at least a certain quality.

Comments (2)

与往事干杯 2024-12-24 11:11:24

I don't think you should do this. With the TF-IDF scoring model, there is no way to compute a score above which all results are relevant and below which none are. Even if you managed to find one, the threshold would very likely stop being valid after a few updates to your index, because document frequencies will change.
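
To make the point about document frequencies concrete, here is a minimal sketch. It assumes the classic Lucene-style inverse document frequency, idf = 1 + ln(N / (df + 1)), which is only one factor of a TF-IDF score; the function name and the index sizes are made up for illustration.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Classic TF-IDF style idf (assumed formula): 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical index: 1000 documents, 9 of which contain the term.
idf_before = classic_idf(num_docs=1000, doc_freq=9)

# After some updates (hypothetical): 2000 documents, 99 contain the term.
idf_after = classic_idf(num_docs=2000, doc_freq=99)

print(f"idf before updates: {idf_before:.3f}")
print(f"idf after updates:  {idf_after:.3f}")
```

The same term contributes a different weight before and after the updates, so a fixed score cut-off tuned on the old index would no longer mean the same thing.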

If you still want to do this, I think it is achievable with function queries: Solr has an if function (in trunk) and a query function. Just filter your results so that you keep only entries whose score is higher than a given threshold.
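
One commonly used way to express such a filter is a function range filter over the query function, rather than the if function mentioned above. The sketch below only builds the request parameters; the helper name, the example query and the threshold value are assumptions for illustration, and the threshold itself remains subject to the caveat in the previous paragraph.

```python
from urllib.parse import urlencode


def build_threshold_params(user_query: str, min_score: float, rows: int = 10) -> dict:
    """Build Solr request parameters that keep only documents whose score
    for the main query is at least min_score (hypothetical helper)."""
    return {
        "q": user_query,
        "fl": "*,score",   # ask Solr to return the score with each document
        "rows": rows,      # top x entries
        # {!frange l=...} keeps documents whose function value is >= l;
        # query($q) evaluates to each document's score for the main query.
        "fq": f"{{!frange l={min_score}}}query($q)",
    }


params = build_threshold_params("title:solr", min_score=0.5, rows=5)
query_string = urlencode(params)  # append to your /select URL
print(query_string)
```

The filter is applied on the server, so rows still caps the number of results and only documents at or above the threshold are counted.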

比忠 2024-12-24 11:11:24

You may also want to read ScoresAsPercentages first.

Solr does not normalize scores, since that is easily done on the client side. You can use the maxScore provided in the results and divide every score by it. The first record will then have a score of 1, and the rest follow with lower values.
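
A minimal client-side sketch of that division, assuming the usual Solr JSON response shape (response.maxScore and response.docs, with score requested in fl). The function name, the normalized_score key and the sample data are made up for illustration. Note that the ratio is relative to the best hit of that particular query, so it does not by itself tell you whether the best hit is a good match.

```python
from typing import Any, Dict, List


def top_normalized(response: Dict[str, Any], top_x: int, min_ratio: float) -> List[Dict[str, Any]]:
    """Keep at most top_x docs whose score / maxScore is at least min_ratio."""
    body = response["response"]
    max_score = body.get("maxScore") or 0.0
    if max_score <= 0:
        return []  # no hits, or scores were not requested
    kept = []
    for doc in body["docs"][:top_x]:
        ratio = doc["score"] / max_score
        if ratio >= min_ratio:
            kept.append({**doc, "normalized_score": ratio})
    return kept


# Hypothetical response, already parsed from JSON.
sample = {
    "response": {
        "numFound": 3,
        "start": 0,
        "maxScore": 0.008,
        "docs": [
            {"id": "a", "score": 0.008},
            {"id": "b", "score": 0.006},
            {"id": "c", "score": 0.002},
        ],
    }
}

kept = top_normalized(sample, top_x=3, min_ratio=0.5)
print([(d["id"], round(d["normalized_score"], 2)) for d in kept])
```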
