How to ditch hits below a certain score when using Solr/Lucene?

Posted 2025-01-04 05:39:28


My problem is that search is only a small addition to my application and I don't really want to invest that much time into digging into the whole idea. Looking at my search results, it's a very common pattern that I get some very good matches (7+) and some very, very bad matches, which score around 0.10. If I sort the results using any criteria other than score, it makes very little sense, as the 0.10 hits have almost nothing to do with the query and might end up first on the list.

Seriously, it looks like cutting everything below a score of around 3 would make my results way more consistent, and sorting would make much more sense.

Now, after doing some basic research, it looks like lots of people think that filtering Solr results by score is a really bad idea. There are some hints on how to do this, but I couldn't find a working solution yet.

The suggested ideas of using frange (on either the proper q query or qf) don't really work. Ditching the low-score results in the app itself seems pretty dull as well, since it breaks pagination, slows things down and in general yields a lot of unnecessary work.
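For reference, the frange variant that usually gets suggested feeds the main query back in through the `query()` function and filters on its score. A minimal sketch of the request parameters (the cutoff of 1.0 and the field list are assumptions for illustration):

```python
def score_filter_params(user_query, min_score=1.0):
    """Build Solr /select params using the commonly suggested
    frange-on-query() trick: the fq re-evaluates the main query as a
    function and keeps only docs whose score is >= the lower bound l."""
    return {
        "q": user_query,
        "fq": f"{{!frange l={min_score}}}query($q)",
        "fl": "id,score",
    }
```

Note that the score frange sees is the raw function value of `query($q)`, which is not always identical to the final score of the full request (boosts, reranking), which may be why this approach feels unreliable.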

After roughly an hour on Google I found that a lot of people really want this solution, though I couldn't find anything that works for me.

So, is there any way at all to ditch low-score results on the Solr side? Are there any custom filters to do that?

Edit:

Most of the result sets have a significant score gap at the bottom for some reason. For example, the last relevant result gets, say, a 4.5 score, and there are always a few more results with the next highest at 0.12... Maybe I am doing something wrong at the index level? Is there any simple way to push those irrelevant results off the result hash? After some more research it looks like I would be more or less OK after just ditching the < 1 scores...
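Given that pattern of a clear gap at the bottom, one app-side fallback (with the pagination caveat already mentioned) is to cut at the largest drop between neighbouring scores instead of at a fixed threshold. This is a sketch of the heuristic, not anything Solr provides out of the box:

```python
def cut_at_largest_gap(scores):
    """Given scores sorted in descending order, return the index at which
    to cut, i.e. just after the largest absolute drop between neighbours.
    Keep scores[:cut_at_largest_gap(scores)]."""
    if len(scores) < 2:
        return len(scores)
    # Gap between each score and the next one down.
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    biggest = max(range(len(gaps)), key=gaps.__getitem__)
    return biggest + 1
```

With the example above, `[7.2, 4.5, 0.12, 0.10]` would be cut after the 4.5, since the 4.5 → 0.12 drop is the largest gap.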


1 Comment

似狗非友 2025-01-11 05:39:28


Bailing out at the application level seems to be what most folks do.

One idea is to pick a percentage that you like, then take the first doc's score as the denominator and each subsequent doc's score as the numerator, and stop once you fall below your ratio. But I agree that doing it at this level messes up paging, etc.
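The ratio idea above can be sketched like this, assuming results arrive sorted by score in descending order (the 25% ratio is an arbitrary example):

```python
def cut_by_ratio(docs_with_scores, ratio=0.25):
    """Keep only docs whose score is at least `ratio` of the top score.

    docs_with_scores: list of (doc, score) pairs, sorted by score desc.
    """
    if not docs_with_scores:
        return []
    top_score = docs_with_scores[0][1]
    return [(doc, score) for doc, score in docs_with_scores
            if score / top_score >= ratio]
```

Note this only makes sense per-request: Lucene scores are not comparable across different queries, so the ratio has to be recomputed for every result set.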

Another idea is to write a custom Solr plugin that forces the score to zero below some point - that would fix the pagination and facets, etc. The place to start would be the default "Similarity" scoring code (the name is a bit odd; I had passed it by a few times myself).
