Sphinx search ranking broken?

Posted 2024-12-15 12:37:31 · 736 words · 0 views · 0 comments

Has anyone ever gotten the Sphinx ranking options to work? I've read the manual and the book but cannot get ranking to work at all. As I understand it, a ranking mode only changes how the weights are computed; it does not sort anything by itself. My results are sorted by @weight (Sphinx's internal weight attribute) using the extended sort mode, which you need for this, yet I cannot see any difference between the ranking modes. My config is something like this:

$cl->SetMatchMode( SPH_MATCH_EXTENDED2 );  
$cl->SetSortMode ( SPH_SORT_EXTENDED, "mylang DESC, @weight DESC, @id"); 

Neither of these makes any difference:

$cl->SetRankingMode ( SPH_RANK_SPH04 );
$cl->SetRankingMode ( SPH_RANK_PROXIMITY_BM25 );

And the weights are the same in either mode.
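
One way to check whether the ranking mode has any effect is to run the same query twice, once per ranker, and compare the weights per document. The sketch below assumes result arrays shaped like the return value of SphinxClient::Query() in its default layout, where 'matches' is keyed by document ID. The helper name changed_weights and the sample numbers are made up for illustration; they are not real Sphinx output.

```php
<?php
// Hypothetical helper: list the document IDs whose weight differs between
// two result arrays shaped like SphinxClient::Query() output
// (default layout: $result['matches'][$docId]['weight']).
function changed_weights(array $a, array $b): array
{
    $changed = [];
    foreach ($a['matches'] ?? [] as $id => $match) {
        $other = $b['matches'][$id]['weight'] ?? null;
        // Only compare documents that appear in both result sets.
        if ($other !== null && $other != $match['weight']) {
            $changed[$id] = [$match['weight'], $other];
        }
    }
    return $changed;
}

// Made-up sample data, only to show the expected shape.
$bm25  = ['matches' => [1 => ['weight' => 2500], 2 => ['weight' => 2500]]];
$sph04 = ['matches' => [1 => ['weight' => 6500], 2 => ['weight' => 2500]]];

print_r(changed_weights($bm25, $sph04));
```

An empty array from a helper like this would confirm that both rankers produced identical weights for the documents they share, which points at the ranker setting not being applied rather than at the sort clause.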

Ultimately, I want results that match the search terms exactly to be sorted towards the top. For example, a search for "Harry Potter" should return results in this order:

Harry Potter
Harry Potter and the potters
Harry Potter and the Prisoner of Azkaban
Harry Potter and the Deathly Hallows: Part 1

This is just an example, but the first result should be the one that contains exactly the search term, followed by the others. This is not happening. Does anyone have any experience with this?

Comments (2)

谁的年少不轻狂 2024-12-22 12:37:31

Do you have any other records in the index besides those that start with "Harry Potter"?
If not, the phrase "Harry Potter" will be penalized by the ranking algorithm.

See my article on this: "Interesting thing about BM25 in Sphinx Search".

All of your records contain an exact match for "Harry Potter", so I suppose records with more words would be ranked higher.

A solution could be to use an attribute that stores the record size in bytes:

sql_query = select field, length(field) as f_size from ....

Attribute:

sql_attr_uint = f_size

Sphinx sort mode:

$cl->SetSortMode ( SPH_SORT_ATTR_ASC, 'f_size' );
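
To see why sorting on a byte-length attribute would put the shortest title first, here is a small client-side illustration in plain PHP. It does not talk to Sphinx; it only sorts the four titles from the question by strlen(), which counts bytes much as MySQL's LENGTH() does. The shuffled input order is an assumption made for the demonstration.

```php
<?php
// Titles from the question, deliberately out of order.
$titles = [
    'Harry Potter and the Deathly Hallows: Part 1',
    'Harry Potter',
    'Harry Potter and the Prisoner of Azkaban',
    'Harry Potter and the potters',
];

// Ascending sort by byte length, the same key f_size would hold.
usort($titles, function (string $a, string $b): int {
    return strlen($a) <=> strlen($b);
});

foreach ($titles as $title) {
    echo strlen($title), "\t", $title, "\n";
}
```

Note that sorting on f_size alone ignores relevance entirely. In a real setup one might instead combine both keys in an extended sort clause such as "@weight DESC, f_size ASC"; that combination is a suggestion, not something the answer above states.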

请别遗忘我 2024-12-22 12:37:31

It turns out that SPH_RANK_SPH04 is not included in the sphinxapi.php file shipped with version 0.9.9! So even though you call it, it is not taken into account, and furthermore no error is produced.

This is terrible because it makes it very hard to troubleshoot.
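
A defensive check on the client side can surface this kind of mismatch early. The sketch below uses PHP's built-in defined() and constant() to test whether the loaded API file knows a ranker constant before it is passed to SetRankingMode. The helper name resolve_ranker is hypothetical, and the snippet assumes sphinxapi.php has already been included elsewhere.

```php
<?php
// Hypothetical helper: return the ranker constant's value if the loaded
// sphinxapi.php defines it, or null if the constant is unknown.
function resolve_ranker(string $name): ?int
{
    return defined($name) ? (int) constant($name) : null;
}

$ranker = resolve_ranker('SPH_RANK_SPH04');
if ($ranker === null) {
    // Fail loudly instead of silently falling back to another ranker.
    echo "SPH_RANK_SPH04 is not defined; the loaded sphinxapi.php may be outdated\n";
}
// Otherwise it would be safe to call: $cl->SetRankingMode($ranker);
```

Run against an old API file, a check like this would report the missing constant immediately instead of leaving the ranking silently unchanged.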

I've posted this as an answer in the hope that it helps someone else. We lost almost two days going crazy over this before we figured it out.

Furthermore, there is a bug in 2.0.1 that fails to bring some exact matches to the front. For that you need 2.0.2 (which you have to get from SVN) or later, but I'd be very wary of using experimental versions in production.

Hopefully the Sphinx developers will take care of this soon.

PS
Looking back at the developer diaries, it does say:

"As of 1.10-beta, Sphinx has 8 different rankers"

We upgraded from 0.9.9 to 2.0.1 and must have left the old API file behind; in my desperation I never even checked this. It would still be nice for Sphinx to throw an error if the ranking mode doesn't exist (as it does for other modes such as matching), and as far as we can tell from our tests the 2.0.1 bug is still there.
