Sphinx returns inconsistent result sets depending on sort order

Posted on 2024-12-27 02:40:46

I'm trying to implement multilingual indexes for the web application I'm developing. At the moment, records exist in a few languages: English, Malay, and Arabic (but they are not separated into different columns). Only the English stemmer is currently enabled.

Only two indexes are built: a stemmed index and a non-stemmed index. I'm having a problem with the stemmed index: the result set it returns is inconsistent and depends on the sort column.

These two queries (against the stemmed index) each return a different total number of results, although the only difference between them is the sort order.

SELECT * FROM test1stemmed WHERE MATCH('@institution universiti') GROUP BY art_id ORDER BY art_title_ord ASC;

SELECT * FROM test1stemmed WHERE MATCH('@institution universiti') GROUP BY art_id ORDER BY art_title_ord DESC;

However, if the same queries are run on the non-stemmed index, the numbers of results are equal.

I have the same problem with the Sphinx PHP API:

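$sp = new SphinxClient();
$sp->SetServer('localhost', 9312);
$sp->SetMatchMode(SPH_MATCH_EXTENDED);
// Group by art_id; the third argument sorts the groups, e.g. "art_title_ord ASC"
$sp->SetGroupBy('art_id', SPH_GROUPBY_ATTR, "$sp_sort_column $sort");
// The third argument (1000) is the per-query max_matches
$sp->SetLimits($offset, $rows_per_page, 1000);
$sp->Query("$q", 'test1stemmed');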

What am I missing?


Comments (1)

过潦 2025-01-03 02:40:46

I missed something in the documentation: http://sphinxsearch.com/docs/2.0.2/clustering.html

WARNING: grouping is done in fixed memory and thus its results are only approximate; so there might be more groups reported in total_found than actually present. @count might also be underestimated. To reduce inaccuracy, one should raise max_matches. If max_matches allows to store all found groups, results will be 100% correct.

So I can work around this by increasing max_matches, but since setting a very large value is absolutely undesirable, I will fix the query instead.
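
For reference, a minimal sketch of the max_matches workaround in SphinxQL, reusing the first query from the question. The value 10000 and the LIMIT clause are illustrative assumptions, not from the original post; the idea is to pick a value at least as large as the number of distinct art_id groups the query can match. As far as I recall, in the 2.0.x series the per-query value cannot exceed the max_matches setting of searchd in sphinx.conf, so that setting may need to be raised as well.

```sql
-- Illustrative only: 10000 is an assumed value, chosen to exceed the number of art_id groups
SELECT * FROM test1stemmed
WHERE MATCH('@institution universiti')
GROUP BY art_id
ORDER BY art_title_ord ASC
LIMIT 0, 20
OPTION max_matches=10000;
```

With the PHP API from the question, the equivalent knob is the third argument to SetLimits(), which is currently 1000.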
