Sphinx: dash in an author name causes problems when searching

Posted on 2024-12-09 06:43:10

I've read all the posts about dashes and tried pretty much everything mentioned in them, yet cannot figure out a strange problem I'm having.

For example, I have an author name like this:

Arturo Pérez-Reverte

A search for 'pérez-reverte' will not turn up anything, nor will 'pérez\-reverte', so escaping the dash is not the issue.
But a search for 'spider-man' does return hits, which suggests that the dash itself is handled correctly.
However, a search for 'perez reverte' also finds a hit, because it searches each word separately and finds the 'reverte' in 'perez-reverte' (but doesn't seem to find the 'perez').

A search for either 'pérez' or 'perez' finds the same number of documents, suggesting that the accent is not an issue (I do have a charset_table which accounts for accented characters).

So I'm very confused as to what's happening here. If it isn't the accent and it isn't the dash, what could it be?

I don't have any ignore_chars set. I'm using UTF-8 and have a charset_table that treats accented characters as their unaccented equivalents.
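For readers following along, here is a minimal sketch of what such an index setup might look like. The index name, source name, path, and the specific accent mappings are assumptions for illustration, not the poster's actual configuration; a real charset_table would cover many more characters.

```
# Hypothetical index block; names and path are placeholders.
index books
{
    source = books_src
    path   = /var/lib/sphinx/books

    # Older Sphinx releases also needed an explicit UTF-8 setting here:
    # charset_type = utf-8

    # Fold case and map accented e (U+C9, U+E9) to plain "e".
    # The dash (U+2D) is not listed, so it acts as a word separator.
    charset_table = 0..9, A..Z->a..z, _, a..z, U+C9->e, U+E9->e
}
```

With a table like this, 'pérez' and 'perez' index to the same token, which matches the behavior described above for single-word searches.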

The only difference between these two terms is that one of them is a title (spider-man) and the other an author, but they are both part of the same Sphinx index declaration, so I don't see that as an issue in any way.

Any help would be greatly appreciated.

终陌 2024-12-16 06:43:10

After much fighting with it, I found that even though my database is all UTF-8 with the proper collation, I still needed to add this to sphinx.conf for everything to work properly:

sql_query_pre = SET NAMES utf8
sql_query_pre = SET CHARACTER SET utf8 

After doing that, and having the proper charset_table, everything seems to be working fine.
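As a rough sketch of where those two lines go: sql_query_pre belongs in the source block, and runs on the database connection before the main fetch query. The source name, credentials, and SQL below are placeholders assumed for illustration, not taken from the original setup.

```
# Hypothetical source block; connection details and query are placeholders.
source books_src
{
    type     = mysql
    sql_host = localhost
    sql_user = sphinx
    sql_pass = secret
    sql_db   = library

    # Run before the main query so the connection delivers UTF-8 text
    # to the indexer, whatever the server's default connection charset is.
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET CHARACTER SET utf8

    sql_query = SELECT id, title, author FROM books
}
```

The index has to be rebuilt after this change, since the setting only affects text fetched at indexing time.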

Hope this helps someone else.
