MySQL fulltext MATCH

Posted 2024-10-06 18:58:04, 920 words, 7 views, 0 comments

I'm trying to do a fulltext search with MySQL to match a string. The problem is that the first result it returns is odd.

For example, the string 'passat 2.0 tdi' :

            AND MATCH (
            records_veiculos.titulo, records_veiculos.descricao
            )
            AGAINST (
             'passat 2.0 tdi' WITH QUERY EXPANSION
            )

is returning this as the first result (the others are fine):

Volkswagen Passat Variant 1.9 TDI- ANO 2003

which is incorrect, since there's no "2.0" in this example.

What could it be?

Edit: Also, since this will probably be a large database (up to 500,000 records expected), is this search method good enough on its own, or would it be better to install a dedicated search engine such as Sphinx? And if MySQL fulltext is good enough, how can I get it to show relevant results?

Edit 2: For the record, although the question has been marked as answered, the problem with the MySQL delimiters persists, so any suggestion on how to escape delimiters would be appreciated and worth the 500 points at stake. The workaround I found to improve the result set was to replace WITH QUERY EXPANSION with IN BOOLEAN MODE, using operators to force the engine to require the words I needed, like:

AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
 '+passat +2.0 +tdi' IN BOOLEAN MODE
)

It didn't solve the problem completely, but at least the relevance of the results has changed significantly.
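One workaround that does not come from this thread, sketched under the assumption that the table and columns are exactly the ones in the question: let the FULLTEXT condition handle the word the parser can index, and enforce the token it splits at the period ("2.0") with a plain `LIKE` filter. In `LIKE`, only `%` and `_` are wildcards, so the period is matched literally.

```sql
-- Sketch: the MATCH condition requires "passat" (a word the index can hold),
-- while LIKE enforces the literal "2.0" that the FULLTEXT parser splits into "2" and "0".
-- Table and column names are taken from the question.
SELECT titulo, descricao
FROM records_veiculos
WHERE MATCH (titulo, descricao) AGAINST ('+passat' IN BOOLEAN MODE)
  AND (titulo LIKE '%2.0%' OR descricao LIKE '%2.0%');
```

Note that `'%2.0%'` also matches strings such as "12.0" or "2.05", so the pattern may need tightening for real data.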

Comments (4)

萤火眠眠 2024-10-13 18:58:04

From the MySQL documentation on Fulltext search:

"The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, “ ” (space), “,” (comma), and “.” (period)."

This means that the period separates the 2 from the 0. So it's not looking for '2.0'; it's looking for '2' and '0', and not finding them. WITH QUERY EXPANSION is probably pulling in related words, so rows can rank highly without containing '2' and '0' as individual words. A minimum word length (ft_min_word_len) may also be in effect.
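A quick way to test this explanation, assuming a MyISAM table with the default `ft_min_word_len` of 4: compare the natural-language score of the full query with the score for `'passat'` alone. If "2", "0" and "tdi" are all being dropped as too short, the two score columns would be expected to come out the same.

```sql
-- Compare the relevance of the full search string with that of "passat" alone.
-- Assumption: MyISAM table, default ft_min_word_len = 4, FULLTEXT index on (titulo, descricao).
SELECT titulo,
       MATCH (titulo, descricao) AGAINST ('passat 2.0 tdi') AS score_full,
       MATCH (titulo, descricao) AGAINST ('passat')         AS score_passat
FROM records_veiculos
WHERE MATCH (titulo, descricao) AGAINST ('passat 2.0 tdi')
ORDER BY score_full DESC
LIMIT 10;
```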

池木 2024-10-13 18:58:04

By default, I believe MySQL only indexes and matches words of 4 or more characters. You could also try escaping the period; MySQL might be ignoring it or otherwise treating it as a stop character.
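To see which limit a given server actually applies, the server variables can be queried directly. `ft_min_word_len` governs MyISAM FULLTEXT indexes (default 4); InnoDB FULLTEXT indexes, available from MySQL 5.6, use `innodb_ft_min_token_size` (default 3) instead.

```sql
-- Minimum indexed word length for MyISAM FULLTEXT indexes.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Minimum indexed token size for InnoDB FULLTEXT indexes (MySQL 5.6 and later).
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';
```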

你是年少的欢喜 2024-10-13 18:58:04

What match score does it return for that row? Does a match have to contain all the "words"? My understanding was that it works like Google and only needs to match some of the words.

Having said that, mind the effect of adding WITH QUERY EXPANSION: it automatically runs a second search for "related" words, which may not be what you typed but which the full-text engine deems probably related.

Relevant Documentation: http://dev.mysql.com/doc/refman/5.1/en/fulltext-query-expansion.html
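To see how much of the odd ranking comes from the expansion step, the same search can be run with and without it and the two result lists compared. A sketch, assuming the question's table and FULLTEXT index:

```sql
-- Plain natural-language search: rows are ranked only on the words in the query string.
SELECT titulo
FROM records_veiculos
WHERE MATCH (titulo, descricao) AGAINST ('passat 2.0 tdi' IN NATURAL LANGUAGE MODE);

-- Same search with blind query expansion: MySQL searches once, adds words taken from
-- the most relevant rows, then searches again, so rows can match words nobody typed.
SELECT titulo
FROM records_veiculos
WHERE MATCH (titulo, descricao) AGAINST ('passat 2.0 tdi' WITH QUERY EXPANSION);
```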

那小子欠揍 2024-10-13 18:58:04

The "." is what's matching on 2003 in your query results.

If you're going to search for 3-character strings, you should set ft_min_word_len=3 in your MySQL config and restart MySQL. Otherwise, a search for "tdi" will return results with "TDI-" but not with just "TDI", because rows with "TDI-" will be indexed but "TDI" alone will not.

After making that config change, you'll have to rebuild your index on that table. (Warning: your index might be significantly larger now.)
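For a MyISAM table, the MySQL manual's way to rebuild the FULLTEXT indexes after changing `ft_min_word_len` is `REPAIR TABLE ... QUICK`. A sketch, assuming `ft_min_word_len=3` has already been added under the `[mysqld]` section of the option file (my.cnf or my.ini) and the server restarted:

```sql
-- Confirm that the server picked up the new value after the restart.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Rebuild the indexes (QUICK repairs only the index file, not the data file),
-- so that 3-character words such as "tdi" end up in the FULLTEXT index.
REPAIR TABLE records_veiculos QUICK;
```

This applies to MyISAM only; for an InnoDB table the relevant setting is `innodb_ft_min_token_size`, and the FULLTEXT index has to be dropped and re-created instead.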
