MySQL match fulltext
I'm trying to do a full-text search with MySQL to match a string. The problem is that it returns an odd result at the top of the result set.
For example, for the string 'passat 2.0 tdi':
AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
'passat 2.0 tdi' WITH QUERY EXPANSION
)
returns this as the first result (the others are fine):
Volkswagen Passat Variant 1.9 TDI- ANO 2003
which is incorrect, since there's no "2.0" in this row.
What could be causing this?
Edit: Also, since this will probably be a large database (up to 500,000 records expected), is this search method good enough on its own, or would it be better to install a dedicated search engine such as Sphinx? If not, how can I show relevant results?
Edit 2: For the record, although the question is marked as answered, the problem with the MySQL delimiters persists. If anyone has a suggestion on how to escape delimiters, it would be appreciated and worth the 500 points at stake. The workaround I found to improve the result set was to replace WITH QUERY EXPANSION with IN BOOLEAN MODE, using operators to force the engine to match the words I need, like this:
AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
'+passat +2.0 +tdi' IN BOOLEAN MODE
)
It didn't fully solve the problem, but at least the relevance of the results has changed significantly.
Comments (4)
From the MySQL documentation on full-text search:
"The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, “ ” (space), “,” (comma), and “.” (period)."
This means the period splits '2.0' into '2' and '0'. So MySQL isn't looking for '2.0'; it's looking for '2' and '0', and not finding them. WITH QUERY EXPANSION is probably pulling in related words, which lets rows rank highly even without '2' and '0' as individual words. A minimum word length may also be enforced.
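To make the splitting concrete, here is a rough Python sketch of what the parser does to the query and to the unexpected row. It is only a stand-in, not MySQL's actual parser: the function name fulltext_tokens is made up for illustration, the word-character rule (letters, digits, underscore, apostrophe) is a simplification, stopwords are ignored, and the minimum length of 4 is assumed to be the server default.

```python
import re

# Assumption: a simplified imitation of the full-text parser, not MySQL code.
# Anything that is not a letter, digit, underscore or apostrophe is treated
# as a delimiter, and words shorter than min_word_len are discarded.
def fulltext_tokens(text, min_word_len=4):
    words = re.split(r"[^0-9A-Za-z_']+", text.lower())
    return [w for w in words if len(w) >= min_word_len]

query_tokens = fulltext_tokens("passat 2.0 tdi")
row_tokens = fulltext_tokens("Volkswagen Passat Variant 1.9 TDI- ANO 2003")

print(query_tokens)  # only 'passat' survives from the query
print(row_tokens)    # '1', '9', 'tdi' and 'ano' are dropped from the row
```

Under these assumptions the only searchable word left in the query is 'passat', so any Passat row can come out on top. Calling fulltext_tokens("passat 2.0 tdi", min_word_len=3) would keep 'tdi' as well, but '2' and '0' are still lost.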
By default, I believe MySQL only indexes and matches words with 4 or more characters. You could also try escaping the period; it may be getting ignored or treated as a delimiter.
What match rank does it return for that row? Does the match have to contain all the "words"? My understanding was that it works like Google and only needs to match some of the words.
Having said that, bear in mind the effect of adding WITH QUERY EXPANSION: it automatically runs a second search for "related" words, which may not be what you typed but which the full-text engine deems probably related. Relevant documentation: http://dev.mysql.com/doc/refman/5.1/en/fulltext-query-expansion.html
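The two-pass behaviour can be illustrated with a toy Python model. This is only a sketch of the general idea (search once, add the words of the best hits to the query, search again). MySQL's real relevance formula is more involved, and the sample rows, the function names, the "top 1 row" cutoff and the 4-character minimum are all assumptions chosen for the demonstration.

```python
import re
from collections import Counter

def tokens(text, min_word_len=4):
    # Simplified tokenizer: split on non-word characters, drop short words.
    words = re.split(r"[^0-9A-Za-z_']+", text.lower())
    return [w for w in words if len(w) >= min_word_len]

def search(query_terms, docs):
    # Toy score: number of occurrences of any query term in the row.
    scored = []
    for doc in docs:
        counts = Counter(tokens(doc))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

def search_with_expansion(query, docs, top_n=1):
    # Pass 1: ordinary search. Pass 2: search again with the words of the
    # top rows from pass 1 added to the query.
    terms = set(tokens(query))
    for doc in search(terms, docs)[:top_n]:
        terms.update(tokens(doc))
    return search(terms, docs)

# Hypothetical rows, not taken from the real table.
docs = [
    "Volkswagen Passat 2.0 TDI",
    "Volkswagen Golf Variant 1.6",
    "Audi A4 2.0 TDI",
]

plain = search(set(tokens("passat 2.0 tdi")), docs)
expanded = search_with_expansion("passat 2.0 tdi", docs)
print(plain)
print(expanded)
```

In this toy model the plain search returns only the Passat row, while the expanded search also returns the Golf row, because "volkswagen" was picked up from the first pass even though it was never typed. That is the kind of drift that can push an unexpected row into the results.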
The "." is what's matching on 2003 in your query results.
If you're going to search on 3-character strings, you should set ft_min_word_len=3 in your MySQL config and restart MySQL. Otherwise, a search for "tdi" will return results with "TDI-" but not with just "TDI", because rows with "TDI-" will be indexed but "TDI" alone will not.
After making that config change, you'll have to rebuild the full-text index on that table. (Warning: your index might be significantly larger afterwards.)