Does MySQL full-text search work reasonably with non-Latin languages (Hebrew, Arabic, Japanese...)?
Addition: I ran some tests, and it has some problems with Hebrew. Example: the name מוסינזון is pronounced the same as מושינזון, but searching for one won't find the other. Since this is a common spelling error in Hebrew, it seems I will have to do some data manipulation for it to work perfectly.
Comments (5)
Though Hebrew support in MySQL is limited, your problem is more a matter of people using incorrect spelling than a dysfunction of the MySQL server. When you misspell a word in Google, it shows you a suggestion, and you can click on that suggestion to search for that term.
Perhaps you could build a program with the same behaviour: for example, create a table with two fields, one containing the commonly misspelled word and the other containing the correct spelling. You could then build a program that looks up the misspelled word and displays the suggestion.
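
The two-field table described above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names, and helper functions are all hypothetical. Which of the two Hebrew spellings counts as the correct one is also an arbitrary assumption here.

```python
import sqlite3

# The two spellings from the question. Treating the second as the canonical
# one is an arbitrary choice made only for this illustration.
MISSPELLED = "מושינזון"
CANONICAL = "מוסינזון"


def build_suggestion_table(pairs):
    """Create the two-column table and load (misspelled, correct) pairs."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
    conn.execute(
        "CREATE TABLE spelling_suggestion ("
        " misspelled TEXT PRIMARY KEY,"
        " correct TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO spelling_suggestion VALUES (?, ?)", pairs)
    return conn


def suggest(conn, term):
    """Return the stored correction for term, or None if there is none."""
    row = conn.execute(
        "SELECT correct FROM spelling_suggestion WHERE misspelled = ?",
        (term,),
    ).fetchone()
    return row[0] if row else None


conn = build_suggestion_table([(MISSPELLED, CANONICAL)])
suggestion = suggest(conn, MISSPELLED)
if suggestion is not None:
    print(f"Did you mean: {suggestion}?")
```

A search page could call suggest() before running the full-text query and offer the stored spelling as a clickable alternative, much like the Google behaviour mentioned above.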
So long as your collation is set properly, it works splendidly.
Unicode will work for most of this, of course. But a Unicode collation does not always map Latin character sequences onto language-specific letters very well (for example, in a Danish collation aa is recognized as å).
Yes, MySQL full-text search works well for Arabic. Just make sure of the following where needed:
COLLATION = utf8_unicode_ci and CHARACTER SET = utf8 (for databases, tables, and columns).
ft_min_word_len = 3 (see show variables like "ft_%";)
Yes. However, check what the stopwords are.
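
To see why the minimum word length and the stopword list both matter, here is a simplified model of which terms end up searchable. It is not MySQL's actual parser: the function name, the regex-based tokenizer, and the sample stopword set are assumptions made for illustration. The default of 4 mirrors the default of ft_min_word_len.

```python
import re

# Three Arabic words of 3, 2 and 4 letters, used only as sample input.
ARABIC_SAMPLE = "علم في كتاب"


def indexable_terms(text, min_word_len=4, stopwords=frozenset()):
    """Return the words a full-text index would keep, in this toy model.

    A word is dropped if it is shorter than min_word_len or if it
    appears in the stopword set.
    """
    terms = []
    for word in re.findall(r"\w+", text.lower()):
        if len(word) < min_word_len:
            continue  # too short to be indexed
        if word in stopwords:
            continue  # ignored as a stopword
        terms.append(word)
    return terms


# With the default minimum length of 4, only the 4-letter word survives.
print(indexable_terms(ARABIC_SAMPLE))
# Lowering the minimum to 3 also keeps the 3-letter word.
print(indexable_terms(ARABIC_SAMPLE, min_word_len=3))
# A stopword is dropped even when it is long enough.
print(indexable_terms("the cat sat", min_word_len=3, stopwords={"the"}))
```

In a real setup, changing ft_min_word_len or the stopword list generally requires rebuilding the full-text indexes before the new setting affects existing rows.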
Japanese and Chinese use their own whitespace symbols that MySQL does not understand.
Make sure that the words in the texts you are going to index are separated with ASCII separators (spaces, commas, etc.). Anything outside the ASCII range will probably not work.
Besides, you'll probably need to adjust ft_min_word_len: by default, MySQL won't index words shorter than 4 characters, and most Japanese and Chinese words are shorter than that.
In Cyrillic languages, transliteration errors are quite common.
All letters from this sequence:
АВЕКМНОРСТуХ / ABEKMHOPCTyX
are indistinguishable in most fonts.
The worst of them is Cyrillic С / Latin C: both symbols are located on the same key on the keyboard and do not differ at all in most fonts, but they have different character codes.
MySQL will not catch it either.
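
One way to handle both issues is to clean the text before it is stored and indexed. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, the list of CJK separators is a small illustrative subset, and the look-alike table contains only the letters listed in the sequence above.

```python
import re

# A few full-width CJK separators mapped to ASCII ones (illustrative subset).
CJK_SEPARATORS = str.maketrans({
    "\u3000": " ",  # ideographic space
    "\u3001": ",",  # ideographic comma
    "\u3002": ".",  # ideographic full stop
    "\uff0c": ",",  # full-width comma
})

# The look-alike letters from the sequence above, written as escapes so the
# two alphabets cannot be confused in the source code itself.
CYRILLIC = "\u0410\u0412\u0415\u041a\u041c\u041d\u041e\u0420\u0421\u0422\u0443\u0425"
LATIN = "ABEKMHOPCTyX"
LATIN_TO_CYRILLIC = str.maketrans(LATIN, CYRILLIC)


def ascii_separators(text):
    """Replace the listed CJK separators with their ASCII counterparts."""
    return text.translate(CJK_SEPARATORS)


def fold_lookalikes(text):
    """Convert Latin look-alikes to Cyrillic inside words that are
    already partly Cyrillic; purely Latin words are left untouched."""
    def fix(match):
        word = match.group(0)
        if re.search(r"[\u0400-\u04FF]", word):
            return word.translate(LATIN_TO_CYRILLIC)
        return word

    return re.sub(r"\w+", fix, text)


# Latin "C" followed by three Cyrillic letters becomes fully Cyrillic.
mixed = "C\u0422\u041e\u041b"
print(fold_lookalikes(mixed) == "\u0421\u0422\u041e\u041b")
```

Applying the same functions to the search term as well as to the stored text keeps both sides consistent. Note that a word typed entirely in Latin look-alikes contains no Cyrillic letter, so this heuristic leaves it unchanged.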