MongoDB: matching accented characters as their base characters
In MongoDB "db.foo.find()" syntax, how can I tell it to match all letters and their accented versions?
For example, if I have a list of names in my database:
João
François
Jesús
How would I allow a search for the strings "Joao", "Francois", or "Jesus" to match the given name?
I am hoping that I don't have to do a search like this every time:
db.names.find({name : /Fr[aã...][nñ][cç][all accented o characters][all accented i characters]s/ })
As of MongoDB 3.2, you can use $text and set $diacriticSensitive to false.
See more in the MongoDB docs: https://docs.mongodb.com/manual/reference/operator/query/text/
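A minimal sketch of what such a query could look like. The helper name buildTextFilter is hypothetical, and the commented shell calls assume a text index on the name field of the names collection from the question. Note that $text matches whole words rather than substrings, so this fits a search for "Francois" but not for "Franc".

```javascript
// Hypothetical helper: builds a $text filter that ignores diacritics.
function buildTextFilter(term) {
  return { $text: { $search: term, $diacriticSensitive: false } };
}

// Assumed usage in the mongo shell (requires a text index on "name"):
//   db.names.createIndex({ name: "text" })
//   db.names.find(buildTextFilter("Francois"))
```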
I suggest you add an indexed field, such as NameSearchable, that holds a simplified form of each string (for example, lowercased with accents stripped). The same mapping that is used when inserting new items in the database can be used when searching. The original string with correct casing and accents will be preserved.
Most importantly, the query can make use of indexing. Case-insensitive queries and regex queries cannot use indexes (with the exception of rooted regexes) and will become prohibitively slow on large collections.
Also, since the simplified strings can be created from the original strings, it's not a problem to add this field to existing collections.
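One possible mapping, as a sketch: the function name simplify is hypothetical, and the choice of Unicode NFD decomposition plus lowercasing is an assumption about what "simplified" means here. It only handles accents that decompose into a base letter plus a combining mark; letters such as "ø" or "ß" would need an explicit table.

```javascript
// Hypothetical mapping: strip combining accents and lowercase.
function simplify(text) {
  return text
    .normalize("NFD")                 // split "ã" into "a" + combining tilde
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase();
}

// Assumed usage in the mongo shell:
//   db.names.insertOne({ name: "François", NameSearchable: simplify("François") })
//   db.names.createIndex({ NameSearchable: 1 })
//   db.names.find({ NameSearchable: simplify("Francois") })
```

Because the same function runs at insert time and at query time, the lookup is a plain equality match that an ordinary index can serve.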
In this blog post: http://tech.rgou.net/en/php/pesquisas-nao-sensiveis-ao-caso-e-acento-no-mongodb-e-php/
the author uses the approach you describe. As far as I know, this is the only solution for the latest MongoDB version.
This looks more like fuzzy-matching search, which MongoDB does not currently support.
What you can try is:
1. Store variations of the name in a separate array element for each entry in the collection. The query can then check whether the search term exists within the variations array.
or
2. Store a Soundex string for each of the names in the same collection. Then compute the Soundex string for your search term and query the database with it; you will get the entries whose Soundex code matches that of your query. You can filter and verify that data further in your script.
Example:
Soundex code for François = F652, Soundex code for Francois = F652
Soundex code for Jesús = J220, Soundex code for Jesus = J220
Check more here:
http://creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm#SoundExConverter
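For illustration, a small sketch of American Soundex that reproduces the codes listed above. The function name soundex and the step of stripping accents before encoding are assumptions, not something taken from the linked converter, and the stored field name in the usage comment is hypothetical.

```javascript
// Digit assigned to each consonant; vowels, h, w and y have no code.
const SOUNDEX_CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6,
};

function soundex(name) {
  // Assumption: accents are removed first so "ç" is treated as "c".
  const letters = name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";

  let result = letters[0].toUpperCase();
  let previous = SOUNDEX_CODES[letters[0]] || 0;
  for (const ch of letters.slice(1)) {
    if (result.length === 4) break;
    if (ch === "h" || ch === "w") continue; // h and w do not separate equal codes
    const code = SOUNDEX_CODES[ch] || 0;
    if (code !== 0 && code !== previous) result += code;
    previous = code; // a vowel resets this, so a repeated code after it counts again
  }
  return result.padEnd(4, "0");
}

// Assumed usage in the mongo shell, with a hypothetical "nameSoundex" field:
//   db.names.find({ nameSoundex: soundex("Francois") })
```

Soundex is coarse, so unrelated names can share a code; that is why the results still need filtering in the calling script.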