Rails 3: Search ignoring diacritics?
I have a Rails 3 app that contains Article objects. They have a title attribute. Before adding a new article, people are supposed to search to see if an article with that title already exists.
Today someone reported a duplicate article. Turns out whoever added it had searched for it first, but there was an umlaut over an "o" in the title. They searched without the umlaut using a regular "o" character, didn't find it, and added the duplicate.
I'm doing a simple find on the title attribute with a scope, as below:
scope :search, lambda { |term| where('title like ?', "%#{term}%") }
I'm wondering if there's a simple way to "ignore" diacritics, so that the person could type an "o" and still find an article if the o has an umlaut, and the same for other diacritics.
I've considered creating a search_title attribute and populating it myself on update, replacing the diacritics with their plain equivalents, but that has its own problems. Among them: what if someone then does type the diacritic in the search?
I was hoping there might be an easy solution for this, but I'm not holding out much hope. :-)
Comments (2)
I suggest creating a search_title field and storing title.to_ascii_brutal in it (using this plugin: https://github.com/tomash/ascii_tic). Then change your search scope to query search_title instead of title.
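As a rough illustration of the shadow-field idea, here is a minimal sketch in plain Ruby. It does not use the ascii_tic plugin; it assumes Ruby 2.2 or later and folds diacritics with String#unicode_normalize instead. The names fold_diacritics and search are hypothetical, and the array stands in for the articles table. Because the search term is folded the same way as the stored value, a search typed with or without the umlaut should find the article.

```ruby
# Hypothetical helper (not part of ascii_tic): decompose each character
# into base letter + combining marks (NFD), drop the marks, and lowercase.
def fold_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

# In-memory stand-in for the articles table: each entry pairs the real
# title with the folded value that would live in search_title.
titles = ["Die schöne Müllerin", "Plague"]
index = titles.map { |title| [title, fold_diacritics(title)] }

# Fold the search term too, so "schone" and "schöne" behave identically.
def search(index, term)
  needle = fold_diacritics(term)
  index.select { |_title, folded| folded.include?(needle) }.map(&:first)
end

puts search(index, "schone").inspect
puts search(index, "Schöne").inspect
```

In the Rails app, the same helper would presumably populate search_title in a before_save callback, and the scope would compare search_title against the folded term. Letters without a canonical decomposition, such as ß or ø, are left unchanged by this approach and would need an explicit mapping.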
Yes, a standard way to handle this is to maintain a shadow search field. In addition to changing all the data to ASCII, consider:
An alternative strategy is to compute a Soundex code for each title and search on that (or use a revised version of Soundex). There are Ruby libraries for Soundex, or you can write your own.
Soundex will give you more false positives, so you need to decide whether you'd rather have more false positives or risk missing a match (a false negative) because one title was "Plague" and the other was "Plagues".
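For anyone who wants to write their own, here is a minimal sketch of the classic American Soundex rules in plain Ruby. The method name soundex and the constant SOUNDEX_CODES are hypothetical, and the input is assumed to be ASCII already (for example, after folding diacritics); any other characters are simply dropped.

```ruby
# Consonant groups of classic American Soundex; vowels, h, w, y have no code.
SOUNDEX_CODES = {
  "b" => "1", "f" => "1", "p" => "1", "v" => "1",
  "c" => "2", "g" => "2", "j" => "2", "k" => "2",
  "q" => "2", "s" => "2", "x" => "2", "z" => "2",
  "d" => "3", "t" => "3",
  "l" => "4",
  "m" => "5", "n" => "5",
  "r" => "6"
}.freeze

def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "") # assumes ASCII input
  return "" if letters.empty?

  digits = String.new
  previous = SOUNDEX_CODES[letters[0]] # first letter counts for collapsing
  letters[1..-1].each_char do |ch|
    code = SOUNDEX_CODES[ch]
    if code
      digits << code unless code == previous # collapse repeated codes
      previous = code
    elsif ch != "h" && ch != "w"
      previous = nil # a vowel separates equal codes; h and w do not
    end
  end

  # First letter plus up to three digits, padded with zeros.
  (letters[0].upcase + digits)[0, 4].ljust(4, "0")
end

puts soundex("Robert") # R163
puts soundex("Rupert") # R163
```

"Robert" and "Rupert" share a code, which is the kind of false positive mentioned above. Under this variant, "Plague" and "Plagues" come out as P420 and P422, so Soundex on its own does not guarantee that plural forms match.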
You could also install a real full-text search system, either by enabling MySQL's built-in full-text search or by running a separate search system.