Rails 3: Search ignoring diacritics?

Posted on 2024-12-10 05:43:04


I have a Rails 3 app that contains Article objects. They have a title attribute. Before adding a new article, people are supposed to search to see whether an article with that title already exists.

Today someone reported a duplicate article. Turns out whoever added it had searched for it first, but there was an umlaut over an "o" in the title. They searched without the umlaut using a regular "o" character, didn't find it, and added the duplicate.

I'm doing a simple find on the title attribute with a scope, as below:

scope :search, lambda { |term| where('title like ?', "%#{term}%") }

I'm wondering if there's a simple way to "ignore" diacritics, so that the person could type an "o" and still find an article if the o has an umlaut, and the same for other diacritics.

I've considered creating a search_title attribute and populating it myself on update, replacing the diacritics with their plain equivalents, but that has its own problems; among them, what happens if someone then does type the diacritic in their search?

I was hoping there might be an easy solution for this, but I'm not holding out much hope. :-)


浮世清欢 2024-12-17 05:43:04

I suggest creating a search_title field and storing title.to_ascii_brutal in it (using this plugin: https://github.com/tomash/ascii_tic). Then change your search scope to:

scope :search, lambda { |term| where('search_title like ?', "%#{term.to_ascii_brutal}%") }
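The key point of this approach is that the stored title and the search term go through the same transformation, so it no longer matters whether the searcher types the diacritic. Here is a minimal plain-Ruby sketch of that idea. It does not use to_ascii_brutal from the plugin; instead it assumes Ruby 2.2+ and strips combining marks after Unicode decomposition. The helper names (strip_diacritics, search_key, build_index, find_titles) are made up for illustration.

```ruby
# Assumption: Ruby 2.2+ (String#unicode_normalize). This is a stand-in for
# the plugin's to_ascii_brutal, not a copy of it.

# Decompose accented letters (NFD) and drop the combining marks.
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# The value that would be stored in search_title and used for the search term.
def search_key(text)
  strip_diacritics(text).downcase
end

# Hypothetical in-memory stand-in for the articles table:
# pairs of [search_title, title].
def build_index(titles)
  titles.map { |title| [search_key(title), title] }
end

# Same normalization on the term as on the stored titles.
def find_titles(index, term)
  key = search_key(term)
  index.select { |stored_key, _| stored_key.include?(key) }
       .map { |_, title| title }
end

index = build_index(["Motörhead live", "Plain title"])
p find_titles(index, "motorhead") # => ["Motörhead live"]
p find_titles(index, "Motörhead") # => ["Motörhead live"]
```

Note that letters such as "ø" or "ß" have no canonical decomposition, so this sketch leaves them untouched; a transliteration table or library would be needed for those.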


谈情不如逗狗 2024-12-17 05:43:04

Yes, the standard way to handle this is to maintain a shadow search field. In addition to converting all data to ASCII, consider:

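Converting all data to upper case to eliminate case issues.
Removing all characters that are not digits, letters, or spaces (punctuation, tabs, and so on).
Removing "stop words" such as "is", "the", and "a". Stop words are language-dependent, of course.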
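The three steps above could be sketched roughly as follows. This assumes the title has already been converted to ASCII, and the stop-word list and the method name normalize_title are placeholders of mine, not something from a library.

```ruby
# Hypothetical, English-only stop-word list; a real one depends on the language.
STOP_WORDS = %w[IS THE A AN OF].freeze

# Builds the value for the shadow search field from an ASCII title.
def normalize_title(title)
  cleaned = title.upcase
                 .gsub(/\s+/, " ")        # tabs and newlines become plain spaces
                 .gsub(/[^A-Z0-9 ]/, "")  # drop punctuation and other symbols
  (cleaned.split - STOP_WORDS).join(" ")  # drop stop words, collapse spacing
end

p normalize_title("The Plague:\ta history!") # => "PLAGUE HISTORY"
```

The search term would go through the same method before being compared against the shadow field.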

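Another strategy is to compute a Soundex code for each title and search on that (or use a revised version of Soundex). There are Ruby libraries for Soundex, or you can write your own.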

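Soundex will give you more false positives. You need to decide whether you would rather have more false positives or risk missing a match (a false negative) because two titles differ only slightly, for example "Plague" versus a close variant of it.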
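To make the trade-off concrete, here is a small hand-rolled sketch of the common American Soundex rules. It is not taken from any of the Ruby libraries mentioned, it assumes single ASCII words, and the names SOUNDEX_CODES and soundex are my own.

```ruby
# Letter groups of American Soundex; vowels, Y, H and W have no code.
SOUNDEX_CODES = {
  "B" => "1", "F" => "1", "P" => "1", "V" => "1",
  "C" => "2", "G" => "2", "J" => "2", "K" => "2",
  "Q" => "2", "S" => "2", "X" => "2", "Z" => "2",
  "D" => "3", "T" => "3",
  "L" => "4",
  "M" => "5", "N" => "5",
  "R" => "6"
}.freeze

# Returns a 4-character code such as "R163", or "" if there are no letters.
def soundex(word)
  letters = word.upcase.gsub(/[^A-Z]/, "")
  return "" if letters.empty?

  code = letters[0].dup
  previous = SOUNDEX_CODES[letters[0]]
  letters[1..-1].each_char do |ch|
    next if ch == "H" || ch == "W"   # H and W do not separate equal codes
    digit = SOUNDEX_CODES[ch]
    code << digit if digit && digit != previous
    previous = digit                 # a vowel resets this to nil
  end
  code[0, 4].ljust(4, "0")
end

p soundex("Robert") # => "R163"
p soundex("Rupert") # => "R163"
```

"Robert" and "Rupert" collapsing to the same code is exactly the kind of false positive described above.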

You can also install a real full-text search system, either by enabling MySQL's built-in full-text search or by running a separate system.
