Full text search with Rails
I've been looking into searching plugins/gems for Rails. Most of the articles compare Ferret (Lucene) to Ultrasphinx or possibly Thinking Sphinx, but none that talk about SearchLogic. Does anyone have any clues as to how that one compares? What do you use, and how does it perform?
Comments (8)
thinking_sphinx and Sphinx work beautifully: no indexing, query, or install problems ever (5 or 6 installs, including production on Slicehost).
Why doesn't everybody use Sphinx, like, say, Craigslist? Read here about its limitations (the articles are a year and a half old; the Sphinx developer, Aksyonoff, is working on these, and he's adding features and reliability and stamping out bugs at an amazing pace):
http://codemonkey.ravelry.com/2008/01/09/sphinx-for-search/
http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/
Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?
ferret: easy install, doesn't stem properly, very slow indexing (one MySQL db: Sphinx 3 seconds, ferret 50 minutes). Well-documented problems (index corruption) in DRb servers in production under load. Having said that, I have used it in development since acts_as_ferret came out 3 years ago, and it has served me well. Not adhering to Porter stemming is an advantage in some contexts.
Lucene and Solr are the gorilla / Mack truck / heavyweight champ of open source search. The teams have been adding an impressive number of new features in the Solr 1.4 release.
acts-as-solr: works well once Tomcat or Jetty is in place, but those are sometimes a pain. The acts-as-solr fork by mattmatt is the main fork, but the project is relatively unmaintained.
Re the Tomcat install: Solr/Lucene has unquestionably the best knowledge base / support search engine of any software package I've seen (I guess I'm not that surprised); see the search box here:
http://www.lucidimagination.com/
Sunspot: the new Ruby wrapper, built on solr-ruby. Looks promising, but I couldn't get it to install on OS X. Indexes all Ruby objects, not just database records through ActiveRecord.
One thing that's really instructive is to install two search plugins, e.g. Sphinx and Solr, or Sphinx and ferret, and see what different results they return. It's as easy as
@sphinx_results - @ferret_results
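To make that comparison concrete, here is a minimal sketch in plain Ruby. The helper name compare_results and the sample ids are made up for illustration; in a real app the two arrays would be the records (or their ids) returned by each engine for the same query.

```ruby
# Hypothetical helper: summarise how two result sets differ.
# Array#- keeps elements of the receiver that are absent from the argument;
# Array#& keeps the elements both arrays share.
def compare_results(first, second)
  {
    only_first: first - second,
    only_second: second - first,
    both: first & second
  }
end

# Made-up record ids standing in for what each engine returned.
sphinx_ids = [1, 2, 3, 5, 8]
ferret_ids = [2, 3, 4, 8]

diff = compare_results(sphinx_ids, ferret_ids)
puts "Only Sphinx found: #{diff[:only_first].inspect}"
puts "Only Ferret found: #{diff[:only_second].inspect}"
puts "Found by both:     #{diff[:both].inspect}"
```

Looking at both directions of the difference is usually more telling than one, since each engine may stem and tokenize differently.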
just saw this post and responses
http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
http://www.jroller.com/otis/entry/open_source_search_engine_benchmark
http://www.flax.co.uk/blog/2009/07/07/xapian-compared/
First off, my obvious bias: I created and maintain Thinking Sphinx.
As it so happens, I actually saw Ben Johnson (creator of SearchLogic) present at the NYC ruby meet about it last night. SearchLogic is SQL-only - so if you're not dealing with massive tables, and relevance rankings aren't needed, then it could be exactly what you're looking for. The syntax is pretty clean, too.
However, if you want all the query intelligence handled by code that is not your own, then Sphinx or Solr (which is Lucene under the hood, I think) is probably going to work out better.
SearchLogic is a good plugin, but it is really meant to make your search code more readable; it doesn't provide the automatic indexing that Sphinx does. I haven't used Ferret, but Sphinx is incredibly powerful.
http://railscasts.com/episodes/120-thinking-sphinx
Great introduction to see how flexible it is.
I have not used SearchLogic, but I can tell you that Lucene is a very mature project with implementations in many languages. It is fast and flexible, and the API is fun to work with. It's a good bet.
Given this question is still highly ranked on Google for full text search, I'd really like to say that Sunspot is even stronger today if you're interested in adding full text search capabilities to your Rails application (and would like to have Solr behind you for that). You can check a full tutorial on this here.
And while we're at it, another contender that has arrived in the field is ElasticSearch, which aims to be a real-time full text search engine built on top of Lucene (but doing things differently from Solr). ElasticSearch includes out-of-the-box sharding and replication to multiple nodes, faster real-time search, and "percolators" that let you receive notifications when something matching your criteria becomes available. It's also moving really fast, with many more features. It's easy to build something on top of it, since the API is dead simple and completely based on REST using JSON as a format. One could say you don't even need a plugin to use it.
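As a rough illustration of the "no plugin needed" point, the sketch below talks to the REST API using only Ruby's standard library. The host and port (localhost:9200), the index name "posts", the field "title", and both helper names are assumptions for the example, not something from the answer above.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed for illustration: a local ElasticSearch node on the default port.
ES_URL = "http://localhost:9200"

# Build (but do not send) a POST to the index's _search endpoint
# carrying a simple "match" query as JSON.
def build_search_request(index, field, text)
  uri = URI("#{ES_URL}/#{index}/_search")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ query: { match: { field => text } } })
  request
end

# Send the request and parse the JSON reply.
# This needs a running node, so it is defined here but not called.
def run_search(request)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

request = build_search_request("posts", "title", "rails")
puts "#{request.method} #{request.path}"
puts request.body
```

With a node running, run_search(request) would return the parsed response as a Ruby hash; a wrapper gem mostly saves you from writing this plumbing yourself.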
Personally, I don't bother with database agnosticism for web applications and am quite happy using the full text search in PostgreSQL 8.3 (pg83). The benefit is that if and when you change your framework or language, you will still have full text search.
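For readers who haven't seen it, here is a minimal sketch of what a query against PostgreSQL's built-in full text search can look like. The helper name pg_fulltext_sql and the table and column names are hypothetical; to_tsvector, plainto_tsquery and the @@ match operator are PostgreSQL's own.

```ruby
# Hypothetical helper: build a parameterised full text query.
# "$1" is the placeholder for the user's search terms. The table and
# column names are interpolated directly, so they must come from
# trusted code, never from user input.
def pg_fulltext_sql(table, column, config = "english")
  <<~SQL
    SELECT *
    FROM #{table}
    WHERE to_tsvector('#{config}', #{column}) @@ plainto_tsquery('#{config}', $1)
  SQL
end

puts pg_fulltext_sql("posts", "body")
```

Because the search lives in the database, the same SQL works from any framework or language that can send a query with one bound parameter.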
Full Text Indexing and MATCH() AGAINST()
If you're just looking to do a fast search against a few text columns in your table, you can simply use a full text index of those columns and use MATCH() AGAINST() in your queries.
1. Create the full text index in a migration file.
2. Query using that index.
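The original snippets for these two steps are not shown here, so what follows is only a sketch of how they might look, assuming MySQL (MATCH ... AGAINST is MySQL syntax), Rails 5 or later, and a hypothetical posts table with title and body columns. The helper names, index name, and migration class are made up for illustration.

```ruby
# Hypothetical helper: SQL that creates a FULLTEXT index (MySQL syntax).
def fulltext_index_sql(table, columns)
  index_name = "fulltext_#{table}_#{columns.join('_')}"
  "CREATE FULLTEXT INDEX #{index_name} ON #{table} (#{columns.join(', ')})"
end

# Hypothetical helper: WHERE fragment using that index.
# "?" is a bind placeholder for the search terms.
def match_against_clause(columns)
  "MATCH(#{columns.join(', ')}) AGAINST(?)"
end

# Rails wiring; skipped when ActiveRecord is not loaded,
# so this file also runs as plain Ruby.
if defined?(ActiveRecord::Migration)
  class AddFulltextIndexToPosts < ActiveRecord::Migration[5.0]
    def up
      execute fulltext_index_sql("posts", %w[title body])
    end

    def down
      execute "DROP INDEX fulltext_posts_title_body ON posts"
    end
  end
end

puts fulltext_index_sql("posts", %w[title body])
puts match_against_clause(%w[title body])
```

In a model query this could be used as Post.where(match_against_clause(%w[title body]), "search terms"). Note that the columns listed in MATCH() have to be the same set of columns the FULLTEXT index was created on.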
ElasticSearch and Searchkick
If you're looking for a full blown search indexing solution that allows you to search for any column in any of your records while still being lightning quick, take a look at ElasticSearch and Searchkick.
ElasticSearch is the indexing and search engine.
Searchkick is the integration library with Rails that makes it very easy to index your records and search them.
Searchkick's README does a fantastic job at explaining how to get up and running and to fine tune your setup, but here is a little snippet:
1. Install and start ElasticSearch.
2. Add the searchkick gem to your bundle. The --strict option just tells Bundler to use an exact version in your Gemfile, which I highly recommend.
3. Add searchkick to a model you want to index.
4. Index your records.
5. Search your index.
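The snippets that went with this walkthrough are missing here, so the following is a hedged sketch rather than the author's code. It assumes Rails 5 or later, a running ElasticSearch node, the searchkick gem already in the Gemfile (for example via `bundle add searchkick --strict`), and a hypothetical Product model with a name column. The two helper methods are made up for the example.

```ruby
# Hypothetical helper: options handed to Product.search below.
def product_search_options(limit = 10)
  { fields: [:name], limit: limit }
end

# Only defined inside an app where Rails and Searchkick are loaded,
# so this file also runs as plain Ruby.
if defined?(Searchkick) && defined?(ApplicationRecord)
  class Product < ApplicationRecord
    searchkick # add searchkick to the model you want to index
  end
end

# Hypothetical helper tying the last two steps together.
# It requires the Product model above, so it is not called here.
def reindex_and_search(term)
  Product.reindex                                 # index your records
  Product.search(term, **product_search_options)  # search your index
end

puts product_search_options.inspect
```

In an app you would normally reindex once (for instance from a console or a rake task) rather than before every search, and iterate over the results with something like results.each { |product| puts product.name }.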
For anyone looking for a simple search gem without any dependencies, check out acts_as_indexed