What are some good cloud or third-party search providers?

Posted on 2024-10-07 07:31:43

I'm working on a website that has minimal traffic at the moment. It's built using Ruby on Rails and runs on Heroku's cloud platform.

As part of the site, I have a large number of pages that need to be searchable, each of which only has a tiny amount of information on it. Think of a table of articles where each article only needs its title indexed, but there are around 8 million articles.

Postgres Search:
When I first started working on this, I ran Postgres full-text search, but it apparently isn't optimized enough to handle this many indexed items, and it ran dog slow. Some searches took more than 30 seconds to complete and timed out the database connection.
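One thing worth ruling out with timings like these is a sequential scan: Postgres can only answer a full-text query quickly if a matching GIN (or GiST) index exists on the same `to_tsvector` expression that the query uses. The sketch below only builds the two SQL strings involved. The table name `articles`, the column `title`, the index name, and the `english` text search configuration are all assumptions, not details from the setup described above.

```ruby
# Hypothetical names; adjust to the real schema.
TABLE  = "articles"
COLUMN = "title"
CONFIG = "english" # text search configuration (an assumption)

# DDL for a GIN expression index over the title's tsvector.
def title_index_sql
  "CREATE INDEX #{TABLE}_#{COLUMN}_fts_idx ON #{TABLE} " \
    "USING gin (to_tsvector('#{CONFIG}', #{COLUMN}))"
end

# Parameterized search query. The WHERE expression has to match the
# indexed expression (same configuration, same column) for the planner
# to be able to use the index.
def title_search_sql(limit: 20)
  "SELECT id, #{COLUMN} FROM #{TABLE} " \
    "WHERE to_tsvector('#{CONFIG}', #{COLUMN}) @@ plainto_tsquery('#{CONFIG}', $1) " \
    "LIMIT #{Integer(limit)}"
end

puts title_index_sql
puts title_search_sql(limit: 10)
```

Running the generated query under `EXPLAIN ANALYZE` would show whether the index is actually used. Whether this is enough for 8 million titles on a shared Heroku database is something only measurement can answer.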

Websolr:
I then moved on to what was at the time the only Heroku add-on for cloud search, Websolr by OneMoreCloud. Unfortunately, they charge by the number of items indexed, which is horrible for a site like mine that has no traffic but a large number of items to index, and the performance I got was arguably worse than Postgres search, which was free. Where Postgres search would time out and bring down the site, Websolr would return an empty or partial result set, making viewers think that the result wasn't in the database.

Index Tank:
Now Heroku has added another cloud search provider, Index Tank, which is still in beta. While the beta is free, I'm reluctant to try it because on their non-Heroku service, which is not free, the highest plan covers only 2 million documents while already costing an eye-popping $500 a month.

Google Site Search:
An option I'm currently looking at is moving over to Google Site Search. The Google search brand gives me confidence that I won't run into the performance issues I had in the past. Also, their pricing is extremely reasonable, and it is based on traffic. On the downside, it's not truly an integrated search: it doesn't hook into the database but only looks at web pages, so as far as I can tell there's no way to specify a search that only returns, say, articles in the Technical Articles category. Even customizing the appearance of the search results seems like kind of a pain, in that I'd have to parse the search results from XML and then use that to generate my search result page, and if I wanted to show metadata in the display, I'd have to use the parsed search results to look up all the results' rows in my database.
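The parse-then-look-up step described above could be sketched roughly as follows. The XML layout (`R` result elements with a `U` child holding the URL), the sample data, and the `/articles/<id>` URL scheme are assumptions made for illustration; the real response format should be checked against Google's documentation.

```ruby
require "rexml/document"

# Sample response. The element names and URLs are assumptions.
SAMPLE_XML = <<~XML
  <GSP>
    <RES>
      <R N="1"><U>http://example.com/articles/42</U><T>First title</T></R>
      <R N="2"><U>http://example.com/articles/7</U><T>Second title</T></R>
    </RES>
  </GSP>
XML

# Extracts article ids from the result URLs, preserving ranking order.
# Results whose URL does not look like an article page are skipped.
def article_ids_from(xml)
  doc = REXML::Document.new(xml)
  ids = []
  REXML::XPath.match(doc, "//R/U").each do |url|
    match = url.text.to_s.match(%r{/articles/(\d+)\z})
    ids << match[1].to_i if match
  end
  ids
end

ids = article_ids_from(SAMPLE_XML)
p ids
# In a Rails app the rows could then be loaded with something like
# Article.where(id: ids) and re-sorted to follow the order of ids,
# since the database will not return them in ranking order.
```

This is one extra database query per results page, so the cost is mostly in the added code rather than in performance.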

Are there any good cloud or third-party search providers out there that the Stack Overflow community would recommend?

Comments (1)

夏见 2024-10-14 07:31:43

Take a look at http://www.searchblox.com/. Another alternative, although not a cloud provider, would be Elasticsearch (http://www.elasticsearch.org/). It is super simple to set up and use, and it generally works out of the box.
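As a rough illustration of how the "only articles in one category" requirement from the question maps onto Elasticsearch, here is a sketch that builds a search request body. The field names `title` and `category`, the index name `articles`, and the category value are hypothetical, and the `bool` query with a `filter` clause assumes a reasonably recent Elasticsearch version.

```ruby
require "json"

# Builds a search request body: a full-text match on the title,
# optionally narrowed by an exact-match category filter.
def title_search_body(text, category: nil, size: 10)
  bool = { must: [{ match: { title: text } }] }
  bool[:filter] = [{ term: { category: category } }] if category
  { size: size, query: { bool: bool } }
end

body = title_search_body("postgres full text", category: "technical")
puts JSON.pretty_generate(body)
# The JSON would be sent as the body of a search request against a
# hypothetical "articles" index, e.g. POST /articles/_search.
```

Because the filter lives in the query itself, the results come back already restricted to the category, which is the part Google Site Search could not do.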

Also, here is a perspective from the creator of ES himself, comparing it to the alternatives:
ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?
