搜索实施困境:全文与纯 SQL
我有一个需要搜索的 MySQL/Rails 应用程序。以下是有关数据的一些信息:
用户仅在自己的数据中搜索,因此首先按 user_id 缩小搜索范围。
每个用户最多将拥有大约 5000 条记录(它们会随着时间的推移而累积)。
我将典型用户的记录写入文本文件。文件大小为 2.9 MB。
搜索必须涵盖两列:
title
和body
。title
是一个 varchar(255) 列。body
是列类型文本。这将被轻度使用。如果我平均每秒进行几次搜索,那将是令人惊讶的。
它正在运行一台 500 MB CentOS 5 VPS 计算机。
我不想要相关性排名或任何类型的模糊性。搜索应该精确字符串并可靠地返回包含该字符串的所有记录。简单的日期顺序 - 从最新到最旧。
我正在使用 InnoDB 表类型。
我正在研究纯 SQL 搜索(通过 searchlogic gem)或使用 Sphinx 和 Thinking Sphinx gem 的全文搜索。
Sphinx 非常快,Thinking Sphinx 很酷,但它增加了复杂性,需要维护守护进程,需要 cron 作业来维护索引。
我可以对小型应用程序进行简单的 SQL 搜索吗?
I have a MySQL/Rails app that needs search. Here's some info about the data:
Users search within their own data only, so searches are narrowed down by user_id to begin with.
Each user will have up to about five thousand records (they accumulate over time).
I wrote out a typical user's records to a text file. The file size is 2.9 MB.
Search has to cover two columns:
title
andbody
.title
is a varchar(255) column.body
is column type text.This will be lightly used. If I average a few searches per second that would be surprising.
It's running an a 500 MB CentOS 5 VPS machine.
I don't want relevance ranking or any kind of fuzziness. Searches should be for exact strings and reliably return all records containing the string. Simple date order -- newest to oldest.
I'm using the InnoDB table type.
I'm looking at plain SQL search (through the searchlogic gem) or full text search using Sphinx and the Thinking Sphinx gem.
Sphinx is very fast and Thinking Sphinx is cool, but it adds complexity, a daemon to maintain, cron jobs to maintain the index.
Can I get away with plain SQL search for a small scale app?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我认为简单的 SQL 搜索不是一个好的选择。因为当我们在 MySQL 中获取文本类型列时,无论缓存设置如何,请求总是落在硬盘驱动器上。
您只能对非常小的应用程序使用纯 SQL 搜索。
我更喜欢狮身人面像。
I think plain SQL search won't be the good choice. Because of when we fetching text type columns in MySQL the request is always falling to hard drive no matter what cache settings are.
You can use plain SQL search only with very small apps.
I'd prefer Sphinx for that.
我会从简单的开始——纯 SQL 很可能会很好地工作,如果搜索功能被证明是一个瓶颈,您可以随时切换到全文搜索。
我正在开发和维护一个具有搜索功能的应用程序,其属性与您的类似,到目前为止,简单的 SQL 搜索对我来说效果很好。当我在一两年前第一次实现搜索功能时,我也有类似的性能问题,但我还没有看到任何性能问题。
I would start out simple -- chances are that plain SQL will work well, and you can always switch to full text search later if the search function proves to be a bottleneck.
I'm developing and maintaining an application with a search function with properties similar to yours, and plain SQL search has worked very well for me so far. I had similar performance concerns when I first implemented the search function a year or two ago, but I haven't seen any performance problems whatsoever yet.
使用 MySQL 全文搜索大约 4 年了,现在刚刚转向 Sphinx,我想说使用全文布尔(即精确)语法的常规 MySQL 搜索就可以了。它速度很快,而且会完全按照您的要求进行。您在任何时候搜索的数据量都会很小。
唯一的问题可能是对结果进行排序。当您开始按(例如)日期排序时,MySQL 的全文搜索可能会变慢,因为这要求您搜索整个表,而不仅仅是它找到的第一个 nn 个结果。这最终是我搬到 Sphinx 的原因。
Sphinx 也很棒,所以不要害怕尝试它,但听起来您的情况可能不需要附加功能。
Having used MySQL fulltext search for about 4 years, and just moving now to Sphinx, I'd say that a regular MySQL search using the fulltext boolean (ie exact) syntax will be fine. It's fast and it will do exactly what you want. The amount of data you will be searching at any one time will be small.
The only problem might be ordering the results. MySQL's fulltext search can get slow when you start ordering things by (eg) date, as that requires that you search the entire table, rather than just the first nn results it finds. That was ultimately the reason I moved to Sphinx.
Sphinx is also awesome, so don't be afraid to try it, but it sounds like the additional functionality may not be required in your case.