Wildcard search on a column in a large table (>10,000,000 rows) in MySQL
Which techniques would you use to implement a search over the contents of a column in a very big MySQL table? Say, for instance, that you have 10,000,000 emails stored in a database table and would like to implement a subject search that lets the user search for one or more words present in the email subject. If the user searched for "christmas santa", you should find emails with subjects like "Santa visits us this christmas" and "christmas, will santa ever show".
My idea is to process all the words in the subjects (stripping all numbers, special signs, commas, etc.) and save each word in an index table, where I have a unique index on the word column. Then I would link that to the email table through a many-to-many relationship table.
Is there a better way to perform wildcard searches on very big tables?
Are there databases that natively support this kind of search?
Comments (6)
You could use FULLTEXT indexes if you are using MyISAM as the storage engine (InnoDB also supports them from MySQL 5.6 onward). However, MySQL in general is not very good at text search.
A much better option would be to go with a dedicated text indexing solution such as Lucene or Sphinx. Personally I'd recommend Sphinx - it has great integration with PHP and MySQL and is very, very fast (can be used to speed up even ordinary queries - performs very fast grouping and ordering).
Wikipedia also has a nice list of different indexing engines.
MySQL's MyISAM tables support a FULLTEXT index, which helps in this kind of search. But it's not the speediest technology available for this kind of search. And before MySQL 5.6 you can't use it on data stored in InnoDB tables.
I've heard some good things about Sphinx Search, but I haven't used it yet.
Here's another blog about Sphinx: http://capttofu.livejournal.com/13037.html
While a MySQL fulltext index is possible, I suspect I would look at using something designed to be a search engine, like Lucene.
This sounds like a full-text search, which SQL Server supports.
But your idea is generally sound. You're effectively computing an "index" on your table in advance to speed up searches.
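The word-index idea from the question can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (emails, words, email_words) and helper functions (tokenize, build_index, search) are made up for the example.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and return its distinct alphabetic words.

    Digits, punctuation and other special signs are dropped, as the
    question proposes.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(conn, subjects):
    """Create the (hypothetical) schema and index every subject line."""
    conn.executescript("""
        CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
        CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
        CREATE TABLE email_words (
            email_id INTEGER NOT NULL REFERENCES emails(id),
            word_id  INTEGER NOT NULL REFERENCES words(id),
            PRIMARY KEY (word_id, email_id)
        );
    """)
    for subject in subjects:
        email_id = conn.execute(
            "INSERT INTO emails (subject) VALUES (?)", (subject,)
        ).lastrowid
        for word in tokenize(subject):
            # The unique index on words.word keeps each word stored once.
            conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
            word_id = conn.execute(
                "SELECT id FROM words WHERE word = ?", (word,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO email_words (email_id, word_id) VALUES (?, ?)",
                (email_id, word_id),
            )


def search(conn, query):
    """Return subjects that contain every word of the query."""
    terms = sorted(tokenize(query))
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT e.subject
        FROM emails e
        JOIN email_words ew ON ew.email_id = e.id
        JOIN words w ON w.id = ew.word_id
        WHERE w.word IN ({placeholders})
        GROUP BY e.id
        HAVING COUNT(DISTINCT w.id) = ?
        ORDER BY e.id
    """
    return [row[0] for row in conn.execute(sql, (*terms, len(terms)))]
```

A lookup then only touches the index tables for the searched words instead of scanning every subject. Note that this matches whole words only; prefix or substring wildcards would need extra work (for example a LIKE 'christ%' condition on the words table).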
You want to look at the MATCH...AGAINST function. See, for example: Using MySQL Full-text Searching
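As a rough sketch of how MATCH...AGAINST could be used for the "christmas santa" case: the snippet below only builds the SQL text and its parameter, so it runs without a database. The table name emails, the column subject, the index name ft_subject and the helper functions are assumptions for illustration; the %s placeholder is the parameter style used by common MySQL drivers for Python such as PyMySQL and mysql-connector-python.

```python
import re

# Hypothetical schema: a table `emails` with a text column `subject`.
# A FULLTEXT index has to exist on the column before MATCH can use it.
CREATE_INDEX_SQL = "ALTER TABLE emails ADD FULLTEXT INDEX ft_subject (subject)"

# Boolean mode lets each search word be marked as required with a leading '+'.
SEARCH_SQL = (
    "SELECT id, subject FROM emails "
    "WHERE MATCH(subject) AGAINST (%s IN BOOLEAN MODE)"
)


def to_boolean_query(user_input):
    """Turn 'christmas santa' into '+christmas +santa' (all words required)."""
    words = re.findall(r"\w+", user_input.lower())
    return " ".join("+" + word for word in words)


def search_params(user_input):
    """Return the SQL string and parameter tuple to hand to a MySQL driver."""
    return SEARCH_SQL, (to_boolean_query(user_input),)


if __name__ == "__main__":
    sql, params = search_params("christmas santa")
    print(sql)
    print(params)
```

With a real connection you would pass the two values to cursor.execute(sql, params). Keep in mind that MySQL ignores stopwords and words shorter than the configured minimum length (4 characters by default for MyISAM), so very short search terms may match nothing.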
Check "full text search" in the MySQL docs (AFAIK, all current DBMSs support this).