Improving full-text search performance

Posted 2024-12-23 10:20:40

I have a MySQL database with a table containing 20 million rows. I'd like to be able to do a free-text search on one of the columns, a varchar(255). The sum of the lengths of all these values is 60 million characters. Currently a query such as:

select value from table1 where match( value ) against( 'history' ) ;

takes twenty to thirty seconds. What would it take to get this type of query to complete in one second or less?

Currently this is running on a VPS. What hardware/software should I consider using to bring this search down to 1 second or less?
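
For reference, MATCH ... AGAINST only runs against a FULLTEXT index, so one presumably already exists on the column. A minimal sketch of declaring and sanity-checking it, using the table and column names from the query above (the index name ft_value is arbitrary, and MyISAM is assumed, since InnoDB only gained FULLTEXT support in MySQL 5.6):

-- add (or rebuild) the full-text index on the searched column
ALTER TABLE table1 ADD FULLTEXT INDEX ft_value (value);

-- confirm the optimizer actually uses it for the query above
EXPLAIN SELECT value FROM table1 WHERE MATCH(value) AGAINST('history');

-- MyISAM full-text matching is tuned through server variables such as this one;
-- changing it requires rebuilding the index
SHOW VARIABLES LIKE 'ft_min_word_len';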

Comments (2)

远山浅 2024-12-30 10:20:40

I completely agree with Stanislav here. Any external search engine such as http://lucene.apache.org/ or http://sphinxsearch.com/ will be faster at the collection size you've mentioned.

For a Sphinx crash course, I would recommend starting with the simple setup described at http://astellar.com/2011/12/replacing-mysql-full-text-search-with-sphinx/

In your case I would add a few things to that basic setup.

Use a ranged query in the source config to lower the pressure on MySQL while indexing, and extend sql_query with the $start/$end template:

source my_source
{
  ...
  # fetch the full id range once, then index in windows of sql_range_step rows
  sql_query_range = SELECT MIN(id), MAX(id) FROM table
  sql_range_step  = 1000
  ...
  # $start and $end are filled in by the indexer for each window
  sql_query  = SELECT id, ... FROM table WHERE id >= $start AND id <= $end
  ...
}

This tells Sphinx to fetch at most 1000 documents per MySQL query instead of pulling every record in the table at once. If you have more than 1M records, this option is a must-have.
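
To make the effect concrete: with the settings above the indexer runs the range query once and then walks the table in windows, issuing roughly the following (the id values are only illustrative):

SELECT MIN(id), MAX(id) FROM table;   -- e.g. returns 1 and 20000000
SELECT id, ... FROM table WHERE id >= 1 AND id <= 1000;
SELECT id, ... FROM table WHERE id >= 1001 AND id <= 2000;
-- ... and so on, one window at a time, until MAX(id) is reached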

In your case, depending on the amount of memory on the box, I would also increase the indexer's mem_limit to 512M..1024M so indexing runs faster.
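
A minimal sketch of where that setting lives in sphinx.conf (1024M is only an example value; keep it well below the RAM actually free on the VPS after MySQL and searchd take their share):

indexer
{
  # memory the indexer may use while building the index;
  # raise it from the fairly small default if the box has spare RAM
  mem_limit = 1024M
}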

As you get comfortable with Sphinx, you may want to move some queries from MySQL to the Sphinx side, and also add non-full-text fields to the Sphinx index as attributes to perform geodistance-based or faceted searches, as described in http://sphinxsearch.com/docs/current.html#attributes
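
A sketch of what that can look like, assuming a hypothetical integer column category_id stored as an attribute and an index named my_index queried through searchd's SphinxQL (MySQL-protocol) interface:

source my_source
{
  ...
  # stored as an attribute: filterable and sortable, but not full-text indexed
  sql_attr_uint = category_id
  ...
}

The corresponding faceted-style query then combines a full-text match with an attribute filter:

SELECT id FROM my_index WHERE MATCH('history') AND category_id = 3;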

无声情话 2024-12-30 10:20:40

MySQL has built-in full-text search, but it is quite limited in functionality.

I would recommend using a specialized full-text search engine; the simplest and most MySQL-friendly one is Sphinx - http://sphinxsearch.com/. It has libraries for many platforms/languages to create and manage indexes.

A separate search engine will also let you avoid the slow inserts you get on such a big collection when the full-text index has to be updated on every insert.
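
Part of what makes Sphinx friendly to a MySQL shop is that searchd can expose a SphinxQL listener speaking the MySQL wire protocol, so the stock mysql client and most MySQL client libraries can query it directly. A minimal sketch, assuming the conventional SphinxQL port 9306 and a hypothetical index named my_index (connect with mysql -h 127.0.0.1 -P 9306):

-- served entirely from the Sphinx index; MySQL and its insert path are not touched
SELECT id FROM my_index WHERE MATCH('history') LIMIT 20;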
