Improving full-text search performance

Posted 2024-12-23 10:20:40

I have a MySQL database with a table containing 20 million rows. I'd like to be able to do a free-text search on one of the columns, a varchar(255). The sum of the lengths of all these values is 60 million characters. Currently a query such as:

select value from table1 where match( value ) against( 'history' ) ;

takes twenty to thirty seconds. What would it take to get this type of query to complete in one second or less?

Currently this is running on a VPS. What hardware/software should I consider using to bring this search down to 1 second or less?
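
For reference, MATCH ... AGAINST only runs against a FULLTEXT index, so one presumably already exists on the column. A minimal sketch of declaring and sanity-checking it, using the table and column names from the query above (the index name ft_value is arbitrary, and MyISAM is assumed, since InnoDB only gained FULLTEXT support in MySQL 5.6):

-- add (or rebuild) the full-text index on the searched column
ALTER TABLE table1 ADD FULLTEXT INDEX ft_value (value);

-- confirm the optimizer actually uses it for the query above
EXPLAIN SELECT value FROM table1 WHERE MATCH(value) AGAINST('history');

-- MyISAM full-text matching is tuned through server variables such as this one;
-- changing it requires rebuilding the index
SHOW VARIABLES LIKE 'ft_min_word_len';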

Comments (2)

远山浅 2024-12-30 10:20:40

I completely agree with Stanislav here. Any external search engine such as http://lucene.apache.org/ or http://sphinxsearch.com/ will be faster at the collection size you've mentioned.

For a Sphinx crash course, I would recommend starting with the simple setup described at http://astellar.com/2011/12/replacing-mysql-full-text-search-with-sphinx/

In your case I would add a few things to that basic setup.

Use a ranged query in the source config to lower the pressure on MySQL while indexing, and extend sql_query with the $start/$end template:

source my_source
{
  ...
  # fetch the full id range once, then index in windows of sql_range_step rows
  sql_query_range = SELECT MIN(id), MAX(id) FROM table
  sql_range_step  = 1000
  ...
  # $start and $end are filled in by the indexer for each window
  sql_query  = SELECT id, ... FROM table WHERE id >= $start AND id <= $end
  ...
}

This tells Sphinx to fetch at most 1000 documents per MySQL query instead of pulling every record in the table at once. If you have more than 1M records, this option is a must-have.
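
To make the effect concrete: with the settings above the indexer runs the range query once and then walks the table in windows, issuing roughly the following (the id values are only illustrative):

SELECT MIN(id), MAX(id) FROM table;   -- e.g. returns 1 and 20000000
SELECT id, ... FROM table WHERE id >= 1 AND id <= 1000;
SELECT id, ... FROM table WHERE id >= 1001 AND id <= 2000;
-- ... and so on, one window at a time, until MAX(id) is reached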

In your case, depending on the amount of memory on the box, I would also increase the indexer's mem_limit to 512M..1024M so indexing runs faster.
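
A minimal sketch of where that setting lives in sphinx.conf (1024M is only an example value; keep it well below the RAM actually free on the VPS after MySQL and searchd take their share):

indexer
{
  # memory the indexer may use while building the index;
  # raise it from the fairly small default if the box has spare RAM
  mem_limit = 1024M
}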

As you get comfortable with Sphinx, you may want to move some queries from MySQL to the Sphinx side, and also add non-full-text fields to the Sphinx index as attributes to perform geodistance-based or faceted searches, as described in http://sphinxsearch.com/docs/current.html#attributes
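
A sketch of what that can look like, assuming a hypothetical integer column category_id stored as an attribute and an index named my_index queried through searchd's SphinxQL (MySQL-protocol) interface:

source my_source
{
  ...
  # stored as an attribute: filterable and sortable, but not full-text indexed
  sql_attr_uint = category_id
  ...
}

The corresponding faceted-style query then combines a full-text match with an attribute filter:

SELECT id FROM my_index WHERE MATCH('history') AND category_id = 3;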

无声情话 2024-12-30 10:20:40

MySQL has built-in full-text search, but it is quite limited in functionality.

I would recommend using a specialized full-text search engine; the simplest and most MySQL-friendly one is Sphinx - http://sphinxsearch.com/. It has libraries for many platforms/languages to create and manage indexes.

A separate search engine will also let you avoid the slow inserts you get on such a big collection when the full-text index has to be updated on every insert.
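
Part of what makes Sphinx friendly to a MySQL shop is that searchd can expose a SphinxQL listener speaking the MySQL wire protocol, so the stock mysql client and most MySQL client libraries can query it directly. A minimal sketch, assuming the conventional SphinxQL port 9306 and a hypothetical index named my_index (connect with mysql -h 127.0.0.1 -P 9306):

-- served entirely from the Sphinx index; MySQL and its insert path are not touched
SELECT id FROM my_index WHERE MATCH('history') LIMIT 20;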
