Simple PHP search page with three fields querying a MySQL database with millions of rows. The data consists of profiles, mostly from the USA.
Fulltext indexing increases speed by a factor of 9.6 when the search term is rare (e.g. "Canada").
Fulltext indexing reduces speed by a factor of 3.6 when the search term is common (e.g. "United States").
Is there some way to enjoy the benefits of both techniques? Maybe search only a small portion of the table, and construct a "MATCH AGAINST" query if the term is rare (found in less than half of the records) and a standard "LIKE" query if the term is common? That sounds like a convoluted solution; I want a better way.
Here are my SQL queries and their search times:
SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%canada%'
1.14 seconds (slow)
SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +canada' IN BOOLEAN MODE) AND country LIKE '%canada%'
0.12 seconds (fast)
SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%united states%'
1.09 seconds (comparatively fast)
SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +united +states' IN BOOLEAN MODE) AND country LIKE '%united states%'
3.9 seconds (slow)
Comments (2)
If you could know that the search is on a country name and that you need to look only in the country column, then a query restricted to that column would be much faster than any of the above (a possible version is sketched after this comment).
What were the COUNTs from each query? That would help explain the timings. How many rows are in the table?
Meanwhile, your FT queries are not the same as your LIKE queries. So, I don't want to discuss why the speeds are different.
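A minimal sketch of what such a single-column query might look like, assuming an ordinary secondary index is added to the country column (the index name idx_country and the exact-match form are assumptions, not part of the original comment):

-- Assumed: an ordinary secondary index on the country column
ALTER TABLE `mytable` ADD INDEX idx_country (`country`);

-- An exact match on the indexed column can use that index,
-- unlike LIKE '%canada%', whose leading wildcard forces a full scan
SELECT COUNT(1) FROM `mytable` WHERE `country` = 'Canada';

The trade-off is that the equality comparison only matches exact values, so it only helps if the country column stores clean, consistent country names.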
My solution is to generate my own list of stopwords. I downloaded a sample of the data and found the most common terms using Python. Common terms are then excluded from the MATCH clause by the PHP code that builds the query. In my data, "United" and "States" are common terms, so they are stripped from the AGAINST() term list while the LIKE condition is kept. When all of the search terms are "common", the MATCH clause is removed entirely and only the LIKE query runs (see the sketch below). The best of both worlds.
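The answer's original example queries were not preserved here, so the following is only a rough sketch of the two outcomes, based on the query forms shown in the question:

-- Rare term: "canada" is not in the custom stopword list,
-- so the MATCH clause is kept
SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+canada' IN BOOLEAN MODE) AND country LIKE '%canada%'

-- Common terms: "united" and "states" are in the custom stopword list,
-- so the MATCH clause is dropped and only the LIKE condition remains
SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%united states%'

In PHP this amounts to checking each search term against the stopword list before building the AGAINST() string, and skipping the MATCH clause altogether when no terms remain.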