搜索常用术语时,SQL 匹配全文索引速度很慢

发布于 2025-01-19 10:42:46 字数 897 浏览 1 评论 0原文

简单的 PHP 搜索页面,包含三个字段,查询包含数百万行的 MySQL 数据库。数据由个人资料组成,主要是美国。

当搜索词很少见时(例如“加拿大”),全文索引可将速度提高 9.6 倍。

当搜索词很常见(例如“美国”)时,全文索引会使速度降低 3.6 倍。

有什么方法可以享受这两种技术的好处吗?如果该术语很少见(在不到一半的记录中找到),也许只搜索表的一小部分并构建“MATCH AGAINST”查询,如果该术语很常见,则构建标准“LIKE”查询?这听起来像是一个复杂的解决方案。我想要一个更好的方法。

这是我的 SQL 和搜索时间:

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%canada%' 

1.14 秒(慢)

    SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +canada' IN BOOLEAN MODE) AND country LIKE '%canada%

0.12 秒(快)

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%united states%'

1.09 秒(相对快)

 SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +united +states' IN BOOLEAN MODE) AND country LIKE '%united states%

3.9 秒(慢)

Simple PHP search page with three fields querying a MySQL database with millions of rows. The data consists of profiles, mostly USA.

Fulltext indexing increases speed by a factor of 9.6 when the search term is rare (eg "Canada").

Fulltext indexing slows speed by a factor of 3.6 when the search term is common (eg "United States").

Is there some way to enjoy the benefits of both technique? Maybe searching only a small portion of the table and constructing a "MATCH AGAINST" query if the term is rare (found in less than half of the records) and a standard "LIKE" query if the term is common? That sounds like a convoluted solution. I want a better way.

Here is my SQL and search times:

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%canada%' 

1.14 seconds (slow)

    SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +canada' IN BOOLEAN MODE) AND country LIKE '%canada%

0.12 seconds (fast)

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%united states%'

1.09 seconds (comparatively fast)

 SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +united +states' IN BOOLEAN MODE) AND country LIKE '%united states%

3.9 seconds (slow)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

爱给你人给你 2025-01-26 10:42:46

如果您知道搜索是针对国家/地区名称,并且您只需要在 country 列中查找,那么这将比上述任何方法快得多:

... WHERE country = 'canada'

每个查询的 COUNT 是多少?这将有助于解释时间安排。表中有多少行?

同时,您的 FT 查询与您的 LIKE 查询相同。所以,我不想讨论为什么速度不同。

If you could know that the search is on a country name and that you need to look only in the country column, then this would be much faster than any of the above:

... WHERE country = 'canada'

What were the COUNTs from each query? That would help explain the timings. How many rows in the table?

Meanwhile, your FT queries are not the same as your LIKE queries. So, I don't want to discuss why the speeds are different.

戏蝶舞 2025-01-26 10:42:46

我的解决方案是生成我自己的停用词列表。我下载了一个数据样本,并使用 Python 找到了最常见的术语。通过 PHP 进行的 MATCH 查询排除了常用术语。在我的数据中,“United”和“States”是常见术语:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+sales +united +states' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"

成为:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+sales' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"

当所有搜索术语都是“常见”时,MATCH 语句将被完全删除:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+united +states' IN BOOLEAN MODE) AND country LIKE '%united states%"

成为:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE country LIKE '%united states%"

两全其美。

My solution is to generate my own list of stopwords. I downloaded a sample of data and found the most common terms using Python. Common terms are excluded from MATCH queries through PHP. In my data, "United" and "States" are common terms:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+sales +united +states' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"

Becomes:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+sales' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"

When all of the search terms are "common" the MATCH statement is removed entirely:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST('+united +states' IN BOOLEAN MODE) AND country LIKE '%united states%"

Becomes:

$SQL =  "SELECT COUNT(1) FROM `mytable` WHERE country LIKE '%united states%"

The best of both worlds.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文