当前位置：文江博客话题详情

performance MySQL full-text-search

一个'WHERE列LIKE"%Expression%"怎么能？ '比 MySQL 中的 MATCH(column) AGAINST(“表达式”) 执行得更好？

发布于 2024-11-18 15:20:47 字数 6988 浏览 1 评论 0 原文

我遇到了严重的 MySQL 性能瓶颈，我无法理解和解决它。以下是表结构、索引和记录计数（请耐心等待，这只是两个表）：

mysql> desc elggobjects_entity;
+-------------+---------------------+------+-----+---------+-------+
| Field       | Type                | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| guid        | bigint(20) unsigned | NO   | PRI | NULL    |       |
| title       | text                | NO   | MUL | NULL    |       |
| description | text                | NO   |     | NULL    |       |
+-------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> show index from elggobjects_entity;
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table              | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| elggobjects_entity |          0 | PRIMARY  |            1 | guid        | A         |      613637 |     NULL | NULL   |      | BTREE      |         |
| elggobjects_entity |          1 | title    |            1 | title       | NULL      |         131 |     NULL | NULL   |      | FULLTEXT   |         |
| elggobjects_entity |          1 | title    |            2 | description | NULL      |         131 |     NULL | NULL   |      | FULLTEXT   |         |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)

mysql> select count(*) from elggobjects_entity;
+----------+
| count(*) |
+----------+
|   613637 |
+----------+
1 row in set (0.00 sec)

mysql> desc elggentity_relationships;
+--------------+---------------------+------+-----+---------+----------------+
| Field        | Type                | Null | Key | Default | Extra          |
+--------------+---------------------+------+-----+---------+----------------+
| id           | int(11)             | NO   | PRI | NULL    | auto_increment |
| guid_one     | bigint(20) unsigned | NO   | MUL | NULL    |                |
| relationship | varchar(50)         | NO   | MUL | NULL    |                |
| guid_two     | bigint(20) unsigned | NO   | MUL | NULL    |                |
| time_created | int(11)             | NO   |     | NULL    |                |
+--------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> show index from elggentity_relationships;
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                    | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| elggentity_relationships |          0 | PRIMARY      |            1 | id           | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            1 | guid_one     | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            2 | relationship | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            3 | guid_two     | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          1 | relationship |            1 | relationship | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          1 | guid_two     |            1 | guid_two     | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)

mysql> select count(*) from elggentity_relationships;
+----------+
| count(*) |
+----------+
| 11408236 |
+----------+
1 row in set (0.00 sec)

现在我想对这两个表使用 INNER JOIN 并执行全文搜索。

查询：

SELECT
        count(DISTINCT o.guid) as total
FROM
        elggobjects_entity o
INNER JOIN
        elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
        ((MATCH (o.title, o.description) AGAINST ('scelerisque' )))

这给了我 6 分钟（！）的响应时间。

另一方面，这个

SELECT
        count(DISTINCT o.guid) as total
FROM
        elggobjects_entity o
INNER JOIN
        elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
        ((o.title like "%scelerisque%") OR (o.description like "%scelerisque%"))

在 0.02 秒内返回了相同的计数值。

这怎么可能？我在这里缺少什么？（MySQL 信息：mysql Ver 14.14 Distrib 5.1.49，使用 readline 6.1 的 debian-linux-gnu (x86_64)）

编辑

解释第一个查询（使用 match .. against）给出：

+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type     | possible_keys         | key          | key_len | ref   | rows | Extra       |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
|  1 | SIMPLE      | r     | ref      | guid_one,relationship | relationship | 152     | const | 6145 | Using where |
|  1 | SIMPLE      | o     | fulltext | PRIMARY,title         | title        | 0       |       |    1 | Using where |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+

而第二个查询（使用 LIKE“%..%”）：

+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| id | select_type | table | type   | possible_keys         | key          | key_len | ref                 | rows | Extra       |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | r     | ref    | guid_one,relationship | relationship | 152     | const               | 6145 | Using where |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY               | PRIMARY      | 8       | elgg1710.r.guid_one |    1 | Using where |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+

原文

I've run into a serious MySQL performance bottleneck which I'm unable to understand and resolve. Here are the table structures, indexes and record counts (bear with me, it's only two tables):

mysql> desc elggobjects_entity;
+-------------+---------------------+------+-----+---------+-------+
| Field       | Type                | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| guid        | bigint(20) unsigned | NO   | PRI | NULL    |       |
| title       | text                | NO   | MUL | NULL    |       |
| description | text                | NO   |     | NULL    |       |
+-------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> show index from elggobjects_entity;
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table              | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| elggobjects_entity |          0 | PRIMARY  |            1 | guid        | A         |      613637 |     NULL | NULL   |      | BTREE      |         |
| elggobjects_entity |          1 | title    |            1 | title       | NULL      |         131 |     NULL | NULL   |      | FULLTEXT   |         |
| elggobjects_entity |          1 | title    |            2 | description | NULL      |         131 |     NULL | NULL   |      | FULLTEXT   |         |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)

mysql> select count(*) from elggobjects_entity;
+----------+
| count(*) |
+----------+
|   613637 |
+----------+
1 row in set (0.00 sec)

mysql> desc elggentity_relationships;
+--------------+---------------------+------+-----+---------+----------------+
| Field        | Type                | Null | Key | Default | Extra          |
+--------------+---------------------+------+-----+---------+----------------+
| id           | int(11)             | NO   | PRI | NULL    | auto_increment |
| guid_one     | bigint(20) unsigned | NO   | MUL | NULL    |                |
| relationship | varchar(50)         | NO   | MUL | NULL    |                |
| guid_two     | bigint(20) unsigned | NO   | MUL | NULL    |                |
| time_created | int(11)             | NO   |     | NULL    |                |
+--------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> show index from elggentity_relationships;
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                    | Non_unique | Key_name     | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| elggentity_relationships |          0 | PRIMARY      |            1 | id           | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            1 | guid_one     | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            2 | relationship | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          0 | guid_one     |            3 | guid_two     | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          1 | relationship |            1 | relationship | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
| elggentity_relationships |          1 | guid_two     |            1 | guid_two     | A         |    11408236 |     NULL | NULL   |      | BTREE      |         |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)

mysql> select count(*) from elggentity_relationships;
+----------+
| count(*) |
+----------+
| 11408236 |
+----------+
1 row in set (0.00 sec)

Now I'd like to use an INNER JOIN on those two tables and perform a full text search.

Query:

SELECT
        count(DISTINCT o.guid) as total
FROM
        elggobjects_entity o
INNER JOIN
        elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
        ((MATCH (o.title, o.description) AGAINST ('scelerisque' )))

This gave me a 6 minute (!) response time.

On the other hand this one

SELECT
        count(DISTINCT o.guid) as total
FROM
        elggobjects_entity o
INNER JOIN
        elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
        ((o.title like "%scelerisque%") OR (o.description like "%scelerisque%"))

returned the same count value in 0.02 seconds.

How is that possible? What am I missing here?
(MySQL info: mysql Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (x86_64) using readline 6.1)

EDIT

EXPLAINing the first query (using match .. against) gives:

+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type     | possible_keys         | key          | key_len | ref   | rows | Extra       |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
|  1 | SIMPLE      | r     | ref      | guid_one,relationship | relationship | 152     | const | 6145 | Using where |
|  1 | SIMPLE      | o     | fulltext | PRIMARY,title         | title        | 0       |       |    1 | Using where |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+

while the second query (using LIKE "%..%"):

+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| id | select_type | table | type   | possible_keys         | key          | key_len | ref                 | rows | Extra       |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | r     | ref    | guid_one,relationship | relationship | 152     | const               | 6145 | Using where |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY               | PRIMARY      | 8       | elgg1710.r.guid_one |    1 | Using where |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

凉栀 2024-11-25 15:20:47

通过结合您的经验和 EXPLAIN 的结果，在这种特殊情况下，全文索引似乎没有您期望的那么有用。这取决于数据库中的特定数据、数据库结构或/和特定查询。

通常数据库引擎每个表使用不超过一个索引。因此，当表有多个索引时，查询优化器会尝试使用更好的索引。但优化器并不总是足够聪明。

EXPLAIN 的输出显示数据库查询优化器决定对 relationship 和 title 使用索引。关系过滤器将表 elggentity_relationships 减少到 6145 行。标题过滤器将表 elggobjects_entity 减少到 72697 行。然后MySQL需要在不使用任何索引的情况下连接这些表（6145 x 72697 = 446723065个过滤操作），因为索引已经用于过滤。在这种情况下，这可能太多了。 MySQL 甚至可以通过尝试在内存中保留足够的可用空间来决定将中间计算保留在硬盘中。

现在让我们看一下另一个查询。它使用（表 elggobjects_entity 的）relationship 和 PRIMARY KEY 作为索引。关系过滤器将表 elggentity_relationships 减少到 6145 行。通过在 PRIMARY KEY 索引上连接这些表，结果仅获得 3957 行。这对于最后一个过滤器（即 LIKE "%scelerisque%"）来说并不重要，即使索引根本不用于此目的。

正如您所看到的，速度很大程度上取决于为查询选择的索引。因此，在这种特殊情况下，PRIMARY KEY 索引比全文 title 索引有用得多，因为 PRIMARY KEY 对结果缩减的影响比全文索引更大。 标题。

MySQL 并不总是能够巧妙地设置正确的索引。我们可以通过使用诸如 IGNORE INDEX (index_name)、FORCE INDEX (index_name) 等子句来手动执行此操作。

但在您的情况下，问题是如果我们使用 < code>MATCH() AGAINST() 在查询中则需要全文索引，因为如果没有全文索引，MATCH() AGAINST() 根本不起作用。所以这就是MySQL为查询选择错误索引的主要原因。

更新

好的，我做了一些调查。

首先，您可以尝试强制MySQL在表elggentity_relationships上使用guid_one索引而不是relationship：USE INDEX (guid_one)代码>.

但为了获得更好的性能，我认为您可以尝试为两列的组合创建一个索引（guid_one，membership）。当前索引 guid_one 非常相似，但针对的是 3 列，而不是 2 列。在此查询中仅使用了 2 列。在我看来，创建索引后 MySQL 应该自动使用正确的索引。如果没有，强制 MySQL 使用它。

注意：创建索引后，不要忘记从查询中删除旧的 USE INDEX 指令，因为这可能会阻止查询使用新创建的索引。 :)

By combining your experience and EXPLAIN's results, it seems that fulltext index is not as useful as you expect in this particular case. This depends on particular data in your database, on database structure or/and particular query.

Usually database engines use no more than one index per table. So when the table has more than one index, query optimizer tries to use the better one. But optimizer is not always clever enough.

EXPLAIN's output shows that database query optimizer decided to use indexes for relationship and title. The relationship filter reduces table elggentity_relationships to 6145 rows. And the title filter reduces the table elggobjects_entity to 72697 rows. Then MySQL needs to join those tables (6145 x 72697 = 446723065 filtering operations) without using any index because indexes have already been used for filtering. In this case this can be too much. MySQL can even make a decision to keep intermediate calculations in the hard drive by trying to keep enough free space in memory.

Now let's take a look into another query. It uses relationship and PRIMARY KEY (of table elggobjects_entity) as its indexes. The relationship filter reduces table elggentity_relationships to 6145 rows. By joining those tables on PRIMARY KEY index, the result gets only 3957 rows. This is not much for the last filter (i.e. LIKE "%scelerisque%"), even if index is NOT used for this purpose at all.

As you can see the speed much depends on indexes selected for a query. So, in this particular case the PRIMARY KEY index is much more useful than fulltext title index, because PRIMARY KEY has bigger impact for result reduction than title.

MySQL is not always clever to set the right indexes. We can do this manually, by using clauses like IGNORE INDEX (index_name), FORCE INDEX (index_name), etc.

But in your case the problem is that if we use MATCH() AGAINST() in a query then the fulltext index is required, because MATCH() AGAINST() doesn't work without fulltext index at all. So this is the main reason why MySQL has chosen incorrect indexes for the query.

UPDATE

OK, I did some investigation.

Firstly, you may try to force MySQL to use guid_one index instead of relationship on table elggentity_relationships: USE INDEX (guid_one).

But for even better performance I think you can try to create one index for the composition of two columns (guid_one, membership). Current index guid_one is very similar, but for 3 columns, not for 2. In this query there are only 2 columns used. In my opinion after index creation MySQL should automatically use the right index. If not, force MySQL to use it.

Note: After index creation don't forget to remove old USE INDEX instruction from your query, because this may prevent query from using the newly created index. :)

回复收藏 0 原文

~没有更多了~