一个'WHERE列LIKE"%Expression%"怎么能? '比 MySQL 中的 MATCH(column) AGAINST(“表达式”) 执行得更好?
我遇到了严重的 MySQL 性能瓶颈,我无法理解和解决它。以下是表结构、索引和记录计数(请耐心等待,这只是两个表):
mysql> desc elggobjects_entity;
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| guid | bigint(20) unsigned | NO | PRI | NULL | |
| title | text | NO | MUL | NULL | |
| description | text | NO | | NULL | |
+-------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> show index from elggobjects_entity;
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| elggobjects_entity | 0 | PRIMARY | 1 | guid | A | 613637 | NULL | NULL | | BTREE | |
| elggobjects_entity | 1 | title | 1 | title | NULL | 131 | NULL | NULL | | FULLTEXT | |
| elggobjects_entity | 1 | title | 2 | description | NULL | 131 | NULL | NULL | | FULLTEXT | |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> select count(*) from elggobjects_entity;
+----------+
| count(*) |
+----------+
| 613637 |
+----------+
1 row in set (0.00 sec)
mysql> desc elggentity_relationships;
+--------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| guid_one | bigint(20) unsigned | NO | MUL | NULL | |
| relationship | varchar(50) | NO | MUL | NULL | |
| guid_two | bigint(20) unsigned | NO | MUL | NULL | |
| time_created | int(11) | NO | | NULL | |
+--------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> show index from elggentity_relationships;
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| elggentity_relationships | 0 | PRIMARY | 1 | id | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 1 | guid_one | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 2 | relationship | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 3 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | relationship | 1 | relationship | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | guid_two | 1 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)
mysql> select count(*) from elggentity_relationships;
+----------+
| count(*) |
+----------+
| 11408236 |
+----------+
1 row in set (0.00 sec)
现在我想对这两个表使用 INNER JOIN 并执行全文搜索。
查询:
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((MATCH (o.title, o.description) AGAINST ('scelerisque' )))
这给了我 6 分钟(!)的响应时间。
另一方面,这个
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((o.title like "%scelerisque%") OR (o.description like "%scelerisque%"))
在 0.02 秒内返回了相同的计数值。
这怎么可能?我在这里缺少什么? (MySQL 信息:mysql Ver 14.14 Distrib 5.1.49,使用 readline 6.1 的 debian-linux-gnu (x86_64))
编辑
解释第一个查询(使用 match .. against)给出:
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | fulltext | PRIMARY,title | title | 0 | | 1 | Using where |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
而第二个查询(使用 LIKE“%..%”):
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 8 | elgg1710.r.guid_one | 1 | Using where |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
通过结合您的经验和 EXPLAIN 的结果,在这种特殊情况下,全文索引似乎没有您期望的那么有用。这取决于数据库中的特定数据、数据库结构或/和特定查询。
通常数据库引擎每个表使用不超过一个索引。因此,当表有多个索引时,查询优化器会尝试使用更好的索引。但优化器并不总是足够聪明。
EXPLAIN 的输出显示数据库查询优化器决定对
relationship
和title
使用索引。关系过滤器将表elggentity_relationships
减少到 6145 行。标题过滤器将表elggobjects_entity
减少到 72697 行。然后MySQL需要在不使用任何索引的情况下连接这些表(6145 x 72697 = 446723065个过滤操作),因为索引已经用于过滤。在这种情况下,这可能太多了。 MySQL 甚至可以通过尝试在内存中保留足够的可用空间来决定将中间计算保留在硬盘中。现在让我们看一下另一个查询。它使用(表 elggobjects_entity 的)
relationship
和PRIMARY KEY
作为索引。关系过滤器将表elggentity_relationships
减少到 6145 行。通过在PRIMARY KEY
索引上连接这些表,结果仅获得 3957 行。这对于最后一个过滤器(即LIKE "%scelerisque%"
)来说并不重要,即使索引根本不用于此目的。正如您所看到的,速度很大程度上取决于为查询选择的索引。因此,在这种特殊情况下,
PRIMARY KEY
索引比全文title
索引有用得多,因为PRIMARY KEY
对结果缩减的影响比全文索引更大。标题
。MySQL 并不总是能够巧妙地设置正确的索引。我们可以通过使用诸如
IGNORE INDEX (index_name)
、FORCE INDEX (index_name)
等子句来手动执行此操作。但在您的情况下,问题是如果我们使用 < code>MATCH() AGAINST() 在查询中则需要全文索引,因为如果没有全文索引,
MATCH() AGAINST()
根本不起作用。所以这就是MySQL为查询选择错误索引的主要原因。更新
好的,我做了一些调查。
首先,您可以尝试强制MySQL在表
elggentity_relationships
上使用guid_one
索引而不是relationship
:USE INDEX (guid_one)
代码>.但为了获得更好的性能,我认为您可以尝试为两列的组合创建一个索引(
guid_one
,membership
)。当前索引guid_one
非常相似,但针对的是 3 列,而不是 2 列。在此查询中仅使用了 2 列。在我看来,创建索引后 MySQL 应该自动使用正确的索引。如果没有,强制 MySQL 使用它。注意:创建索引后,不要忘记从查询中删除旧的
USE INDEX
指令,因为这可能会阻止查询使用新创建的索引。 :)By combining your experience and EXPLAIN's results, it seems that fulltext index is not as useful as you expect in this particular case. This depends on particular data in your database, on database structure or/and particular query.
Usually database engines use no more than one index per table. So when the table has more than one index, query optimizer tries to use the better one. But optimizer is not always clever enough.
EXPLAIN's output shows that database query optimizer decided to use indexes for
relationship
andtitle
. The relationship filter reduces tableelggentity_relationships
to 6145 rows. And the title filter reduces the tableelggobjects_entity
to 72697 rows. Then MySQL needs to join those tables (6145 x 72697 = 446723065 filtering operations) without using any index because indexes have already been used for filtering. In this case this can be too much. MySQL can even make a decision to keep intermediate calculations in the hard drive by trying to keep enough free space in memory.Now let's take a look into another query. It uses
relationship
andPRIMARY KEY
(of table elggobjects_entity) as its indexes. The relationship filter reduces tableelggentity_relationships
to 6145 rows. By joining those tables onPRIMARY KEY
index, the result gets only 3957 rows. This is not much for the last filter (i.e.LIKE "%scelerisque%"
), even if index is NOT used for this purpose at all.As you can see the speed much depends on indexes selected for a query. So, in this particular case the
PRIMARY KEY
index is much more useful than fulltexttitle
index, becausePRIMARY KEY
has bigger impact for result reduction thantitle
.MySQL is not always clever to set the right indexes. We can do this manually, by using clauses like
IGNORE INDEX (index_name)
,FORCE INDEX (index_name)
, etc.But in your case the problem is that if we use
MATCH() AGAINST()
in a query then the fulltext index is required, becauseMATCH() AGAINST()
doesn't work without fulltext index at all. So this is the main reason why MySQL has chosen incorrect indexes for the query.UPDATE
OK, I did some investigation.
Firstly, you may try to force MySQL to use
guid_one
index instead ofrelationship
on tableelggentity_relationships
:USE INDEX (guid_one)
.But for even better performance I think you can try to create one index for the composition of two columns (
guid_one
,membership
). Current indexguid_one
is very similar, but for 3 columns, not for 2. In this query there are only 2 columns used. In my opinion after index creation MySQL should automatically use the right index. If not, force MySQL to use it.Note: After index creation don't forget to remove old
USE INDEX
instruction from your query, because this may prevent query from using the newly created index. :)