MySQL 不使用 JOIN、WHERE 和 ORDER 索引
我们有两个类似于简单标记记录结构的表,如下所示(实际上它要复杂得多,但这就是问题的本质):
tag (A.a) | recordId (A.b)
1 | 1
2 | 1
2 | 2
3 | 2
....
问题
recordId (B.b) | recordData (B.c)
1 | 123
2 | 666
3 | 1246
是获取具有特定标记的有序记录。 最明显的方法是使用简单的连接和 (PK)(Aa, Ab), (Ab), (PK)(Bb), (Bb,Bc) 上的索引,如下所示:
select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;
但是,这会产生令人不快的结果文件排序:
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
| 1 | SIMPLE | A | ref | PRIMARY,b | PRIMARY | 4 | const | 94 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | B | ref | PRIMARY,b | b | 4 | booli.A.b | 1 | Using index |
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
使用巨大且极其冗余的“物化视图”,我们可以获得相当不错的性能,但这是以业务逻辑复杂化为代价的,这是我们希望避免的,特别是因为 A 和 B 表已经是 MV:s (并且是其他查询所需要的,并且实际上使用 UNION 的相同查询)。
create temporary table C engine=innodb as (select A.a, A.b, B.c from A join B on A.b = B.b);
explain select a, b, c from C where a = 44 order by c;
使情况进一步复杂化的是我们在 B 表上有条件,例如范围过滤器。
select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 AND B.c > 678 order by c;
但我们有信心,如果文件排序问题消失,我们可以解决这个问题。
有谁知道为什么上面代码块 3 中的简单连接不会使用索引进行排序,以及我们是否可以通过某种方式解决这个问题而不创建新的 MV?
下面是我们用于测试的完整 SQL 列表。
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
DROP TABLE IF EXISTS C;
CREATE TEMPORARY TABLE A (a INT NOT NULL, b INT NOT NULL, PRIMARY KEY(a, b), INDEX idx_A_b (b)) ENGINE=INNODB;
CREATE TEMPORARY TABLE B (b INT NOT NULL, c INT NOT NULL, d VARCHAR(5000) NOT NULL DEFAULT '', PRIMARY KEY(b), INDEX idx_B_c (c), INDEX idx_B_b (b, c)) ENGINE=INNODB;
DELIMITER $$
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
INSERT IGNORE INTO A SELECT RAND()*100, RAND()*10000;
INSERT IGNORE INTO B SELECT RAND()*10000, RAND()*1000, '';
SET _cnt = _cnt + 1;
END WHILE;
END
$$
DELIMITER ;
START TRANSACTION;
CALL prc_filler(100000);
COMMIT;
DROP PROCEDURE prc_filler;
CREATE TEMPORARY TABLE C ENGINE=INNODB AS (SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b);
ALTER TABLE C ADD (PRIMARY KEY(a, b), INDEX idx_C_a_c (a, c));
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE 1 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b where A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT a, b, c FROM C WHERE a = 44 ORDER BY c;
-- Added after Quassnois comments
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;
We have two tables resembling a simple tag-record structure as follows (in reality it's much more complex but this is the essence of the problem):
tag (A.a) | recordId (A.b)
1 | 1
2 | 1
2 | 2
3 | 2
....
and
recordId (B.b) | recordData (B.c)
1 | 123
2 | 666
3 | 1246
The problem is fetching ordered records with a specific tag. The obvious way of doing it is with a simple join and indexes on (PK)(A.a, A.b), (A.b), (PK)(B.b), (B.b,B.c) as such:
select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;
However, this gives the unpleasant result of a filesort:
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
| 1 | SIMPLE | A | ref | PRIMARY,b | PRIMARY | 4 | const | 94 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | B | ref | PRIMARY,b | b | 4 | booli.A.b | 1 | Using index |
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
Using a huge and extremely redundant "materialized view" we can get pretty decent performance but this at the expense of complicating the business-logic, something we would like to avoid, especially since the A and B tables already are MV:s (and are needed for other queries, and infact the same queries using a UNION).
create temporary table C engine=innodb as (select A.a, A.b, B.c from A join B on A.b = B.b);
explain select a, b, c from C where a = 44 order by c;
Further complicating the situation is the fact that we have conditionals on the B-table, such as range-filters.
select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 AND B.c > 678 order by c;
But we are confident we can handle this if the filesort problem goes away.
Does anyone know why the simple join in codeblock 3 above won't use the index for sorting and if we can get around the problem in some way without creating a new MV?
Below is the full SQL listing that we are using for testing.
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
DROP TABLE IF EXISTS C;
CREATE TEMPORARY TABLE A (a INT NOT NULL, b INT NOT NULL, PRIMARY KEY(a, b), INDEX idx_A_b (b)) ENGINE=INNODB;
CREATE TEMPORARY TABLE B (b INT NOT NULL, c INT NOT NULL, d VARCHAR(5000) NOT NULL DEFAULT '', PRIMARY KEY(b), INDEX idx_B_c (c), INDEX idx_B_b (b, c)) ENGINE=INNODB;
DELIMITER $
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
DECLARE _cnt INT;
SET _cnt = 1;
WHILE _cnt <= cnt DO
INSERT IGNORE INTO A SELECT RAND()*100, RAND()*10000;
INSERT IGNORE INTO B SELECT RAND()*10000, RAND()*1000, '';
SET _cnt = _cnt + 1;
END WHILE;
END
$
DELIMITER ;
START TRANSACTION;
CALL prc_filler(100000);
COMMIT;
DROP PROCEDURE prc_filler;
CREATE TEMPORARY TABLE C ENGINE=INNODB AS (SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b);
ALTER TABLE C ADD (PRIMARY KEY(a, b), INDEX idx_C_a_c (a, c));
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE 1 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b where A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT a, b, c FROM C WHERE a = 44 ORDER BY c;
-- Added after Quassnois comments
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
当我尝试使用您的脚本重现此查询时:
它在 0.0043 秒内完成(立即),返回 930 行并产生此计划:
对于这样的查询来说,它非常有效。
对于这样的查询,您不能使用单个索引同时进行过滤和排序。
请参阅我博客中的这篇文章以获取更详细的说明:
如果您希望查询返回少量记录,则应该使用
A
上的索引进行过滤,然后使用 filesort 进行排序(就像上面的查询一样)。如果你希望它返回很多记录(并且
LIMIT
),你需要使用索引进行排序,然后过滤:When I try to reproduce this query using your scripts:
, it completes in
0.0043 seconds
(instantly), returns930
rows and yields this plan:It's quite efficient for such a query.
For such a query, you cannot use a single index both for filtering and sorting.
See this article in my blog for more detailed explanations:
If you expect your query to return few records, you should use the index on
A
for filtering and then sort using filesort (like the query above does).If you expect it to return many records (and
LIMIT
them), you need to use index for sorting and then filter:select Aa, Ab, Bc from A join B on Ab = Bb where a = 44 order by c;
如果为列添加别名,这有帮助吗? 示例:
我所做的唯一更改是:
我知道您的真实数据更复杂,但我假设您提供了一个简单版本的查询,因为问题就在这个简单的级别。
select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;
If you alias the columns, does that help? Example:
The only changes I made were:
I know your real data is more complex, but I assume that you provided a simple version of the query because the problem is at that simple level.