MySQL not using indexes with JOIN, WHERE and ORDER BY

Posted on 2024-07-30 18:54:12


We have two tables resembling a simple tag-record structure as follows (in reality it's much more complex but this is the essence of the problem):

tag (A.a) | recordId (A.b)
1         | 1
2         | 1
2         | 2
3         | 2
....

and

recordId (B.b) | recordData (B.c)
1              | 123
2              | 666
3              | 1246

The problem is fetching ordered records with a specific tag. The obvious way of doing it is with a simple join and indexes on (PK)(A.a, A.b), (A.b), (PK)(B.b), (B.b,B.c) as such:

select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;

However, this gives the unpleasant result of a filesort:

+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref       | rows | Extra                                        |
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+
|  1 | SIMPLE      | A     | ref  | PRIMARY,b     | PRIMARY | 4       | const     |   94 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | B     | ref  | PRIMARY,b     | b       | 4       | booli.A.b |    1 | Using index                                  | 
+----+-------------+-------+------+---------------+---------+---------+-----------+------+----------------------------------------------+

Using a huge and extremely redundant "materialized view" we can get pretty decent performance, but at the expense of complicating the business logic, something we would like to avoid, especially since the A and B tables are already MVs (and are needed for other queries, and in fact for the same queries using a UNION).

create temporary table C engine=innodb as (select A.a, A.b, B.c from A join B on A.b = B.b);
explain select a, b, c from C where a = 44 order by c;

Further complicating the situation, we have conditions on the B table, such as range filters.

select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 AND B.c > 678 order by c;

But we are confident we can handle this if the filesort problem goes away.

Does anyone know why the simple join in code block 3 above won't use the index for sorting, and whether we can get around the problem without creating a new MV?

Below is the full SQL listing that we are using for testing.

DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
DROP TABLE IF EXISTS C;
CREATE TEMPORARY TABLE A (a INT NOT NULL, b INT NOT NULL, PRIMARY KEY(a, b), INDEX idx_A_b (b)) ENGINE=INNODB;
CREATE TEMPORARY TABLE B (b INT NOT NULL, c INT NOT NULL, d VARCHAR(5000) NOT NULL DEFAULT '', PRIMARY KEY(b), INDEX idx_B_c (c), INDEX idx_B_b (b, c)) ENGINE=INNODB;

DELIMITER $
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT IGNORE INTO A SELECT RAND()*100, RAND()*10000;
                INSERT IGNORE INTO B SELECT RAND()*10000, RAND()*1000, '';
                SET _cnt = _cnt + 1;
        END WHILE;
END
$
DELIMITER ;

START TRANSACTION;
CALL prc_filler(100000);
COMMIT;
DROP PROCEDURE prc_filler;

CREATE TEMPORARY TABLE C ENGINE=INNODB AS (SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b);
ALTER TABLE C ADD (PRIMARY KEY(a, b), INDEX idx_C_a_c (a, c));

EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE 1 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b where A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT a, b, c FROM C WHERE a = 44 ORDER BY c;
-- Added after Quassnoi's comments
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM  B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM A JOIN B ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;
EXPLAIN EXTENDED SELECT A.a, A.b, B.c FROM  B FORCE INDEX (idx_B_c) JOIN A ON A.b = B.b WHERE A.a = 44 ORDER BY B.c LIMIT 10;


对你再特殊 2024-08-06 18:54:12


When I try to reproduce this query using your scripts:

SELECT  A.a, A.b, B.c
FROM    A
JOIN    B
ON      A.b = B.b
WHERE   a = 44
ORDER BY
        c

It completes in 0.0043 seconds (instantly), returns 930 rows, and yields this plan:

1, 'SIMPLE', 'A', 'ref', 'PRIMARY', 'PRIMARY', '4', 'const', 1610, 'Using index; Using temporary; Using filesort'
1, 'SIMPLE', 'B', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.A.b', 1, ''

It's quite efficient for such a query.

For such a query, you cannot use a single index both for filtering and sorting.

See this article in my blog for more detailed explanations:

Choosing index

If you expect your query to return few records, you should use the index on A for filtering and then sort using filesort (like the query above does).
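To judge whether the filesort is actually costly, one option is to compare the session's sort counters before and after running the query. The sketch below assumes the temporary tables A and B from the question's test script exist in the current session. `Sort_rows` and `Sort_merge_passes` are standard MySQL status variables; the exact numbers depend on your data.

```sql
-- Snapshot the session's sort counters before the query.
SHOW SESSION STATUS LIKE 'Sort%';

-- The filter-then-sort query from the question.
SELECT A.a, A.b, B.c
FROM A
JOIN B ON A.b = B.b
WHERE A.a = 44
ORDER BY B.c;

-- Snapshot again and compare with the first result:
--   Sort_rows         grows by the number of rows that were sorted
--   Sort_merge_passes grows only if the sort needed merge passes
SHOW SESSION STATUS LIKE 'Sort%';
```

If `Sort_merge_passes` does not increase, the sort fit in the sort buffer, and "Using filesort" in the plan did not involve merge passes through temporary files.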

If you expect it to return many records (and apply a LIMIT), you need to use an index for sorting and then filter:

CREATE INDEX ix_a_b ON A (b);
CREATE INDEX ix_b_c ON B (c);

SELECT  *
FROM    B FORCE INDEX (ix_b_c)
JOIN    A
ON      A.b = B.b
ORDER BY
        B.c
LIMIT 10;

1, 'SIMPLE', 'B', 'index', '', 'ix_b_c', '4', '', 2, 'Using index'
1, 'SIMPLE', 'A', 'ref', 'ix_a_b', 'ix_a_b', '4', 'test.B.b', 4, 'Using index'
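The query above has no filter on A.a. As a minimal sketch, assuming the tables and index names from the question's test script (`idx_B_c` on B(c), primary key (a, b) on A), the same sort-then-filter idea can be combined with the tag filter and the range condition from the question. `STRAIGHT_JOIN` is used here to pin the join order to B first; whether the optimizer then drops the filesort should be confirmed with EXPLAIN on your own data.

```sql
-- Read B in c order through idx_B_c, then probe A's primary key with (44, B.b).
EXPLAIN
SELECT STRAIGHT_JOIN A.a, A.b, B.c
FROM B FORCE INDEX (idx_B_c)
JOIN A ON A.a = 44 AND A.b = B.b
WHERE B.c > 678
ORDER BY B.c
LIMIT 10;
```

The trade-off is that B is walked in index order until enough matching rows are found, so this approach is likely to pay off only when the tag is common or the LIMIT is small.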


樱娆 2024-08-06 18:54:12

select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;

If you alias the columns, does that help? Example:

 SELECT 
 T1.a AS colA, 
 T2.b AS colB, 
 T2.c AS colC 
 FROM A AS T1 
 JOIN B AS T2 
 ON (T1.b = T2.b) 
 WHERE 
 T1.a = 44 
 ORDER BY colC;

The only changes I made were:

I put the join conditions in parentheses.
The join conditions and WHERE conditions are based on table columns.
The ORDER BY condition is based on the resulting table column.
I aliased the result table columns and the queried tables to (hopefully) make it clearer when I was using one or the other, both to the reader and to the server. Your original query leaves the columns unqualified in two places (where a = 44 and order by c).

I know your real data is more complex, but I assume that you provided a simple version of the query because the problem is at that simple level.
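One way to check whether the aliasing changes anything is to run EXPLAIN on both forms and compare the `key` and `Extra` columns. This is only a sketch and assumes the temporary tables A and B from the question's test script.

```sql
-- Original form, with unqualified columns in WHERE and ORDER BY.
EXPLAIN
SELECT A.a, A.b, B.c
FROM A
JOIN B ON A.b = B.b
WHERE a = 44
ORDER BY c;

-- Aliased form from this answer.
EXPLAIN
SELECT T1.a AS colA, T2.b AS colB, T2.c AS colC
FROM A AS T1
JOIN B AS T2 ON (T1.b = T2.b)
WHERE T1.a = 44
ORDER BY colC;
```

If both plans show the same keys and the same "Using temporary; Using filesort" entry, the aliases did not affect how the query is executed.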
