MySQL查询优化——distinct、order by和limit

发布于 2024-09-03 07:14:08 字数 5577 浏览 6 评论 0原文

我正在尝试优化以下查询：

select distinct this_.id as y0_
from Rental this_
    left outer join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    left outer join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
order by rentalsegm2_.id asc
limit 0, 100;

该查询连续多次执行，用于记录的分页处理（每次都有不同的限制）。它返回我在处理中需要的 id。我的问题是这个查询需要超过 3 秒。这三个表中的每一个都有大约 200 万行。

解释给出：

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                                        |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 449904 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                                  | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                                  | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+

我尝试删除不同的值，查询运行速度快了三倍。不带查询的解释给出：

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                       |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 451972 | Using where; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                 | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                 | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+

如您所见，在使用不同时添加 Usingtemporary 。

我已经有一个关于 where 子句中使用的所有字段的索引。我可以做些什么来优化这个查询吗？

非常感谢！

编辑：我尝试按照建议在 this_.id 上排序，但查询速度慢了 5 倍。以下是解释计划：

+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type | possible_keys                                       | key                                   | key_len | ref                          | rows   | Extra                                        |
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | this_        | ref  | PRIMARY,index_billingStatus                         | index_billingStatus                   | 5       | const                        | 782348 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | ref  | PRIMARY,fk_rental_id_BikeRentalRequest              | fk_rental_id_BikeRentalRequest        | 9       | solscsm_main.this_.id        |      1 | Using where; Using index; Distinct           | 
|  1 | SIMPLE      | rentalsegm2_ | ref  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | fk_rentalRequest_id_BikeRentalSegment | 8       | solscsm_main.rentalrequ1_.id |      1 | Using where; Distinct                        | 
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+

原文

I am trying to optimize the following query:

select distinct this_.id as y0_
from Rental this_
    left outer join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    left outer join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
order by rentalsegm2_.id asc
limit 0, 100;

This query is done multiple time in a row for paginated processing of records (with a different limit each time). It returns the ids I need in the processing. My problem is that this query take more than 3 seconds. I have about 2 million rows in each of the three tables.

Explain gives:

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                                        |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 449904 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                                  | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                                  | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+

I tried to remove the distinct and the query ran three times faster. explain without the query gives:

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                       |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 451972 | Using where; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                 | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                 | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+

As you can see, the Using temporary is added when using distinct.

I already have an index on all fields used in the where clause.
Is there anything I can do to optimize this query?

Thank you very much!

Edit: I tried to order by on this_.id as suggested and the query was 5x slower. Here is the explain plan:

+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type | possible_keys                                       | key                                   | key_len | ref                          | rows   | Extra                                        |
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | this_        | ref  | PRIMARY,index_billingStatus                         | index_billingStatus                   | 5       | const                        | 782348 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | ref  | PRIMARY,fk_rental_id_BikeRentalRequest              | fk_rental_id_BikeRentalRequest        | 9       | solscsm_main.this_.id        |      1 | Using where; Using index; Distinct           | 
|  1 | SIMPLE      | rentalsegm2_ | ref  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | fk_rentalRequest_id_BikeRentalSegment | 8       | solscsm_main.rentalrequ1_.id |      1 | Using where; Distinct                        | 
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

扶醉桌前 2024-09-10 07:14:08

从执行计划中我们看到优化器足够聪明，可以理解这里不需要 OUTER JOIN。无论如何，您最好明确指定这一点。
DISTINCT 修饰符意味着您想要对 SELECT 部分中的所有字段进行 GROUP BY，即对所有指定字段进行 ORDER BY，然后丢弃重复项。换句话说，order byrentsegm2_.id asc子句在这里没有任何意义。

下面的查询应返回等效结果：

select distinct this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
limit 0, 100;

UPD

如果您希望执行计划以 RentalSegment 开头，则需要将以下索引添加到数据库中：

RentalSegment (endDate )
RentalRequest (id, Rental_id)
Rental (id, DTYPE, billingStatus) 或 (id, billingStatus, DTYPE)

然后可以将查询重写如下：

SELECT this_.id as y0_
FROM RentalSegment rs
    JOIN RentalRequest rr
    JOIN Rental this_
WHERE rs.endDate between 1273631699529 and 1274927699529
    AND rs.rentalRequest_id = rr.id
    AND rr.rental_id <= 1848978
    AND rr.rental_id = this_.id
    AND this_.DTYPE='D'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;

如果执行计划不会从 RentalSegment 开始，您可以使用 强制执行STRAIGHT_JOIN。

From the execution plan we see that the optimizer is smart enough to understand that you do not require OUTER JOINs here. Anyway, you should better specify that explicitly.
The DISTINCT modifier means that you want to GROUP BY all fields in SELECT part, that is ORDER BY all of the specified fields and then discard duplicates. In other words, order by rentalsegm2_.id asc clause does not make any sence here.

The query below should return the equivalent result:

select distinct this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
limit 0, 100;

UPD

If you want the execution plan to start with RentalSegment, you will need to add the following indices to the database:

RentalSegment (endDate)
RentalRequest (id, rental_id)
Rental (id, DTYPE, billingStatus) or (id, billingStatus, DTYPE)

The query then could be rewritten as the following:

SELECT this_.id as y0_
FROM RentalSegment rs
    JOIN RentalRequest rr
    JOIN Rental this_
WHERE rs.endDate between 1273631699529 and 1274927699529
    AND rs.rentalRequest_id = rr.id
    AND rr.rental_id <= 1848978
    AND rr.rental_id = this_.id
    AND this_.DTYPE='D'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;

If the execution plan will not start from RentalSegment you can force in with STRAIGHT_JOIN.

回复收藏 0 原文