MySQL查询优化——distinct、order by和limit

发布于 2024-09-03 07:14:08 字数 5577 浏览 1 评论 0原文

我正在尝试优化以下查询:

select distinct this_.id as y0_
from Rental this_
    left outer join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    left outer join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
order by rentalsegm2_.id asc
limit 0, 100;

该查询连续多次执行,用于记录的分页处理(每次都有不同的限制)。它返回我在处理中需要的 id。我的问题是这个查询需要超过 3 秒。这三个表中的每一个都有大约 200 万行。

解释给出:

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                                        |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 449904 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                                  | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                                  | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+

我尝试删除不同的值,查询运行速度快了三倍。不带查询的解释给出:

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                       |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 451972 | Using where; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                 | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                 | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+

如您所见,在使用不同时添加 Usingtemporary

我已经有一个关于 where 子句中使用的所有字段的索引。 我可以做些什么来优化这个查询吗?

非常感谢!

编辑:我尝试按照建议在 this_.id 上排序,但查询速度慢了 5 倍。以下是解释计划:

+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type | possible_keys                                       | key                                   | key_len | ref                          | rows   | Extra                                        |
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | this_        | ref  | PRIMARY,index_billingStatus                         | index_billingStatus                   | 5       | const                        | 782348 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | ref  | PRIMARY,fk_rental_id_BikeRentalRequest              | fk_rental_id_BikeRentalRequest        | 9       | solscsm_main.this_.id        |      1 | Using where; Using index; Distinct           | 
|  1 | SIMPLE      | rentalsegm2_ | ref  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | fk_rentalRequest_id_BikeRentalSegment | 8       | solscsm_main.rentalrequ1_.id |      1 | Using where; Distinct                        | 
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+

I am trying to optimize the following query:

select distinct this_.id as y0_
from Rental this_
    left outer join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    left outer join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
order by rentalsegm2_.id asc
limit 0, 100;

This query is done multiple time in a row for paginated processing of records (with a different limit each time). It returns the ids I need in the processing. My problem is that this query take more than 3 seconds. I have about 2 million rows in each of the three tables.

Explain gives:

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                                        |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 449904 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                                  | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                                  | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+

I tried to remove the distinct and the query ran three times faster. explain without the query gives:

+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
| id | select_type | table        | type   | possible_keys                                       | key           | key_len | ref                                        | rows   | Extra                       |
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+
|  1 | SIMPLE      | rentalsegm2_ | range  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9       | NULL                                       | 451972 | Using where; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest              | PRIMARY       | 8       | solscsm_main.rentalsegm2_.rentalRequest_id |      1 | Using where                 | 
|  1 | SIMPLE      | this_        | eq_ref | PRIMARY,index_billingStatus                         | PRIMARY       | 8       | solscsm_main.rentalrequ1_.rental_id        |      1 | Using where                 | 
+----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+

As you can see, the Using temporary is added when using distinct.

I already have an index on all fields used in the where clause.
Is there anything I can do to optimize this query?

Thank you very much!

Edit: I tried to order by on this_.id as suggested and the query was 5x slower. Here is the explain plan:

+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table        | type | possible_keys                                       | key                                   | key_len | ref                          | rows   | Extra                                        |
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | this_        | ref  | PRIMARY,index_billingStatus                         | index_billingStatus                   | 5       | const                        | 782348 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | rentalrequ1_ | ref  | PRIMARY,fk_rental_id_BikeRentalRequest              | fk_rental_id_BikeRentalRequest        | 9       | solscsm_main.this_.id        |      1 | Using where; Using index; Distinct           | 
|  1 | SIMPLE      | rentalsegm2_ | ref  | index_endDate,fk_rentalRequest_id_BikeRentalSegment | fk_rentalRequest_id_BikeRentalSegment | 8       | solscsm_main.rentalrequ1_.id |      1 | Using where; Distinct                        | 
+----+-------------+--------------+------+-----------------------------------------------------+---------------------------------------+---------+------------------------------+--------+----------------------------------------------+

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

扶醉桌前 2024-09-10 07:14:08
  1. 从执行计划中我们看到优化器足够聪明,可以理解这里不需要 OUTER JOIN。无论如何,您最好明确指定这一点。
  2. DISTINCT 修饰符意味着您想要对 SELECT 部分中的所有字段进行 GROUP BY,即对所有指定字段进行 ORDER BY,然后丢弃重复项。换句话说,order byrentsegm2_.id asc子句在这里没有任何意义。

下面的查询应返回等效结果:

select distinct this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
limit 0, 100;

UPD

如果您希望执行计划以 RentalSegment 开头,则需要将以下索引添加到数据库中:

  1. RentalSegment (endDate )
  2. RentalRequest (id, Rental_id)
  3. Rental (id, DTYPE, billingStatus) 或 (id, billingStatus, DTYPE)

然后可以将查询重写如下:

SELECT this_.id as y0_
FROM RentalSegment rs
    JOIN RentalRequest rr
    JOIN Rental this_
WHERE rs.endDate between 1273631699529 and 1274927699529
    AND rs.rentalRequest_id = rr.id
    AND rr.rental_id <= 1848978
    AND rr.rental_id = this_.id
    AND this_.DTYPE='D'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;

如果执行计划不会从 RentalSegment 开始,您可以使用 强制执行STRAIGHT_JOIN

  1. From the execution plan we see that the optimizer is smart enough to understand that you do not require OUTER JOINs here. Anyway, you should better specify that explicitly.
  2. The DISTINCT modifier means that you want to GROUP BY all fields in SELECT part, that is ORDER BY all of the specified fields and then discard duplicates. In other words, order by rentalsegm2_.id asc clause does not make any sence here.

The query below should return the equivalent result:

select distinct this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
limit 0, 100;

UPD

If you want the execution plan to start with RentalSegment, you will need to add the following indices to the database:

  1. RentalSegment (endDate)
  2. RentalRequest (id, rental_id)
  3. Rental (id, DTYPE, billingStatus) or (id, billingStatus, DTYPE)

The query then could be rewritten as the following:

SELECT this_.id as y0_
FROM RentalSegment rs
    JOIN RentalRequest rr
    JOIN Rental this_
WHERE rs.endDate between 1273631699529 and 1274927699529
    AND rs.rentalRequest_id = rr.id
    AND rr.rental_id <= 1848978
    AND rr.rental_id = this_.id
    AND this_.DTYPE='D'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;

If the execution plan will not start from RentalSegment you can force in with STRAIGHT_JOIN.

我爱人 2024-09-10 07:14:08

没有不同的查询运行得更快的原因是因为您有一个限制子句。如果没有 unique,服务器只需要查看前一百个匹配项。然而,其中一些行可能具有重复的字段,因此如果引入 unique 子句,服务器必须查看更多行才能找到没有重复值的行。

顺便说一句,你为什么使用 OUTER JOIN?

The reason that the query without the distinct runs faster is because you have a limit clause. Without the distinct, the server only needs to look at the first hundred matches. However however some of those rows may have duplicate fields, so if you introduce the distinct clause, the server has to look at many more rows in order to find ones that do not have duplicate values.

BTW, why are you using OUTER JOIN?

北恋 2024-09-10 07:14:08

这里,对于“rentalsegm2_”表,优化器选择了“index_endDate”索引,并且该表的预期行数约为 45 万行。由于还存在其他where条件,您可以检查“this_”表索引。我的意思是您可以在“this_ table”中检查每个 where 条件影响的记录数量。

总之,您可以通过更改优化器使用的索引来尝试替代解决方案。
这可以通过“USE INDEX”、“FORCE INDEX”命令获得。

谢谢林森克


数据库管理员
www.qburst.com

Here for "rentalsegm2_" table, optimizer has chosen "index_endDate" index and its no of rows expected from this table is about 4.5 lakhs. Since there are other where conditions exist, you can check for "this_" table indexes . I mean you can check in "this_ table" for how much records affected for each where conditions.

In summary, you can try for alternate solutions by changing indices used by optimizer.
This can be obtained by "USE INDEX", "FORCE INDEX" commands.

Thanks

Rinson KE
DBA
www.qburst.com

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文