MySQL 连接优化:改进派生表和 GROUP BY 的连接类型
我正在尝试改进执行以下操作的查询:
对于每项工作,将所有成本相加,将发票金额相加,并计算利润/损失。成本来自几个不同的表,例如采购订单、用户事件(工程师分配的时间/他在现场花费的时间)、使用的库存等。
该查询还需要输出一些其他列,例如工作站点的名称,以便可以对该列进行排序(在所有这些之后附加 ORDER BY)。
SELECT
jobs.job_id,
jobs.start_date,
jobs.end_date,
events.time,
sites.name site,
IFNULL(stock_cost,0) stock_cost,
labour,
materials,
labour+materials+plant+expenses revenue,
(labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) profit,
((labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)))/(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) ratio
FROM
jobs
LEFT JOIN (
SELECT
job_id,
SUM(labour_charge) labour,
SUM(materials_charge) materials,
SUM(plant_hire_charge) plant,
SUM(expenses) expenses
FROM invoices
GROUP BY job_id
ORDER BY NULL
) invoices USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IF(start_onsite && end_onsite,end_onsite-start_onsite,end-start)) time,
SUM(travel+parking+materials) user_expenses
FROM users_events
WHERE type='job'
GROUP BY job_id
ORDER BY NULL
) events USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IFNULL(total,0))*0.01 orders_cost
FROM purchaseorders
GROUP BY job_id
ORDER BY NULL
) purchaseorders USING(job_id)
LEFT JOIN (
SELECT
location job_id,
SUM(amount*cost))*0.01 stock_cost
FROM stock_location
LEFT JOIN stock_items ON stock_items.id=stock_location.stock_id
WHERE location>=3000 AND amount>0 AND cost>0
GROUP BY location
ORDER BY NULL
) stock USING(job_id)
LEFT JOIN contacts_sites sites ON sites.id=jobs.site_id;
我读到了这个: http://dev.mysql.com /doc/refman/5.0/en/group-by-optimization.html 但不知道如何/是否可以应用其中的任何内容。 出于测试目的,我尝试在左侧、右侧和中心字段上添加各种索引,但对 EXPLAIN 输出没有任何改进:
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| 1 | PRIMARY | jobs | ALL | NULL | NULL | NULL | NULL | 7088 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5038 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 6476 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 904 | |
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 531 | |
| 1 | PRIMARY | sites | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.jobs.site_id | 1 | |
| 5 | DERIVED | stock_location | ALL | stock,location,amount,…| NULL | NULL | NULL | 5426 | Using where; Using temporary; |
| 5 | DERIVED | stock_items | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.stock_location.stock_id | 1 | Using where |
| 4 | DERIVED | purchaseorders | ALL | NULL | NULL | NULL | NULL | 1445 | Using temporary; |
| 3 | DERIVED | users_events | ALL | type,type_job | NULL | NULL | NULL | 11295 | Using where; Using temporary; |
| 2 | DERIVED | invoices | ALL | NULL | NULL | NULL | NULL | 5320 | Using temporary; |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
生成的行数为 5 x 10^21(低于我开始优化此查询之前的 3 x 10^42) !)
目前执行需要 7 秒(从 26 秒减少),但我希望能低于 1 秒。
顺便说一句:GROUP BY x ORDER BY NULL 是从子查询中消除不必要的文件排序的好方法! (来自 http://www.mysqlperformanceblog.com/ 2006/09/04/group_concat-useful-group-by-extension/)
I am trying to improve a query which does the following:
For every job, add up all the costs, add up the invoiced amount, and calculate a profit/loss. The costs come from several different tables, e.g. purchaseorders, users_events (engineer allocated time/time he spent on site), stock used etc.
The query also needs to output some other columns like the name of the site for the work, so that that column can be sorted by (an ORDER BY is appended after all of this).
SELECT
jobs.job_id,
jobs.start_date,
jobs.end_date,
events.time,
sites.name site,
IFNULL(stock_cost,0) stock_cost,
labour,
materials,
labour+materials+plant+expenses revenue,
(labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) profit,
((labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)))/(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) ratio
FROM
jobs
LEFT JOIN (
SELECT
job_id,
SUM(labour_charge) labour,
SUM(materials_charge) materials,
SUM(plant_hire_charge) plant,
SUM(expenses) expenses
FROM invoices
GROUP BY job_id
ORDER BY NULL
) invoices USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IF(start_onsite && end_onsite,end_onsite-start_onsite,end-start)) time,
SUM(travel+parking+materials) user_expenses
FROM users_events
WHERE type='job'
GROUP BY job_id
ORDER BY NULL
) events USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IFNULL(total,0))*0.01 orders_cost
FROM purchaseorders
GROUP BY job_id
ORDER BY NULL
) purchaseorders USING(job_id)
LEFT JOIN (
SELECT
location job_id,
SUM(amount*cost))*0.01 stock_cost
FROM stock_location
LEFT JOIN stock_items ON stock_items.id=stock_location.stock_id
WHERE location>=3000 AND amount>0 AND cost>0
GROUP BY location
ORDER BY NULL
) stock USING(job_id)
LEFT JOIN contacts_sites sites ON sites.id=jobs.site_id;
I read this: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html but don't see how/if I can apply anything therein.
For testing purposes, I have tried adding all sorts of indices on fields left, right and centre with no improvement to the EXPLAIN output:
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| 1 | PRIMARY | jobs | ALL | NULL | NULL | NULL | NULL | 7088 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5038 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 6476 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 904 | |
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 531 | |
| 1 | PRIMARY | sites | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.jobs.site_id | 1 | |
| 5 | DERIVED | stock_location | ALL | stock,location,amount,…| NULL | NULL | NULL | 5426 | Using where; Using temporary; |
| 5 | DERIVED | stock_items | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.stock_location.stock_id | 1 | Using where |
| 4 | DERIVED | purchaseorders | ALL | NULL | NULL | NULL | NULL | 1445 | Using temporary; |
| 3 | DERIVED | users_events | ALL | type,type_job | NULL | NULL | NULL | 11295 | Using where; Using temporary; |
| 2 | DERIVED | invoices | ALL | NULL | NULL | NULL | NULL | 5320 | Using temporary; |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
The rows produced is 5 x 10^21 (down from 3 x 10^42 before I started optimising this query!)
It currently takes seven seconds to execute (down from 26) but I would like that to be under one second.
By the way: GROUP BY x ORDER BY NULL is a great way to eliminate unnecessary filesorts from subqueries! (from http://www.mysqlperformanceblog.com/2006/09/04/group_concat-useful-group-by-extension/)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
根据您对我的问题的评论,我会执行以下操作...
在最顶部...
选择 STRAIGHT_JOIN (只需添加“STRAIGH_JOIN”关键字)
然后,对于发票、事件、采购订单的每个子查询,等等,将 ORDER BY 显式更改为 JOB_ID,这样可能有助于针对主 JOBS 表连接进行优化。
最后,确保每个子查询表在 Job_ID 上都有一个索引(Invoices、User_events、PurchaseOrders、Stock_Location)
此外,对于 Stock_Location 表,您可能希望通过在
上创建复合索引来帮助子查询的 WHERE 子句
(job_id, location, amount) 即使您有键加上 3 个 where 条件元素,三个字段的深度也应该足够了。
Based on your comment to my question, I would do the following...
At the very top...
SELECT STRAIGHT_JOIN (just add the "STRAIGH_JOIN" keyword)
Then, for each of your subqueries for invoices, events, p/o's, etc, change the ORDER BY to the JOB_ID explicitly so it might help the optimization against the primary JOBS table join.
Finally, ensure each of your subquery tables HAS an index on the Job_ID (Invoices, User_events, PurchaseOrders, Stock_Location)
Additionally, for the Stock_Location table, you might want to help the WHERE clause for your subquery by having a compound index on
(job_id, location, amount) Three fields deep should be enough even though you have the key plus 3 where condition elements.