MySQL 连接优化:改进派生表和 GROUP BY 的连接类型

发布于 2024-11-16 13:34:59 字数 5138 浏览 5 评论 0原文

我正在尝试改进执行以下操作的查询:

对于每项工作,将所有成本相加,将发票金额相加,并计算利润/损失。成本来自几个不同的表,例如采购订单、用户事件(工程师分配的时间/他在现场花费的时间)、使用的库存等。
该查询还需要输出一些其他列,例如工作站点的名称,以便可以对该列进行排序(在所有这些之后附加 ORDER BY)。

SELECT
    jobs.job_id,
    jobs.start_date,
    jobs.end_date,
    events.time,
    sites.name site,
    IFNULL(stock_cost,0) stock_cost,
    labour,
    materials,
    labour+materials+plant+expenses revenue,
    (labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) profit,
    ((labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)))/(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) ratio

FROM
    jobs

    LEFT JOIN (
        SELECT
            job_id,
            SUM(labour_charge) labour,
            SUM(materials_charge) materials,
            SUM(plant_hire_charge) plant,
            SUM(expenses) expenses
        FROM invoices
        GROUP BY job_id
        ORDER BY NULL
    ) invoices USING(job_id)

    LEFT JOIN (
        SELECT
            job_id,
            SUM(IF(start_onsite && end_onsite,end_onsite-start_onsite,end-start)) time,
            SUM(travel+parking+materials) user_expenses
        FROM users_events
        WHERE type='job'
        GROUP BY job_id
        ORDER BY NULL
    ) events USING(job_id)

    LEFT JOIN (
        SELECT
            job_id,
            SUM(IFNULL(total,0))*0.01 orders_cost
        FROM purchaseorders
        GROUP BY job_id
        ORDER BY NULL
    ) purchaseorders USING(job_id)

    LEFT JOIN (
        SELECT
            location job_id,
            SUM(amount*cost))*0.01 stock_cost
        FROM stock_location
        LEFT JOIN stock_items ON stock_items.id=stock_location.stock_id
        WHERE location>=3000 AND amount>0 AND cost>0
        GROUP BY location
        ORDER BY NULL
    ) stock USING(job_id)

    LEFT JOIN contacts_sites sites ON sites.id=jobs.site_id;

我读到了这个: http://dev.mysql.com /doc/refman/5.0/en/group-by-optimization.html 但不知道如何/是否可以应用其中的任何内容。 出于测试目的,我尝试在左侧、右侧和中心字段上添加各种索引,但对 EXPLAIN 输出没有任何改进:

+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| id | select_type | table          | type   | possible_keys          | key     | key_len | ref                                | rows  | Extra                         |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
|  1 | PRIMARY     | jobs           | ALL    | NULL                   | NULL    | NULL    | NULL                               |  7088 |                               |
|  1 | PRIMARY     | <derived2>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |  5038 |                               |
|  1 | PRIMARY     | <derived3>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |  6476 |                               |
|  1 | PRIMARY     | <derived4>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |   904 |                               |
|  1 | PRIMARY     | <derived5>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |   531 |                               |
|  1 | PRIMARY     | sites          | eq_ref | PRIMARY                | PRIMARY | 4       | bestbee_db.jobs.site_id            |     1 |                               |
|  5 | DERIVED     | stock_location | ALL    | stock,location,amount,…| NULL    | NULL    | NULL                               |  5426 | Using where; Using temporary; |
|  5 | DERIVED     | stock_items    | eq_ref | PRIMARY                | PRIMARY | 4       | bestbee_db.stock_location.stock_id |     1 | Using where                   |
|  4 | DERIVED     | purchaseorders | ALL    | NULL                   | NULL    | NULL    | NULL                               |  1445 | Using temporary;              |
|  3 | DERIVED     | users_events   | ALL    | type,type_job          | NULL    | NULL    | NULL                               | 11295 | Using where; Using temporary; |
|  2 | DERIVED     | invoices       | ALL    | NULL                   | NULL    | NULL    | NULL                               |  5320 | Using temporary;              |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+

生成的行数为 5 x 10^21(低于我开始优化此查询之前的 3 x 10^42) !)
目前执行需要 7 秒(从 26 秒减少),但我希望能低于 1 秒。

顺便说一句:GROUP BY x ORDER BY NULL 是从子查询中消除不必要的文件排序的好方法! (来自 http://www.mysqlperformanceblog.com/ 2006/09/04/group_concat-useful-group-by-extension/)

I am trying to improve a query which does the following:

For every job, add up all the costs, add up the invoiced amount, and calculate a profit/loss. The costs come from several different tables, e.g. purchaseorders, users_events (engineer allocated time/time he spent on site), stock used etc.
The query also needs to output some other columns like the name of the site for the work, so that that column can be sorted by (an ORDER BY is appended after all of this).

SELECT
    jobs.job_id,
    jobs.start_date,
    jobs.end_date,
    events.time,
    sites.name site,
    IFNULL(stock_cost,0) stock_cost,
    labour,
    materials,
    labour+materials+plant+expenses revenue,
    (labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) profit,
    ((labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)))/(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) ratio

FROM
    jobs

    LEFT JOIN (
        SELECT
            job_id,
            SUM(labour_charge) labour,
            SUM(materials_charge) materials,
            SUM(plant_hire_charge) plant,
            SUM(expenses) expenses
        FROM invoices
        GROUP BY job_id
        ORDER BY NULL
    ) invoices USING(job_id)

    LEFT JOIN (
        SELECT
            job_id,
            SUM(IF(start_onsite && end_onsite,end_onsite-start_onsite,end-start)) time,
            SUM(travel+parking+materials) user_expenses
        FROM users_events
        WHERE type='job'
        GROUP BY job_id
        ORDER BY NULL
    ) events USING(job_id)

    LEFT JOIN (
        SELECT
            job_id,
            SUM(IFNULL(total,0))*0.01 orders_cost
        FROM purchaseorders
        GROUP BY job_id
        ORDER BY NULL
    ) purchaseorders USING(job_id)

    LEFT JOIN (
        SELECT
            location job_id,
            SUM(amount*cost))*0.01 stock_cost
        FROM stock_location
        LEFT JOIN stock_items ON stock_items.id=stock_location.stock_id
        WHERE location>=3000 AND amount>0 AND cost>0
        GROUP BY location
        ORDER BY NULL
    ) stock USING(job_id)

    LEFT JOIN contacts_sites sites ON sites.id=jobs.site_id;

I read this: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html but don't see how/if I can apply anything therein.
For testing purposes, I have tried adding all sorts of indices on fields left, right and centre with no improvement to the EXPLAIN output:

+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| id | select_type | table          | type   | possible_keys          | key     | key_len | ref                                | rows  | Extra                         |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
|  1 | PRIMARY     | jobs           | ALL    | NULL                   | NULL    | NULL    | NULL                               |  7088 |                               |
|  1 | PRIMARY     | <derived2>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |  5038 |                               |
|  1 | PRIMARY     | <derived3>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |  6476 |                               |
|  1 | PRIMARY     | <derived4>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |   904 |                               |
|  1 | PRIMARY     | <derived5>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |   531 |                               |
|  1 | PRIMARY     | sites          | eq_ref | PRIMARY                | PRIMARY | 4       | bestbee_db.jobs.site_id            |     1 |                               |
|  5 | DERIVED     | stock_location | ALL    | stock,location,amount,…| NULL    | NULL    | NULL                               |  5426 | Using where; Using temporary; |
|  5 | DERIVED     | stock_items    | eq_ref | PRIMARY                | PRIMARY | 4       | bestbee_db.stock_location.stock_id |     1 | Using where                   |
|  4 | DERIVED     | purchaseorders | ALL    | NULL                   | NULL    | NULL    | NULL                               |  1445 | Using temporary;              |
|  3 | DERIVED     | users_events   | ALL    | type,type_job          | NULL    | NULL    | NULL                               | 11295 | Using where; Using temporary; |
|  2 | DERIVED     | invoices       | ALL    | NULL                   | NULL    | NULL    | NULL                               |  5320 | Using temporary;              |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+

The rows produced is 5 x 10^21 (down from 3 x 10^42 before I started optimising this query!)
It currently takes seven seconds to execute (down from 26) but I would like that to be under one second.

By the way: GROUP BY x ORDER BY NULL is a great way to eliminate unnecessary filesorts from subqueries! (from http://www.mysqlperformanceblog.com/2006/09/04/group_concat-useful-group-by-extension/)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

清浅ˋ旧时光 2024-11-23 13:34:59

根据您对我的问题的评论,我会执行以下操作...

在最顶部...

选择 STRAIGHT_JOIN (只需添加“STRAIGH_JOIN”关键字)

然后,对于发票、事件、采购订单的每个子查询,等等,将 ORDER BY 显式更改为 JOB_ID,这样可能有助于针对主 JOBS 表连接进行优化。

最后,确保每个子查询表在 Job_ID 上都有一个索引(Invoices、User_events、PurchaseOrders、Stock_Location)

此外,对于 Stock_Location 表,您可能希望通过在
上创建复合索引来帮助子查询的 WHERE 子句
(job_id, location, amount) 即使您有键加上 3 个 where 条件元素,三个字段的深度也应该足够了。

Based on your comment to my question, I would do the following...

At the very top...

SELECT STRAIGHT_JOIN (just add the "STRAIGH_JOIN" keyword)

Then, for each of your subqueries for invoices, events, p/o's, etc, change the ORDER BY to the JOB_ID explicitly so it might help the optimization against the primary JOBS table join.

Finally, ensure each of your subquery tables HAS an index on the Job_ID (Invoices, User_events, PurchaseOrders, Stock_Location)

Additionally, for the Stock_Location table, you might want to help the WHERE clause for your subquery by having a compound index on
(job_id, location, amount) Three fields deep should be enough even though you have the key plus 3 where condition elements.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文