为什么在连接使用 select max 的列时此 SQL(连接)查询很慢?

发布于 2024-12-25 18:56:31 字数 918 浏览 2 评论 0原文

如果有影响的话,我正在使用 Apache Derby 10.8。

我有一个非常简单的数据库,其中有一个充满项目的表和一个充满这些项目的出价的表。我想选择与其连接的项目的出价最高的每个项目。以下是我的第一次尝试,性能很糟糕:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from item
    left outer join bid on bid.item_id = item.id and
    bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')

我创建了一组测试数据,使用 282 个项目,每个项目有 200 个出价(总共 56400 个出价)。上述查询运行大约需要 30-40 秒。如果我选择每个项目并手动循环选择每个项目的高出价,则需要不到一秒钟的时间。

我尝试为 bid.bid_amountbid.status 列建立索引,但它没有做任何值得注意的事情。 SQL 不是我最擅长的领域,所以如果有人愿意解释为什么查询如此慢,我将非常感激。

I'm using Apache Derby 10.8 if it makes a difference.

I have a very simple database with a table full of items and a table full of bids on those items. I want to select every item with the highest bid for that item joined to it. The following is my first try at it and the performance is awful:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from item
    left outer join bid on bid.item_id = item.id and
    bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')

I created a set of test data that uses 282 items with 200 bids for each item (56400 bids total). The above query takes around 30-40 seconds to run. If I select every item and manually loop through the items selecting high bids for each, it takes less than a second.

I've tried indexing the bid.bid_amount and bid.status columns, but it didn't do anything noticeable. SQL isn't my strongest area, so if anyone is willing to explain why that query is so slow I'd really appreciate it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

逆流 2025-01-01 18:56:31

查询速度很慢,因为您正在执行所谓的相关子查询 - 它为每行运行 max

尝试这样的事情:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from 
    item
    left outer join (
        select 
            item_id, 
            MAX(bid_amount) maxamount 
        from 
            bid 
        where 
            status = 'OK' 
        group by 
            item_id
    ) b1 on
        item.id = b1.item_id
    left outer join bid on
        bid.item_id = item.id
        and bid.bid_amount = b1.maxamount

这个子查询只运行一次,而且速度会快得多。

The query's slow because you're doing what's called a correlated subquery--it's running that max for each row.

Try something like this:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from 
    item
    left outer join (
        select 
            item_id, 
            MAX(bid_amount) maxamount 
        from 
            bid 
        where 
            status = 'OK' 
        group by 
            item_id
    ) b1 on
        item.id = b1.item_id
    left outer join bid on
        bid.item_id = item.id
        and bid.bid_amount = b1.maxamount

This subquery is only run once, and it will go much faster.

蘸点软妹酱 2025-01-01 18:56:31

您创建了同步(或相关)子查询。对外表(项目)的每一行执行子查询。

You created a synchronized (or correlated) subquery. The subquery is executed for every row of the outer table (item).

a√萤火虫的光℡ 2025-01-01 18:56:31

问题是您的嵌套子查询在 JOIN 操作的每个步骤上运行。难怪查询性能很差,大概是CPU和磁盘都在努力工作吧!假设您正在尝试获取 items 表中每个项目的最大 OK 出价,您可能需要尝试以下查询:

SELECT I.id AS item_id,
       I.name AS item_name,
       I.retail_value AS item_retail_value,
       I.vendor AS item_vendor,
       B.bid_amount AS bid_amount,
       B.bidder_name AS bid_bidder_name,
       B.bidder_phone AS bid_bidder_phone,
       B.operator_name AS bid_operator_name
FROM item AS I
     LEFT OUTER JOIN (SELECT item_id, MAX(bid_amount) AS bid_amount
                      FROM bid
                      WHERE STATUS = 'OK'
                      GROUP BY item_id) AS _TEMP ON _TEMP.item_id = B.item_id
     LEFT OUTER JOIN bid AS B ON B.item_id = _TEMP.item_id AND B.bid_amount = _TEMP.bid_amount;

Problem is your nested subquery is running on each and every step of the JOIN operation. No wonder query performance is poor, the CPU and disk are probably hard at work! Assuming you are trying to get the maximum OK'd bid for every item in the items table, you might want to try this query:

SELECT I.id AS item_id,
       I.name AS item_name,
       I.retail_value AS item_retail_value,
       I.vendor AS item_vendor,
       B.bid_amount AS bid_amount,
       B.bidder_name AS bid_bidder_name,
       B.bidder_phone AS bid_bidder_phone,
       B.operator_name AS bid_operator_name
FROM item AS I
     LEFT OUTER JOIN (SELECT item_id, MAX(bid_amount) AS bid_amount
                      FROM bid
                      WHERE STATUS = 'OK'
                      GROUP BY item_id) AS _TEMP ON _TEMP.item_id = B.item_id
     LEFT OUTER JOIN bid AS B ON B.item_id = _TEMP.item_id AND B.bid_amount = _TEMP.bid_amount;
冷默言语 2025-01-01 18:56:31

您还可以通过对 bid.item_id 应用索引来提高查询性能,因为子查询是根据 item_id 选取记录。

You can also improve the performance of the query by applying indexing on bid.item_id since the sub query is picking records on the basis of item_id.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文