为什么在连接使用 select max 的列时此 SQL（连接）查询很慢？

发布于 2024-12-25 18:56:31 字数 918 浏览 2 评论 0原文

如果有影响的话，我正在使用 Apache Derby 10.8。

我有一个非常简单的数据库，其中有一个充满项目的表和一个充满这些项目的出价的表。我想选择与其连接的项目的出价最高的每个项目。以下是我的第一次尝试，性能很糟糕：

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from item
    left outer join bid on bid.item_id = item.id and
    bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')

我创建了一组测试数据，使用 282 个项目，每个项目有 200 个出价（总共 56400 个出价）。上述查询运行大约需要 30-40 秒。如果我选择每个项目并手动循环选择每个项目的高出价，则需要不到一秒钟的时间。

我尝试为 bid.bid_amount 和 bid.status 列建立索引，但它没有做任何值得注意的事情。 SQL 不是我最擅长的领域，所以如果有人愿意解释为什么查询如此慢，我将非常感激。

原文

I'm using Apache Derby 10.8 if it makes a difference.

I have a very simple database with a table full of items and a table full of bids on those items. I want to select every item with the highest bid for that item joined to it. The following is my first try at it and the performance is awful:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from item
    left outer join bid on bid.item_id = item.id and
    bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')

I created a set of test data that uses 282 items with 200 bids for each item (56400 bids total). The above query takes around 30-40 seconds to run. If I select every item and manually loop through the items selecting high bids for each, it takes less than a second.

I've tried indexing the bid.bid_amount and bid.status columns, but it didn't do anything noticeable. SQL isn't my strongest area, so if anyone is willing to explain why that query is so slow I'd really appreciate it.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

逆流 2025-01-01 18:56:31

查询速度很慢，因为您正在执行所谓的相关子查询 - 它为每行运行 max。

尝试这样的事情：

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from 
    item
    left outer join (
        select 
            item_id, 
            MAX(bid_amount) maxamount 
        from 
            bid 
        where 
            status = 'OK' 
        group by 
            item_id
    ) b1 on
        item.id = b1.item_id
    left outer join bid on
        bid.item_id = item.id
        and bid.bid_amount = b1.maxamount

这个子查询只运行一次，而且速度会快得多。

The query's slow because you're doing what's called a correlated subquery--it's running that max for each row.

Try something like this:

select
    item.id as item_id,
    item.name as item_name,
    item.retail_value as item_retail_value,
    item.vendor as item_vendor,
    bid.bid_amount as bid_amount,
    bid.bidder_name as bid_bidder_name,
    bid.bidder_phone as bid_bidder_phone,
    bid.operator_name as bid_operator_name
from 
    item
    left outer join (
        select 
            item_id, 
            MAX(bid_amount) maxamount 
        from 
            bid 
        where 
            status = 'OK' 
        group by 
            item_id
    ) b1 on
        item.id = b1.item_id
    left outer join bid on
        bid.item_id = item.id
        and bid.bid_amount = b1.maxamount

This subquery is only run once, and it will go much faster.

回复收藏 0 原文

蘸点软妹酱 2025-01-01 18:56:31

您创建了同步（或相关）子查询。对外表（项目）的每一行执行子查询。

回复收藏 0 原文

a√萤火虫的光℡ 2025-01-01 18:56:31

问题是您的嵌套子查询在 JOIN 操作的每个步骤上运行。难怪查询性能很差，大概是CPU和磁盘都在努力工作吧！假设您正在尝试获取 items 表中每个项目的最大 OK 出价，您可能需要尝试以下查询：

SELECT I.id AS item_id,
       I.name AS item_name,
       I.retail_value AS item_retail_value,
       I.vendor AS item_vendor,
       B.bid_amount AS bid_amount,
       B.bidder_name AS bid_bidder_name,
       B.bidder_phone AS bid_bidder_phone,
       B.operator_name AS bid_operator_name
FROM item AS I
     LEFT OUTER JOIN (SELECT item_id, MAX(bid_amount) AS bid_amount
                      FROM bid
                      WHERE STATUS = 'OK'
                      GROUP BY item_id) AS _TEMP ON _TEMP.item_id = B.item_id
     LEFT OUTER JOIN bid AS B ON B.item_id = _TEMP.item_id AND B.bid_amount = _TEMP.bid_amount;

Problem is your nested subquery is running on each and every step of the JOIN operation. No wonder query performance is poor, the CPU and disk are probably hard at work! Assuming you are trying to get the maximum OK'd bid for every item in the items table, you might want to try this query:

SELECT I.id AS item_id,
       I.name AS item_name,
       I.retail_value AS item_retail_value,
       I.vendor AS item_vendor,
       B.bid_amount AS bid_amount,
       B.bidder_name AS bid_bidder_name,
       B.bidder_phone AS bid_bidder_phone,
       B.operator_name AS bid_operator_name
FROM item AS I
     LEFT OUTER JOIN (SELECT item_id, MAX(bid_amount) AS bid_amount
                      FROM bid
                      WHERE STATUS = 'OK'
                      GROUP BY item_id) AS _TEMP ON _TEMP.item_id = B.item_id
     LEFT OUTER JOIN bid AS B ON B.item_id = _TEMP.item_id AND B.bid_amount = _TEMP.bid_amount;

回复收藏 0 原文