为什么在连接使用 select max 的列时此 SQL(连接)查询很慢?
如果有影响的话,我正在使用 Apache Derby 10.8。
我有一个非常简单的数据库,其中有一个充满项目的表和一个充满这些项目的出价的表。我想选择与其连接的项目的出价最高的每个项目。以下是我的第一次尝试,性能很糟糕:
select
item.id as item_id,
item.name as item_name,
item.retail_value as item_retail_value,
item.vendor as item_vendor,
bid.bid_amount as bid_amount,
bid.bidder_name as bid_bidder_name,
bid.bidder_phone as bid_bidder_phone,
bid.operator_name as bid_operator_name
from item
left outer join bid on bid.item_id = item.id and
bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')
我创建了一组测试数据,使用 282 个项目,每个项目有 200 个出价(总共 56400 个出价)。上述查询运行大约需要 30-40 秒。如果我选择每个项目并手动循环选择每个项目的高出价,则需要不到一秒钟的时间。
我尝试为 bid.bid_amount
和 bid.status
列建立索引,但它没有做任何值得注意的事情。 SQL 不是我最擅长的领域,所以如果有人愿意解释为什么查询如此慢,我将非常感激。
I'm using Apache Derby 10.8 if it makes a difference.
I have a very simple database with a table full of items and a table full of bids on those items. I want to select every item with the highest bid for that item joined to it. The following is my first try at it and the performance is awful:
select
item.id as item_id,
item.name as item_name,
item.retail_value as item_retail_value,
item.vendor as item_vendor,
bid.bid_amount as bid_amount,
bid.bidder_name as bid_bidder_name,
bid.bidder_phone as bid_bidder_phone,
bid.operator_name as bid_operator_name
from item
left outer join bid on bid.item_id = item.id and
bid.bid_amount = (select max(bid.bid_amount) from bid where bid.item_id = item.id and bid.status = 'OK')
I created a set of test data that uses 282 items with 200 bids for each item (56400 bids total). The above query takes around 30-40 seconds to run. If I select every item and manually loop through the items selecting high bids for each, it takes less than a second.
I've tried indexing the bid.bid_amount
and bid.status
columns, but it didn't do anything noticeable. SQL isn't my strongest area, so if anyone is willing to explain why that query is so slow I'd really appreciate it.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
查询速度很慢,因为您正在执行所谓的相关子查询 - 它为每行运行
max
。尝试这样的事情:
这个子查询只运行一次,而且速度会快得多。
The query's slow because you're doing what's called a correlated subquery--it's running that
max
for each row.Try something like this:
This subquery is only run once, and it will go much faster.
您创建了同步(或相关)子查询。对外表(项目)的每一行执行子查询。
You created a synchronized (or correlated) subquery. The subquery is executed for every row of the outer table (item).
问题是您的嵌套子查询在 JOIN 操作的每个步骤上运行。难怪查询性能很差,大概是CPU和磁盘都在努力工作吧!假设您正在尝试获取 items 表中每个项目的最大 OK 出价,您可能需要尝试以下查询:
Problem is your nested subquery is running on each and every step of the JOIN operation. No wonder query performance is poor, the CPU and disk are probably hard at work! Assuming you are trying to get the maximum OK'd bid for every item in the items table, you might want to try this query:
您还可以通过对 bid.item_id 应用索引来提高查询性能,因为子查询是根据 item_id 选取记录。
You can also improve the performance of the query by applying indexing on bid.item_id since the sub query is picking records on the basis of item_id.