WHERE子句包含IN(子查询)的查询性能
SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,
SUM(TradeLine.Notional) / 1000 AS Expr1
FROM Trade INNER JOIN
TradeLine ON Trade.TradeId = TradeLine.TradeId
WHERE (TradeLine.Id IN
(SELECT PairOffId
FROM TradeLine AS TradeLine_1
WHERE (TradeDate <= '2011-05-11')
GROUP BY PairOffId
HAVING (SUM(Notional) <> 0)))
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate
当表开始增长时,我担心 WHERE 子句中 IN 的性能。对于这种查询,有人有更好的策略吗?子查询返回的记录数的增长速度比 TradeLine 表中的记录数增长慢得多。 TradeLine 表本身以 10/天的速度增长。
谢谢。
编辑: 我使用了将子查询从 WHERE 移至 FROM 的想法。我对促成这个新查询的所有答案都投票了。
SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,
PairOff.Notional / 1000 AS Expr1
FROM Trade INNER JOIN
TradeLine ON Trade.TradeId = TradeLine.TradeId INNER JOIN
(SELECT PairOffId, SUM(Notional) AS Notional
FROM TradeLine AS TradeLine_1
WHERE (TradeDate <= '2011-05-11')
GROUP BY PairOffId
HAVING (SUM(Notional) <> 0)) AS PairOff ON TradeLine.Id = PairOff.PairOffId
ORDER BY Trade.Type, Trade.TradeDate
SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,
SUM(TradeLine.Notional) / 1000 AS Expr1
FROM Trade INNER JOIN
TradeLine ON Trade.TradeId = TradeLine.TradeId
WHERE (TradeLine.Id IN
(SELECT PairOffId
FROM TradeLine AS TradeLine_1
WHERE (TradeDate <= '2011-05-11')
GROUP BY PairOffId
HAVING (SUM(Notional) <> 0)))
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate
I am concerned about the performance of the IN in the WHERE clause when the table starts to grow. Does anyone have a better strategy for this kind of query? The number of records returned by the subquery grows much slower than the number of records in the TradeLine table. The TradeLine table itself grows at a rate of 10/day.
Thank you.
EDIT:
I used the idea of moving the subquery from WHERE to FROM. I voted up on all answers that contributed to this new query.
SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,
PairOff.Notional / 1000 AS Expr1
FROM Trade INNER JOIN
TradeLine ON Trade.TradeId = TradeLine.TradeId INNER JOIN
(SELECT PairOffId, SUM(Notional) AS Notional
FROM TradeLine AS TradeLine_1
WHERE (TradeDate <= '2011-05-11')
GROUP BY PairOffId
HAVING (SUM(Notional) <> 0)) AS PairOff ON TradeLine.Id = PairOff.PairOffId
ORDER BY Trade.Type, Trade.TradeDate
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我在 XXXXXX 数据库中的数十万条记录中遇到了同样的问题。在我的代码中,我想从所有节点检索层次结构(至少包含一个子节点)节点。
写入的初始查询非常慢。
然后重新写入
I have faced same issue with hundreds of thousands of records in XXXXXX DB. In my code i want to retrieve hierarchy (node which contain at least one child) nodes from all nodes.
The initial query written which is very slow.
Then re-written to
使用 IN 实际上会强制您进行表扫描。当表增长时,执行时间也会增长。此外,您还对返回的每条记录运行该查询。使用标量选择作为表会更容易:
Using an IN will essentially force you to do a table scan. When your table grows, your execution time grows. Also you are running that query for each record returned. Would be easier to use a scalar select as a table:
IN
子句中的子查询不依赖于外部查询中的任何内容。您可以安全地将其移至FROM
子句中;一个理智的查询计划构建器会自动完成它。此外,必须对要在生产中使用的任何查询调用
EXPLAIN PLAN
。执行此操作并查看 DBMS 对此查询计划的看法。The subquery in the
IN
clause does not depend on anything in the outer query. You can safely move it intoFROM
clause; a sane query plan builder would do it automatically.Also, calling
EXPLAIN PLAN
on any query you're going to use in production is a must. Do it and see what the DBMS thinks of the plan for this query.当子查询开始返回太大的结果集时,我很喜欢临时表。
因此,您的
where
子句将被定义为
#tempResults
(警告:语法来自内存,这意味着可能存在错误)I'm a fan of temp tables when a sub-query starts returning too large a result set.
So your
where
clause would just beand
#tempResults
would be defined as (warning: syntax is from memory, which means there may be errors)我有两个建议你可以尝试:
1)。使用 Exists 因为您不需要从子查询获取数据,如下所示:
2).例如,主查询连接到您的子查询
我相信这两种方式可以实现比“In”操作更好的性能。
I have 2 suggestion you can try:
1). use Exists since you don't need get data from subquery, like this:
2). main query join to your subquery, for instance
I believe these 2 ways can achieve better performance than "In" operation.