WHERE子句包含IN（子查询）的查询性能

发布于 2024-11-06 19:00:35 字数 1453 浏览 0 评论 0原文

SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate, 
       SUM(TradeLine.Notional) / 1000 AS Expr1
FROM   Trade INNER JOIN
             TradeLine ON Trade.TradeId = TradeLine.TradeId
WHERE  (TradeLine.Id IN
                      (SELECT     PairOffId
                        FROM          TradeLine AS TradeLine_1
                        WHERE      (TradeDate <= '2011-05-11')
                        GROUP BY PairOffId
                        HAVING      (SUM(Notional) <> 0)))
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate

当表开始增长时，我担心 WHERE 子句中 IN 的性能。对于这种查询，有人有更好的策略吗？子查询返回的记录数的增长速度比 TradeLine 表中的记录数增长慢得多。 TradeLine 表本身以 10/天的速度增长。

谢谢。

编辑：我使用了将子查询从 WHERE 移至 FROM 的想法。我对促成这个新查询的所有答案都投票了。

   SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,   
          PairOff.Notional / 1000 AS Expr1
   FROM         Trade INNER JOIN
                  TradeLine ON Trade.TradeId = TradeLine.TradeId INNER JOIN
                      (SELECT     PairOffId, SUM(Notional) AS Notional
                        FROM          TradeLine AS TradeLine_1
                        WHERE      (TradeDate <= '2011-05-11')
                        GROUP BY PairOffId
                   HAVING (SUM(Notional) <> 0)) AS PairOff ON TradeLine.Id = PairOff.PairOffId
   ORDER BY Trade.Type, Trade.TradeDate

原文

SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate, 
       SUM(TradeLine.Notional) / 1000 AS Expr1
FROM   Trade INNER JOIN
             TradeLine ON Trade.TradeId = TradeLine.TradeId
WHERE  (TradeLine.Id IN
                      (SELECT     PairOffId
                        FROM          TradeLine AS TradeLine_1
                        WHERE      (TradeDate <= '2011-05-11')
                        GROUP BY PairOffId
                        HAVING      (SUM(Notional) <> 0)))
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate

I am concerned about the performance of the IN in the WHERE clause when the table starts to grow. Does anyone have a better strategy for this kind of query? The number of records returned by the subquery grows much slower than the number of records in the TradeLine table. The TradeLine table itself grows at a rate of 10/day.

Thank you.

EDIT:
I used the idea of moving the subquery from WHERE to FROM. I voted up on all answers that contributed to this new query.

   SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,   
          PairOff.Notional / 1000 AS Expr1
   FROM         Trade INNER JOIN
                  TradeLine ON Trade.TradeId = TradeLine.TradeId INNER JOIN
                      (SELECT     PairOffId, SUM(Notional) AS Notional
                        FROM          TradeLine AS TradeLine_1
                        WHERE      (TradeDate <= '2011-05-11')
                        GROUP BY PairOffId
                   HAVING (SUM(Notional) <> 0)) AS PairOff ON TradeLine.Id = PairOff.PairOffId
   ORDER BY Trade.Type, Trade.TradeDate

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

幽蝶幻影 2024-11-13 19:00:36

我在 XXXXXX 数据库中的数十万条记录中遇到了同样的问题。在我的代码中，我想从所有节点检索层次结构（至少包含一个子节点）节点。

写入的初始查询非常慢。

  SELECT SUPPLIER_ID, PARENT_SUPPLIER_ID,
  FROM SUPPLIER
  WHERE 
    SUPPLIER_ID != PARENT_SUPPLIER_ID
    OR 
    SUPPLIER_ID   IN
      (SELECT DISTINCT PARENT_SUPPLIER_ID
       FROM SUPPLIER
       WHERE SUPPLIER_ID != PARENT_SUPPLIER_ID
      );

然后重新写入

  SELECT a.SUPPLIER_ID, a.PARENT_SUPPLIER_ID,
  FROM SUPPLIER a
  LEFT JOIN
  (SELECT DISTINCT PARENT_SUPPLIER_ID
  FROM SUPPLIER
  WHERE SUPPLIER_ID != PARENT_SUPPLIER_ID
  ) b
  ON a. SUPPLIER_ID     = b.PARENT_SUPPLIER_ID
  WHERE a. SUPPLIER_ID != a.PARENT_SUPPLIER_ID
     OR a. SUPPLIER_ID     = b.PARENT_SUPPLIER_ID;

I have faced same issue with hundreds of thousands of records in XXXXXX DB. In my code i want to retrieve hierarchy (node which contain at least one child) nodes from all nodes.

The initial query written which is very slow.

  SELECT SUPPLIER_ID, PARENT_SUPPLIER_ID,
  FROM SUPPLIER
  WHERE 
    SUPPLIER_ID != PARENT_SUPPLIER_ID
    OR 
    SUPPLIER_ID   IN
      (SELECT DISTINCT PARENT_SUPPLIER_ID
       FROM SUPPLIER
       WHERE SUPPLIER_ID != PARENT_SUPPLIER_ID
      );

Then re-written to

  SELECT a.SUPPLIER_ID, a.PARENT_SUPPLIER_ID,
  FROM SUPPLIER a
  LEFT JOIN
  (SELECT DISTINCT PARENT_SUPPLIER_ID
  FROM SUPPLIER
  WHERE SUPPLIER_ID != PARENT_SUPPLIER_ID
  ) b
  ON a. SUPPLIER_ID     = b.PARENT_SUPPLIER_ID
  WHERE a. SUPPLIER_ID != a.PARENT_SUPPLIER_ID
     OR a. SUPPLIER_ID     = b.PARENT_SUPPLIER_ID;

回复收藏 0 原文

昵称有卵用 2024-11-13 19:00:36

使用 IN 实际上会强制您进行表扫描。当表增长时，执行时间也会增长。此外，您还对返回的每条记录运行该查询。使用标量选择作为表会更容易：

SELECT t.TradeId, t.Type, t.Symbol, t.TradeDate, 
       SUM(TradeLine.Notional) / 1000 AS Expr1
FROM   Trade t,
(SELECT     TradeId, PairOffID
                        FROM          TradeLine AS TradeLine_1
                        WHERE      (TradeDate <= '2011-05-11')
                        GROUP BY PairOffId
                        HAVING      (SUM(Notional) <> 0)) tl       
WHERE  t.TradeId = tl.TradeId
  and  t.id <> tl.PairOffID
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate

Using an IN will essentially force you to do a table scan. When your table grows, your execution time grows. Also you are running that query for each record returned. Would be easier to use a scalar select as a table:

SELECT t.TradeId, t.Type, t.Symbol, t.TradeDate, 
       SUM(TradeLine.Notional) / 1000 AS Expr1
FROM   Trade t,
(SELECT     TradeId, PairOffID
                        FROM          TradeLine AS TradeLine_1
                        WHERE      (TradeDate <= '2011-05-11')
                        GROUP BY PairOffId
                        HAVING      (SUM(Notional) <> 0)) tl       
WHERE  t.TradeId = tl.TradeId
  and  t.id <> tl.PairOffID
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate

回复收藏 0 原文

不再见 2024-11-13 19:00:35

IN 子句中的子查询不依赖于外部查询中的任何内容。您可以安全地将其移至 FROM 子句中；一个理智的查询计划构建器会自动完成它。

此外，必须对要在生产中使用的任何查询调用 EXPLAIN PLAN。执行此操作并查看 DBMS 对此查询计划的看法。

回复收藏 0 原文

小嗷兮 2024-11-13 19:00:35

当子查询开始返回太大的结果集时，我很喜欢临时表。

因此，您的 where 子句将

Where TradeLine.Id In (Select PairOffId From #tempResults)

被定义为 #tempResults （警告：语法来自内存，这意味着可能存在错误）

Select PairOffId Into #tempResults
From TradeLine
Where (TradeDate <= @TradeDate) 
  //I prefer params in case the query becomes a StoredProc
Group By PairOffId
Having (Sum(Notional) <> 0)

I'm a fan of temp tables when a sub-query starts returning too large a result set.

So your where clause would just be

Where TradeLine.Id In (Select PairOffId From #tempResults)

and #tempResults would be defined as (warning: syntax is from memory, which means there may be errors)

Select PairOffId Into #tempResults
From TradeLine
Where (TradeDate <= @TradeDate) 
  //I prefer params in case the query becomes a StoredProc
Group By PairOffId
Having (Sum(Notional) <> 0)

回复收藏 0 原文