WHERE 和 JOIN 操作顺序

发布于 2024-09-27 19:07:06 字数 795 浏览 11 评论 0原文

我的问题与此SQL操作顺序类似，但有一点扭曲，所以我认为它是公平地问。

我正在使用 Teradata。我有 2 个表：table1、table2。

table1 只有一个 id 列。
table2 有以下列：id, val

我可能是错的，但我认为这两个语句给出相同的结果。

语句 1.

SELECT table1.id, table2.val
FROM table1
INNER  JOIN table2
ON table1.id = table2.id
WHERE table2.val<100

语句 2。

SELECT table1.id, table3.val
FROM table1
INNER JOIN (
    SELECT *
    FROM table2
    WHERE val<100
)  table3
ON table1.id=table3.id

我的问题是，查询优化器是否足够智能
- 首先执行 WHERE 子句，然后在语句 1 中执行 JOIN
- 知道语句 2 中实际上并不需要表 3

我对 SQL 还很陌生，所以如果我有任何误解，请告诉我。

原文

My question is similar to this SQL order of operations but with a little twist, so I think it's fair to ask.

I'm using Teradata. And I have 2 tables: table1, table2.

table1 has only an id column.
table2 has the following columns: id, val

I might be wrong but I think these two statements give the same results.

Statement 1.

SELECT table1.id, table2.val
FROM table1
INNER  JOIN table2
ON table1.id = table2.id
WHERE table2.val<100

Statement 2.

SELECT table1.id, table3.val
FROM table1
INNER JOIN (
    SELECT *
    FROM table2
    WHERE val<100
)  table3
ON table1.id=table3.id

My questions is, will the query optimizer be smart enough to
- execute the WHERE clause first then JOIN later in Statement 1
- know that table 3 isn't actually needed in Statement 2

I'm pretty new to SQL, so please educate me if I'm misunderstanding anything.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

十秒萌定你 2024-10-04 19:07:06

这取决于很多事情（表大小、索引、键分布等），您应该只检查执行计划：

您不说哪个数据库，但这里有一些方法：
MySql 解释
 SQL Server SET SHOWPLAN_ALL (Transact-SQL)
Oracle 解释计划

teradata 中的解释是什么？
Teradata使用 Visual Explain 和 XML 计划日志记录更快地捕获和比较计划

回复收藏 0 原文

终遇你 2024-10-04 19:07:06

根据相关表的统计信息和索引的可用性，优化器中的查询重写机制可能会也可能不会选择扫描 Table2 来查找 val 的记录。扫描 Table1 之前为 100。

在某些情况下，根据数据人口统计、连接、索引和统计信息，您可能会发现优化器在您认为应该消除查询计划中的记录时却没有消除。即使您有一个派生表，例如示例中的表。您只需在派生表中放置 GROUP BY 即可强制优化器处理派生表。然后，优化器有义务先解析 GROUP BY 聚合，然后才能考虑解析示例中两个表之间的联接。

SELECT table1.id, table3.val
FROM table1
INNER JOIN (
    SELECT table2.id, tabl2.val
    FROM table2
    WHERE val<100
    GROUP BY 1,2
)  table3
ON table1.id=table3.id

这并不是说您的标准方法应该是在您的代码中运行它。当我有一个查询计划时，这通常是我最后的手段之一，该计划根本无法在计划中足够早地消除无关记录，并导致扫描过多的数据并通过各种 SPOOL 文件携带。这只是当您遇到这种情况时可以放入工具箱中的一种技术。

查询重写机制不断从一个版本更新到下一个版本，有关其工作原理的详细信息可以在 Teradata 13.0 的 SQL 事务处理手册。

Depending on the availability of statistics and indexes for the tables in question the query rewrite mechanism in the optimizer will may or may not opt to scan Table2 for records where val < 100 before scanning Table1.

In certain situations, based on data demographics, joins, indexing and statistics you may find that the optimizer is not eliminating records in the query plan when you feel that it should. Even if you have a derived table such as the one in your example. You can force the optimizer to process a derived table by simply placing a GROUP BY in your derived table. The optimizer is then obligated to resolve the GROUP BY aggregate before it can consider resolving the join between the two tables in your example.

SELECT table1.id, table3.val
FROM table1
INNER JOIN (
    SELECT table2.id, tabl2.val
    FROM table2
    WHERE val<100
    GROUP BY 1,2
)  table3
ON table1.id=table3.id

This is not to say that your standard approach should be to run with this through out your code. This is typically one of my last resorts when I have a query plan that simply doesn't eliminate extraneous records earlier enough in the plan and results in too much data being scanned and carried around through the various SPOOL files. This is simply a technique you can put in your toolkit to when you encounter such a situation.

The query rewrite mechanism is continually being updated from one release to the next and the details about how it works can be found in the SQL Transaction Processing Manual for Teradata 13.0.

回复收藏 0 原文

谈情不如逗狗 2024-10-04 19:07:06

除非我遗漏了什么，为什么你还需要 Table1？

只是查询 Table2

Select id, val  
From table2  
WHERE val<100

还是使用 table1 中的行作为过滤器？即，表 1 是否仅包含表 2 中的 Id 的子集？

如果是这样，那么这也将起作用......

 Select id, val  
 From table2  
 Where val<100 
   And id In (Select id 
              From table1)

但是要回答您的问题，是的，查询优化器应该足够智能，能够找出执行将逻辑指令转换为物理结果所需的步骤的最佳顺序。它使用数据库在每个表上维护的存储统计信息来确定要做什么（例如使用什么类型的联接逻辑），以及执行操作的顺序，以便最大限度地减少磁盘 IO 和处理成本。

Unless I'm missing something, Why do you even need Table1??

Just query Table2

Select id, val  
From table2  
WHERE val<100

or are you using the rows in table1 as a filter? i.e., Does table1 only copntain a subset of the Ids in Table2??

If so, then this will work as well ...

 Select id, val  
 From table2  
 Where val<100 
   And id In (Select id 
              From table1)

But to answer your question, Yes the query optimizer should be intelligent enough to figure out the best order in which to execute the steps necessary to translate your logical instructions into a physical result. It uses the strored statistics that the database maintains on each table to determine what to do (what type of join logic to use for example), as wekll as what order to perform the operations in in order to minimize Disk IOs and processing costs.

回复收藏 0 原文