Ambiguous column name SQL error with INNER JOIN: why?
The following code will work to select data from two tables:
SELECT t1.foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE2 t2 ON t1.foo=t2.foo
I could just as easily have written
SELECT t2.foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE2 t2 ON t1.foo=t2.foo
t1.foo or t2.foo: six of one, half a dozen of the other. Why not just foo?
I've been wondering why SQL Server doesn't just return the data automatically, without me specifying one table or the other, since the choice is entirely arbitrary (as far as I can tell).
I can make up a scenario where you would need to specify the table, such as
SELECT t1.foo, t2.bar FROM TABLE1 t1 INNER JOIN TABLE2 t2 ON t1.foo+=t2.foo
However, such scenarios are far from the norm in my experience.
Can anyone enlighten me as to why the language is designed so that I have to make this seemingly arbitrary decision in my code?
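For readers who want to reproduce the error the question is about, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for SQL Server (the column types and sample rows are made up for illustration); SQL Server rejects the same query with its own "ambiguous column name" message.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (foo INTEGER);
    CREATE TABLE TABLE2 (foo INTEGER, bar TEXT);
    INSERT INTO TABLE1 VALUES (1);
    INSERT INTO TABLE2 VALUES (1, 'x');
""")

try:
    # Unqualified foo: both joined tables expose a column with that name.
    conn.execute(
        "SELECT foo, t2.bar FROM TABLE1 t1 "
        "INNER JOIN TABLE2 t2 ON t1.foo = t2.foo"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
```

Qualifying the column as t1.foo or t2.foo makes the same query run.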
Comments (7)
Because equality in MS SQL doesn't necessarily mean the values are identical. Consider the following two values for foo: "Bar" and "baR". Under a case-insensitive collation, SQL Server will treat them as equal for the join, but which one were you asking for? SQL Server doesn't know, and it can't guess. You must tell it explicitly.
Edit: As @Lukas Eder pointed out, not all SQL implementations use case-insensitive comparisons. I know MS SQL does, and my answer was written with that in mind.
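The "Bar" / "baR" case can be sketched as follows. This assumes SQLite with COLLATE NOCASE as a rough stand-in for a case-insensitive SQL Server collation; the table layout and the bar value are made up for illustration.

```python
import sqlite3

# Assumption: SQLite's NOCASE collation approximates a case-insensitive
# SQL Server collation for plain ASCII strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (foo TEXT COLLATE NOCASE);
    CREATE TABLE TABLE2 (foo TEXT COLLATE NOCASE, bar TEXT);
    INSERT INTO TABLE1 VALUES ('Bar');
    INSERT INTO TABLE2 VALUES ('baR', 'x');
""")

# The join condition holds, yet the two foo columns return different strings.
rows = conn.execute(
    "SELECT t1.foo, t2.foo FROM TABLE1 t1 "
    "INNER JOIN TABLE2 t2 ON t1.foo = t2.foo"
).fetchall()
print(rows)
```

The single joined row carries both spellings, so an unqualified foo would have to pick one of them.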
Your reasoning is not quite true. While t1.foo = t2.foo may hold true, that doesn't mean they're the same. Some examples:
One could be VARCHAR(1), the other VARCHAR(2).
One could be VARCHAR(1), the other NUMBER(1).
t1 could be a simple table, whereas t2 is a view (or nested select) that makes hyper-complex calculations for the value of foo. The projection cost of either foo might not be the same in some RDBMS.
And there are dozens of other reasons why it would be ambiguous to just write foo.
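The VARCHAR(1) versus NUMBER(1) case can be sketched like this. It assumes SQLite, whose type-affinity rules are looser than those of SQL Server or Oracle, so treat it as an illustration of the idea rather than of any particular engine's conversion rules.

```python
import sqlite3

# Assumption: SQLite maps VARCHAR(1) to TEXT affinity and NUMBER(1) to
# NUMERIC affinity; other engines convert between types differently.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (foo VARCHAR(1));
    CREATE TABLE TABLE2 (foo NUMBER(1), bar TEXT);
    INSERT INTO TABLE1 VALUES ('1');
    INSERT INTO TABLE2 VALUES (1, 'x');
""")

# The comparison converts the text operand, so the rows match,
# but each column still returns its own stored type.
rows = conn.execute(
    "SELECT t1.foo, t2.foo FROM TABLE1 t1 "
    "INNER JOIN TABLE2 t2 ON t1.foo = t2.foo"
).fetchall()
print([(type(a).__name__, type(b).__name__) for a, b in rows])
```

One side comes back as a string and the other as an integer, even though the join treated them as equal.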
If you're sure that the columns represent the same thing, you could join with a USING clause.
Otherwise there's no guarantee that t1.foo is the same thing as t2.foo.
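A minimal sketch of the USING form, assuming SQLite as the engine. USING is part of standard SQL and is accepted by several databases, but as far as I know SQL Server's T-SQL does not support it, so this is an illustration of the idea, not something to paste into SQL Server.

```python
import sqlite3

# Assumption: an engine that supports JOIN ... USING (SQLite here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (foo INTEGER);
    CREATE TABLE TABLE2 (foo INTEGER, bar TEXT);
    INSERT INTO TABLE1 VALUES (1);
    INSERT INTO TABLE2 VALUES (1, 'x');
""")

# With USING (foo), the unqualified foo in the select list is accepted.
rows = conn.execute(
    "SELECT foo, bar FROM TABLE1 INNER JOIN TABLE2 USING (foo)"
).fetchall()
print(rows)
```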
In this case you have an INNER JOIN, so it's clear that the decision is arbitrary. But there are many situations where, even if you join on foo, the two are not the same.
For example, in the case of a LEFT JOIN,
or if you have something like ON t1.foo = t2.foo +/- whatever.
The engine needs your input to know where to take the data from.
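The LEFT JOIN case can be sketched as follows, again assuming SQLite as a stand-in and made-up sample rows. The row with no match keeps its t1.foo value while t2.foo comes back as NULL.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; sample data is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (foo INTEGER);
    CREATE TABLE TABLE2 (foo INTEGER, bar TEXT);
    INSERT INTO TABLE1 VALUES (1), (2);
    INSERT INTO TABLE2 VALUES (1, 'x');
""")

# foo = 2 has no partner in TABLE2, so t2.foo is NULL (None in Python).
rows = conn.execute(
    "SELECT t1.foo, t2.foo FROM TABLE1 t1 "
    "LEFT JOIN TABLE2 t2 ON t1.foo = t2.foo "
    "ORDER BY t1.foo"
).fetchall()
print(rows)
```

Here the choice between t1.foo and t2.foo changes the result, so it is no longer arbitrary.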
The reason you need to make this decision is that it isn't arbitrary. The system does not know which table has the data you want. You need to specify it. When the system designs the execution plan, it does not figure out which columns contain the same data in both tables. As far as it is concerned, these two columns could have different data. It isn't going to infer that, because you say these columns are equal, it could display either column when one isn't specified.
In that particular case, t1.foo and t2.foo are the same thing, but the engine isn't optimized for that (and it would be confusing if it were). What if your join did something where they might not be the same, such as joining on t1.foo < t2.foo?
Since we are using <, foo on t1 and t2 could be very different things. The engine can't "guess" in this case.
Just because those cases are "far from the norm" in your experience doesn't mean the engine can ignore them. It has to allow for them, otherwise some types of queries would be extremely difficult to write.
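A join on < can be sketched like this. The answer's own example query did not survive on this page, so the query below is an assumed reconstruction of the idea, run on SQLite with made-up rows.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the query is a hypothetical
# example of a non-equality join, not the answer's original code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (foo INTEGER);
    CREATE TABLE TABLE2 (foo INTEGER, bar TEXT);
    INSERT INTO TABLE1 VALUES (1);
    INSERT INTO TABLE2 VALUES (2, 'x'), (3, 'y');
""")

# Every joined row pairs a t1.foo with a strictly larger t2.foo.
rows = conn.execute(
    "SELECT t1.foo, t2.foo, t2.bar FROM TABLE1 t1 "
    "INNER JOIN TABLE2 t2 ON t1.foo < t2.foo "
    "ORDER BY t2.foo"
).fetchall()
print(rows)
```

In every result row the two foo values differ, so an unqualified foo has no sensible meaning.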
SQL doesn't do it because it simply doesn't resolve ambiguities. (But as you note they are equivalent.)
For application lifecycle it's ultimately better to resolve them yourself, because if a column changes name or the join type changes, your code is less likely to be broken and it's more obvious what your intentions were. But those benefits weren't intentional, I'm sure.