UNION vs. UNION ALL performance in SQL Server
I have to run a SELECT statement across several tables. I am sure the tables return different records. I am using UNION ALL anyway.
Is it better to use UNION or UNION ALL in performance terms when I am sure the tables return different records?
Answers (6)
UNION ALL will perform better than UNION when you're not concerned about eliminating duplicate records, because you avoid an expensive distinct sort operation. See: SQL SERVER – Difference Between Union vs. Union All – Optimal Performance Comparison
UNION ALL is always faster, because UNION has to do the extra work of excluding duplicate entries.
UNION internally performs two steps:
1. A SELECT, which returns a dataset.
2. A DISTINCT over that dataset.
Anyone who has studied database internals can easily understand that a DISTINCT clause is extremely costly in terms of processing. If you are pretty sure that the resulting dataset does not need to have unique rows, you can skip UNION and use UNION ALL instead. UNION ALL is the same as UNION except that it doesn't fire a DISTINCT internally, sparing us costly operations.
It is better to use UNION ALL when you know you want all the result rows, whether or not you know they'll be distinct. UNION without "ALL" will always perform the "distinct check", regardless of what the data actually is.
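The "SELECT plus DISTINCT" description above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained; the set semantics of UNION and UNION ALL are the same in both), and the tables `orders_2023` and `orders_2024` are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; only the semantics are illustrated.
# The table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2023 (id INTEGER, customer TEXT);
    CREATE TABLE orders_2024 (id INTEGER, customer TEXT);
    INSERT INTO orders_2023 VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders_2024 VALUES (3, 'carol'), (2, 'bob');
""")

# UNION ALL simply concatenates the two result sets.
union_all = conn.execute(
    "SELECT id, customer FROM orders_2023 "
    "UNION ALL SELECT id, customer FROM orders_2024"
).fetchall()

# UNION additionally removes duplicate rows.
union = conn.execute(
    "SELECT id, customer FROM orders_2023 "
    "UNION SELECT id, customer FROM orders_2024"
).fetchall()

# Writing the de-duplication explicitly gives the same rows as UNION.
distinct_over_all = conn.execute(
    "SELECT DISTINCT id, customer FROM ("
    "  SELECT id, customer FROM orders_2023"
    "  UNION ALL SELECT id, customer FROM orders_2024"
    ")"
).fetchall()

print(len(union_all))                            # 4
print(sorted(union))                             # [(1, 'alice'), (2, 'bob'), (3, 'carol')]
print(sorted(union) == sorted(distinct_over_all))  # True
```

The row `(2, 'bob')` appears in both tables, so UNION ALL returns it twice and UNION returns it once. When the tables are known to be disjoint, as in the question, both operators return the same rows and the de-duplication step is pure overhead.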
Why is UNION ALL faster? Because UNION must do a sort (or an equivalent de-duplication step) to remove the duplicates. If you do not need to remove duplicates, then UNION ALL is the better option. However, UNION does have a purpose and should be used when appropriate.
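One rough way to see the extra work is to time both operators on disjoint data, where they return identical rows. This is only a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server (an assumption), the tables `t1` and `t2` and the row count are hypothetical, and the timings will vary by machine. On SQL Server itself, comparing the actual execution plans of the two queries is the more reliable check.

```python
import sqlite3
import time

# Assumption: SQLite stands in for SQL Server; t1 and t2 are hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE t1 (n INTEGER); CREATE TABLE t2 (n INTEGER);")

rows_per_table = 100_000  # arbitrary size chosen for illustration
# The two tables hold disjoint ranges, matching the scenario in the question.
conn.executemany("INSERT INTO t1 VALUES (?)",
                 ((i,) for i in range(rows_per_table)))
conn.executemany("INSERT INTO t2 VALUES (?)",
                 ((i,) for i in range(rows_per_table, 2 * rows_per_table)))


def timed(sql):
    """Run a query, fetch every row, and return (row count, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return len(rows), time.perf_counter() - start


count_all, secs_all = timed("SELECT n FROM t1 UNION ALL SELECT n FROM t2")
count_union, secs_union = timed("SELECT n FROM t1 UNION SELECT n FROM t2")

print(f"UNION ALL: {count_all} rows in {secs_all:.4f}s")
print(f"UNION:     {count_union} rows in {secs_union:.4f}s")
```

Both queries return the same 200,000 rows because the inputs do not overlap; any difference in elapsed time comes from the de-duplication that UNION performs regardless.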
I'm going to go out on a limb and suggest that it depends on your data.
If performance is measured end-to-end (from the moment the client sends the first byte of the request to the moment it gets the last byte of the response) then you have the following two extremes:
In case 1, UNION ALL will be faster simply because it does not need to sort the data (to remove duplicates) before returning it.
In case 2, UNION will be faster because it's much quicker to remove duplicates in memory than to send them over the wire. If your result set contains 1 million rows with only 2 unique values, then your network time will be much smaller once those duplicates have been removed.
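The second case can be illustrated by counting how many rows each operator would hand back to the client. The sketch below is scaled down from the 1 million rows mentioned above and again uses Python's sqlite3 module in place of SQL Server (an assumption); the tables `events_a` and `events_b` and their contents are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events_a (status TEXT);
    CREATE TABLE events_b (status TEXT);
""")

rows_per_table = 50_000  # scaled-down stand-in for a very large result set
# Every row in a table carries the same value, so the combined data
# has only two unique values overall.
conn.executemany("INSERT INTO events_a VALUES (?)",
                 (("ok",) for _ in range(rows_per_table)))
conn.executemany("INSERT INTO events_b VALUES (?)",
                 (("error",) for _ in range(rows_per_table)))

rows_union_all = conn.execute(
    "SELECT status FROM events_a UNION ALL SELECT status FROM events_b"
).fetchall()
rows_union = conn.execute(
    "SELECT status FROM events_a UNION SELECT status FROM events_b"
).fetchall()

print(len(rows_union_all))  # 100000 rows would travel to the client
print(len(rows_union))      # 2 rows would travel to the client
```

Note that the two queries return different results here, so this trade-off only applies when the duplicates are not needed by the client. In the question's scenario, where the tables return different records, both operators send the same number of rows.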