连接查询和子查询哪个更快,为什么?我什么时候应该选择其中一种而不是另一种?
我有一个联接查询
Select E.Id,E.Name from Employee E join Dept D on E.DeptId=D.Id
和一个子查询,
Select E.Id,E.Name from Employee Where DeptId in (Select Id from Dept)
哪个会更快,为什么?
我什么时候应该选择其中一种而不是另一种?
I have a join query
Select E.Id,E.Name from Employee E join Dept D on E.DeptId=D.Id
and a subquery query
Select E.Id,E.Name from Employee Where DeptId in (Select Id from Dept)
Which would be faster and why?
When I should prefer one over the other?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
我希望第一个查询会更快,主要是因为您有一个等价项和一个显式 JOIN。根据我的经验,
IN
是一个非常慢的运算符,因为 SQL 通常将其计算为一系列由“OR”分隔的WHERE
子句 (WHERE x=Y OR x =Z 或...
)。不过,与 ALL THINGS SQL 一样,您的情况可能会有所不同。速度在很大程度上取决于索引(您在两个 ID 列上都有索引吗?这将有很大帮助......)。
100% 确定哪个更快的唯一真正方法是打开性能跟踪(IO 统计信息特别有用)并运行它们。确保在运行之间清除缓存!
I would EXPECT the first query to be quicker, mainly because you have an equivalence and an explicit JOIN. In my experience
IN
is a very slow operator, since SQL normally evaluates it as a series ofWHERE
clauses separated by "OR" (WHERE x=Y OR x=Z OR...
).As with ALL THINGS SQL though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both ID columns? That will help a lot...) among other things.
The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO Statistics is especially useful) and run them both. Make sure to clear your cache between runs!
嗯,我相信这是一个“古老但黄金”的问题。答案是:“这要看情况!”。
性能是一个如此微妙的主题,以至于说“永远不要使用子查询,总是加入”就太愚蠢了。
在以下链接中,您将找到一些我认为非常有用的基本最佳实践:
我有一个包含 50000 个元素的表,我要查找的结果是 739 个元素。
我一开始的查询是这样的:
执行花费了 7.9 秒。
我最后的查询是这样的:
并且花费了 0.0256s
很好的 SQL,很好。
Well, I believe it's an "Old but Gold" question. The answer is: "It depends!".
The performances are such a delicate subject that it would be too much silly to say: "Never use subqueries, always join".
In the following links, you'll find some basic best practices that I have found to be very helpful:
I have a table with 50000 elements, the result i was looking for was 739 elements.
My query at first was this:
and it took 7.9s to execute.
My query at last is this:
and it took 0.0256s
Good SQL, good.
性能取决于您正在执行的数据量...
如果数据量较少,大约为 20k。 JOIN 效果更好。
如果数据更像 100k+,那么 IN 效果更好。
如果您不需要其他表中的数据,IN 很好,但最好选择 EXISTS。
我测试了所有这些标准,并且表具有正确的索引。
Performance is based on the amount of data you are executing on...
If it is less data around 20k. JOIN works better.
If the data is more like 100k+ then IN works better.
If you do not need the data from the other table, IN is good, But it is alwys better to go for EXISTS.
All these criterias I tested and the tables have proper indexes.
开始查看执行计划以了解 SQl Server 解释它们的方式的差异。您还可以使用 Profiler 多次实际运行查询并获取差异。
我不希望它们有如此大的不同,当您使用相关子查询时,使用连接而不是子查询可以获得真正的、巨大的性能提升。
EXISTS 通常比这两者中的任何一个都更好,当您谈论左连接时,您想要不在左连接表中的所有记录,那么 NOT EXISTS 通常是更好的选择。
Start to look at the execution plans to see the differences in how the SQl Server will interpret them. You can also use Profiler to actually run the queries multiple times and get the differnce.
I would not expect these to be so horribly different, where you can get get real, large performance gains in using joins instead of subqueries is when you use correlated subqueries.
EXISTS is often better than either of these two and when you are talking left joins where you want to all records not in the left join table, then NOT EXISTS is often a much better choice.
性能应该是一样的;在表上应用正确的索引和集群更为重要(存在 关于该主题的一些好资源)。
(编辑以反映更新的问题)
The performance should be the same; it's much more important to have the correct indexes and clustering applied on your tables (there exist some good resources on that topic).
(Edited to reflect the updated question)
我知道这是一篇旧文章,但我认为这是一个非常重要的主题,尤其是现在我们拥有 10M+ 记录并谈论 TB 级数据。
我还将阐述以下观点。我的表 ([data]) 中有大约 45M 条记录,[cats] 表中有大约 300 条记录。我为我将要讨论的所有查询建立了广泛的索引。
考虑示例 1:
与示例 2 相比:
示例 1 的运行时间约为 23 分钟。示例 2 大约需要 5 分钟。
所以我得出结论,在这种情况下子查询要快得多。当然,请记住,我使用的 M.2 SSD 驱动器具有 I/O @ 1GB/秒(这是字节而不是位),因此我的索引也非常快。因此,在您的情况下,这也可能会影响速度。
如果是一次性数据清理,最好让它运行并完成。我使用 TOP(10000) 并查看在执行大查询之前需要多长时间并乘以记录数。
如果您正在优化生产数据库,我强烈建议对数据进行预处理,即使用触发器或作业代理来异步更新记录,以便实时访问检索静态数据。
I know this is an old post, but I think this is a very important topic, especially nowadays where we have 10M+ records and talk about terabytes of data.
I will also weight in with the following observations. I have about 45M records in my table ([data]), and about 300 records in my [cats] table. I have extensive indexing for all of the queries I am about to talk about.
Consider Example 1:
versus Example 2:
Example 1 took about 23 mins to run. Example 2 took around 5 mins.
So I would conclude that sub-query in this case is much faster. Of course keep in mind that I am using M.2 SSD drives capable of i/o @ 1GB/sec (thats bytes not bits), so my indexes are really fast too. So this may affect the speeds too in your circumstance
If its a one-off data cleansing, probably best to just leave it run and finish. I use TOP(10000) and see how long it takes and multiply by number of records before I hit the big query.
If you are optimizing production databases, I would strongly suggest pre-processing data, i.e. use triggers or job-broker to async update records, so that real-time access retrieves static data.
这两个查询在语义上可能不相同。如果一名员工为多个部门工作(可能在我工作的企业中;不可否认,这意味着您的表未完全规范化),那么第一个查询将返回重复的行,而第二个查询则不会。为了使查询在这种情况下等效,必须将
DISTINCT
关键字添加到SELECT
子句中,这可能会对性能产生影响。请注意,有一条设计经验法则规定表应该对实体/类或实体/类之间的关系进行建模,但不能同时对两者进行建模。因此,我建议您创建第三个表,例如
OrgChart
,来对员工和部门之间的关系进行建模。The two queries may not be semantically equivalent. If a employee works for more than one department (possible in the enterprise I work for; admittedly, this would imply your table is not fully normalized) then the first query would return duplicate rows whereas the second query would not. To make the queries equivalent in this case, the
DISTINCT
keyword would have to be added to theSELECT
clause, which may have an impact on performance.Note there is a design rule of thumb that states a table should model an entity/class or a relationship between entities/classes but not both. Therefore, I suggest you create a third table, say
OrgChart
, to model the relationship between employees and departments.您可以使用解释计划来获得客观答案。
对于您的问题,Exists 过滤器 可能执行速度最快。
You can use an Explain Plan to get an objective answer.
For your problem, an Exists filter would probably perform the fastest.