Which is faster: Sum(Case When) or Group By/Count(*)?
I can write
Select
Sum(Case When Resposta.Tecla = 1 Then 1 Else 0 End) Valor1,
Sum(Case When Resposta.Tecla = 2 Then 1 Else 0 End) Valor2,
Sum(Case When Resposta.Tecla = 3 Then 1 Else 0 End) Valor3,
Sum(Case When Resposta.Tecla = 4 Then 1 Else 0 End) Valor4,
Sum(Case When Resposta.Tecla = 5 Then 1 Else 0 End) Valor5
From Resposta
Or
Select
Count(*)
From Resposta Group By Tecla
I tried this over a large number of rows and both seem to take the same time.
Can anyone confirm this?
Comments (5)
I believe the Group By is better because it needs no special-case processing.
It can be optimized by the database engine.
I think the results may depend on the database engine you use.
Maybe the one you are using optimizes the first query, understanding that it is like a group by!
You can try the "explain / explain plan" command to see how the engine computes your queries, but with my Microsoft SQL Server 2008, I can only see a swap between 2 operations ("Compute Scalar" and "Aggregate").
I tried such queries on a database table:
The results are quite different:
So my choice is "Group By".
Another benefit is that the query is simpler to write!
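As a rough illustration of the "explain plan" suggestion, the sketch below asks SQLite (via Python's built-in sqlite3 module) for the plan of both query shapes. This is only an assumption-laden stand-in: the table is a hypothetical one-column copy of Resposta, SQLite is not the engine used in this thread, and the plan text differs between engines and versions.

```python
import sqlite3

# Hypothetical stand-in for the Resposta table; only the Tecla column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Resposta (Tecla INTEGER)")
conn.executemany(
    "INSERT INTO Resposta (Tecla) VALUES (?)",
    [(i % 5 + 1,) for i in range(1000)],
)

pivot_sql = """
    SELECT
        SUM(CASE WHEN Tecla = 1 THEN 1 ELSE 0 END) AS Valor1,
        SUM(CASE WHEN Tecla = 2 THEN 1 ELSE 0 END) AS Valor2
    FROM Resposta
"""
group_sql = "SELECT Tecla, COUNT(*) FROM Resposta GROUP BY Tecla"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


pivot_plan = plan(pivot_sql)
group_plan = plan(group_sql)

print("SUM(CASE WHEN):", pivot_plan)
print("GROUP BY:      ", group_plan)
```

On other engines the equivalent would be their own EXPLAIN or execution-plan tooling; the point is only to compare the steps each query shape produces.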
What the DB does internally with the second query is practically the same as what you explicitly tell it to do with the first. There should be no difference in the execution plan, and thus in the time the query takes. Taking this into account, the second query is clearly the better choice:
it's more flexible: when there are more values of Tecla, you don't need to change your query;
if Tecla has many values, it'll be harder to read the first query and realize it just counts distinct values.
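The flexibility point can be seen in a small sketch. It assumes a hypothetical one-column Resposta table in an in-memory SQLite database, with a Tecla value of 6 added that the hand-written SUM(CASE WHEN) columns do not anticipate.

```python
import sqlite3

# Hypothetical data: note the Tecla value 6, which the pivot query has no column for.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Resposta (Tecla INTEGER)")
conn.executemany(
    "INSERT INTO Resposta (Tecla) VALUES (?)",
    [(1,), (1,), (2,), (6,)],
)

# One hard-coded column per expected value, as in the question.
pivot_row = conn.execute("""
    SELECT
        SUM(CASE WHEN Tecla = 1 THEN 1 ELSE 0 END) AS Valor1,
        SUM(CASE WHEN Tecla = 2 THEN 1 ELSE 0 END) AS Valor2,
        SUM(CASE WHEN Tecla = 3 THEN 1 ELSE 0 END) AS Valor3,
        SUM(CASE WHEN Tecla = 4 THEN 1 ELSE 0 END) AS Valor4,
        SUM(CASE WHEN Tecla = 5 THEN 1 ELSE 0 END) AS Valor5
    FROM Resposta
""").fetchone()

# GROUP BY returns one row per value actually present, including the new one.
grouped = dict(
    conn.execute("SELECT Tecla, COUNT(*) FROM Resposta GROUP BY Tecla")
)

print("pivot:  ", pivot_row)
print("grouped:", grouped)
```

The pivot query silently ignores the row with Tecla = 6 until someone adds a sixth column, while the grouped query reports it without any change. Conversely, the grouped query returns no row for values that never occur, whereas the pivot query reports them as 0.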
Either one is going to have to read all rows from Resposta, so for any reasonably sized table, I'd expect the I/O cost to dominate, giving approximately the same overall runtime. I'd generally use:
if there's a reasonable chance that the range of Tecla values will change in the future.
In my opinion, a GROUP BY statement will always be faster than SUM(CASE WHEN ...), because in your SUM ... example there would be 5 different calculations, while with GROUP BY the DB will simply sort and calculate.
Imagine you have a bag of different coins and you need to know how many coins of each type you have. You can do it in these ways:
The SUM(CASE WHEN ...) way would be to compare each coin with predefined sample coins and do the math for each sample (add 1 or 0);
The GROUP BY way would be to sort the coins by type and then count each group.
Which method would you prefer?
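The coin analogy can be written out as a toy sketch in plain Python. The coin names and both helper functions are hypothetical, and this only mirrors the analogy: it says nothing about how a particular engine executes either query (some engines group with a hash table rather than a sort) or which one is faster in practice.

```python
from itertools import groupby

# Hypothetical bag of coins and the predefined samples to compare against.
coins = ["penny", "dime", "penny", "quarter", "dime", "penny"]
samples = ["penny", "dime", "quarter"]


def count_by_comparison(coins, samples):
    # SUM(CASE WHEN ...) style: every coin is compared with every sample,
    # adding 1 on a match and 0 otherwise.
    totals = {sample: 0 for sample in samples}
    for coin in coins:
        for sample in samples:
            totals[sample] += 1 if coin == sample else 0
    return totals


def count_by_grouping(coins):
    # GROUP BY style as described in the analogy: sort by type, then count each group.
    return {kind: sum(1 for _ in group) for kind, group in groupby(sorted(coins))}


by_comparison = count_by_comparison(coins, samples)
by_grouping = count_by_grouping(coins)
print(by_comparison)
print(by_grouping)
```

Both approaches arrive at the same counts; they differ only in how much work is done per coin and in whether the set of coin types must be known in advance.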
To compete fairly with count(*), your first SQL should probably be:
And to answer your question, I'm not noticing any difference in speed between SUM CASE WHEN and COUNT. I'm querying over 250,000 rows in PostgreSQL.
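For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch. It assumes an in-memory SQLite database with a hypothetical one-column Resposta table of 250,000 rows, not the PostgreSQL setup mentioned above, so the timings will not transfer to other engines, indexes, or data distributions.

```python
import sqlite3
import time

# Hypothetical table: 250,000 rows with Tecla cycling through 1..5.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Resposta (Tecla INTEGER)")
conn.executemany(
    "INSERT INTO Resposta (Tecla) VALUES (?)",
    ((i % 5 + 1,) for i in range(250_000)),
)

PIVOT_SQL = """
    SELECT
        SUM(CASE WHEN Tecla = 1 THEN 1 ELSE 0 END),
        SUM(CASE WHEN Tecla = 2 THEN 1 ELSE 0 END),
        SUM(CASE WHEN Tecla = 3 THEN 1 ELSE 0 END),
        SUM(CASE WHEN Tecla = 4 THEN 1 ELSE 0 END),
        SUM(CASE WHEN Tecla = 5 THEN 1 ELSE 0 END)
    FROM Resposta
"""
GROUP_SQL = "SELECT Tecla, COUNT(*) FROM Resposta GROUP BY Tecla ORDER BY Tecla"


def timed(sql):
    # Run the query once and return its rows plus the elapsed wall-clock seconds.
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


pivot_rows, pivot_secs = timed(PIVOT_SQL)
group_rows, group_secs = timed(GROUP_SQL)

print(f"SUM(CASE WHEN): {pivot_secs:.4f}s -> {pivot_rows[0]}")
print(f"GROUP BY:       {group_secs:.4f}s -> {group_rows}")
```

A single run like this is noisy; repeating each query several times and comparing the plans gives a more trustworthy picture than one measurement.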