Which is faster: Sum(Case When) or Group By/Count(*)?

Posted 2024-12-11 04:21:56

I can write

Select 
Sum(Case When Resposta.Tecla = 1 Then 1 Else 0 End) Valor1,
Sum(Case When Resposta.Tecla = 2 Then 1 Else 0 End) Valor2,
Sum(Case When Resposta.Tecla = 3 Then 1 Else 0 End) Valor3,
Sum(Case When Resposta.Tecla = 4 Then 1 Else 0 End) Valor4,
Sum(Case When Resposta.Tecla = 5 Then 1 Else 0 End) Valor5
From Resposta

Or

Select 
    Count(*)
From Resposta Group By Tecla

I tried both over a large number of rows and they seem to take the same time.

Can anyone confirm this?
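
Before comparing speed, it may help to see how the two result shapes relate. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database, the table and column names come from the question, and the sample rows are made up. The second query here also selects Tecla so each count can be matched to its value.

```python
import sqlite3

# Hypothetical sample data: table and column names follow the question,
# the rows themselves are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Resposta (Tecla INTEGER)")
conn.executemany(
    "INSERT INTO Resposta (Tecla) VALUES (?)",
    [(t,) for t in (1, 1, 2, 3, 3, 3, 5)],
)

# Query 1: one conditional sum per expected value, returned as a single row.
pivot_row = conn.execute(
    """
    SELECT
        SUM(CASE WHEN Tecla = 1 THEN 1 ELSE 0 END) AS Valor1,
        SUM(CASE WHEN Tecla = 2 THEN 1 ELSE 0 END) AS Valor2,
        SUM(CASE WHEN Tecla = 3 THEN 1 ELSE 0 END) AS Valor3,
        SUM(CASE WHEN Tecla = 4 THEN 1 ELSE 0 END) AS Valor4,
        SUM(CASE WHEN Tecla = 5 THEN 1 ELSE 0 END) AS Valor5
    FROM Resposta
    """
).fetchone()

# Query 2: one row per value that is actually present in the table.
grouped = dict(
    conn.execute("SELECT Tecla, COUNT(*) FROM Resposta GROUP BY Tecla")
)

print(pivot_row)  # counts for Tecla 1..5 as columns
print(grouped)    # counts keyed by Tecla, absent values omitted
```

Note the difference in shape: the first query reports a 0 for a value with no rows (Tecla = 4 in this sample), while the second simply returns no row for it.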

Comments (5)

半衾梦 2024-12-18 04:21:56

I believe Group By is better because it needs no special handling for each value.
It can be optimized by the database engine.
I think the results may depend on the database engine you use.
Maybe the one you are using optimizes the first query, recognizing that it behaves like a Group By!

You can try the "explain / explain plan" command to see how the engine executes your queries. With my Microsoft SQL Server 2008, the only difference I can see is that two operations ("Compute Scalar" and "Aggregate") swap places.
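
As a rough illustration of inspecting plans, here is a sketch that uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is an assumption made for the sake of a runnable example: it is not SQL Server, each engine has its own plan syntax and output, and the exact wording of the plan rows varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Resposta (Tecla INTEGER)")

case_sql = """
    SELECT
        SUM(CASE WHEN Tecla = 1 THEN 1 ELSE 0 END) AS Valor1,
        SUM(CASE WHEN Tecla = 2 THEN 1 ELSE 0 END) AS Valor2
    FROM Resposta
"""
group_sql = "SELECT Tecla, COUNT(*) FROM Resposta GROUP BY Tecla"


def query_plan(sql):
    """Return the human-readable detail column of each plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of every plan row.
    return [row[-1] for row in rows]


plan_case = query_plan(case_sql)
plan_group = query_plan(group_sql)

for label, plan in (("CASE WHEN", plan_case), ("GROUP BY", plan_group)):
    print(label)
    for step in plan:
        print("   ", step)
```

Both plans should show a full scan of Resposta; whether the grouping step adds noticeable cost depends on the engine and on available indexes.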

I tried such queries on a database table:

  • SQL Server 2008
  • 163000 rows in the table
  • 12 categories (Valor1 -> Valor12)

The results are quite different:

  • Group By: 2 seconds
  • Case When: 6 seconds!

So my choice is "Group By".
Another benefit is that the query is simpler to write!
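
A measurement like this can be repeated on your own engine. The following is a minimal timing sketch, assuming an in-memory SQLite database and randomly generated data with the same row and category counts as above; the helper name best_time is hypothetical, and the timings it prints will not match the SQL Server figures.

```python
import random
import sqlite3
import time

N_ROWS = 163_000     # row count taken from the test described above
N_CATEGORIES = 12    # Valor1 .. Valor12

random.seed(0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Resposta (Tecla INTEGER)")
conn.executemany(
    "INSERT INTO Resposta (Tecla) VALUES (?)",
    ((random.randint(1, N_CATEGORIES),) for _ in range(N_ROWS)),
)

# Build the 12-column conditional-sum query instead of typing it out.
case_sql = "SELECT " + ", ".join(
    f"SUM(CASE WHEN Tecla = {k} THEN 1 ELSE 0 END) AS Valor{k}"
    for k in range(1, N_CATEGORIES + 1)
) + " FROM Resposta"
group_sql = "SELECT Tecla, COUNT(*) FROM Resposta GROUP BY Tecla"


def best_time(sql, repeats=5):
    """Return the fastest wall-clock time over several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return min(timings)


case_seconds = best_time(case_sql)
group_seconds = best_time(group_sql)
print(f"CASE WHEN: {case_seconds:.4f} s, GROUP BY: {group_seconds:.4f} s")
```

Taking the best of several runs reduces noise from caching and other processes, but the result still only describes one engine on one machine.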

半仙 2024-12-18 04:21:56

What the DB does internally with the second query is practically the same as what you explicitly tell it to do with the first. There should be no difference in the execution plan and thus in the time the query takes. Taking this into account, the second query is clearly the better choice:

  • it's much more flexible: when there are more values of Tecla, you
    don't need to change your query
  • it's easier to understand: if you have a lot of values for Tecla,
    it'll be harder to read the first query and realize it just counts
    the rows for each distinct value
  • it's smaller: you're sending less information to the DB server and it will probably parse the query faster, which is the only performance difference I see between these queries. This makes a difference, albeit a small one

黑凤梨 2024-12-18 04:21:56

Either one is going to have to read all rows from Resposta, so for any reasonably sized table, I'd expect the I/O cost to dominate - giving approximately the same overall runtime.

If there's a reasonable chance that the range of Tecla values will change in the future, I'd generally use:

Select
    Tecla,
    Count(*)
From Resposta
Group By Tecla
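
To illustrate the point about changing values, here is a small sketch, again assuming SQLite via Python's sqlite3 module and made-up rows. An ORDER BY is added only to make the printed output deterministic. When a value outside the original 1..5 range shows up, the grouped query reports it without being edited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Resposta (Tecla INTEGER)")
conn.executemany(
    "INSERT INTO Resposta (Tecla) VALUES (?)", [(1,), (2,), (2,)]
)

group_sql = (
    "SELECT Tecla, COUNT(*) FROM Resposta GROUP BY Tecla ORDER BY Tecla"
)
before = conn.execute(group_sql).fetchall()

# Later, a value outside the original 1..5 range arrives.
conn.executemany("INSERT INTO Resposta (Tecla) VALUES (?)", [(6,), (6,)])
after = conn.execute(group_sql).fetchall()

print(before)  # [(1, 1), (2, 2)]
print(after)   # [(1, 1), (2, 2), (6, 2)]
```

A fixed list of Sum(Case When ...) columns would need a new column added by hand before Tecla = 6 appeared in its output.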

歌入人心 2024-12-18 04:21:56

In my opinion, a GROUP BY statement will always be faster than SUM(CASE WHEN ...), because your SUM ... example involves 5 different calculations, whereas with GROUP BY the DB will simply sort and count.

Imagine you have a bag of different coins and you need to know how many coins of each type you have. You can do it in two ways:

  • The SUM(CASE WHEN ...) way would be to compare each coin with predefined sample coins and do the math for each sample (add 1 or 0);
  • The GROUP BY way would be to sort coins by their types and then count each group.

Which method would you prefer?
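
The two ways of counting coins can be sketched in plain Python. This only illustrates the analogy and is not a model of how any particular database executes these queries (engines may, for example, group by hashing rather than sorting); the coin values are made up.

```python
from itertools import groupby

coins = [1, 5, 2, 1, 5, 5, 3]   # hypothetical bag of coins
samples = [1, 2, 3, 4, 5]       # the predefined sample coins

# SUM(CASE WHEN ...) style: compare every coin with every sample
# and add 1 or 0 to that sample's tally.
tallies = [0] * len(samples)
for coin in coins:
    for i, sample in enumerate(samples):
        tallies[i] += 1 if coin == sample else 0

# GROUP BY style: sort the coins by type, then count each run.
grouped = {kind: len(list(run)) for kind, run in groupby(sorted(coins))}

print(tallies)  # [2, 1, 1, 0, 3]
print(grouped)  # {1: 2, 2: 1, 3: 1, 5: 3}
```

The first loop does a fixed amount of work per coin for every sample, while the second pays for a sort up front; which is cheaper depends on the number of samples and the size of the bag.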

三生路 2024-12-18 04:21:56

To compete fairly with count(*), your first SQL should probably be:

Select 
Sum(Case When Resposta.Tecla >= 1 AND Resposta.Tecla <=5 Then 1 Else 0 End) Valor
From Resposta

And to answer your question, I'm not noticing any difference in speed between SUM CASE WHEN and COUNT. I'm querying over 250,000 rows in PostgreSQL.
