如何在不使用嵌套 sql 查询的情况下获取聚合

发布于 2024-08-14 12:13:04 字数 913 浏览 9 评论 0原文

我正在从 Avamar (Postgresql) 数据库编写自定义报告,其中包含备份作业历史记录。我的任务是显示昨晚失败的作业(基于 status_code),并在同一行中包含过去 30 天内该客户端的成功率(成功的作业/运行的作业总数)。

因此,总体选择仅选取失败的客户端(status_code 不等于 30000,这是成功代码)。然而,对于昨晚失败的每个客户端,我还需要知道有多少作业已成功,以及过去 30 天内启动/计划的作业总数。 (时间段部分很简单,因此为了保持简单,我没有将其包含在下面的代码中。)

根据 Hobodave 对 这个类似的问题,但我不太能解决它。

在下面的查询中,我收到以下错误: 列“v_activities_2.client_name”必须出现在 GROUP BY 子句中或在聚合函数中使用

这是我的(损坏的)查询。我知道这个逻辑是有缺陷的,但我对如何最好地实现这一目标一无所知。预先感谢您的任何指导!

select
  split_part(client_name,'.',1) as client_name,
  bunchofothercolumnns,
  round(
    100.0 * (
      ((sum(CASE WHEN status_code=30000 THEN 1 ELSE 0 END))) /
      ((sum(CASE WHEN type='Scheduled Backup' THEN 1 ELSE 0 END))))
    as percent_total
from v_activities_2
  where
    status_code<>30000
  order by client_name

I am writing a custom report from an Avamar (Postgresql) database which contains backup job history. My task is to display jobs that failed last night (based on status_code), and include that client's success ratio (jobs succeeded/total jobs run) over the past 30 days on the same line.

So the overall select just picks up clients that failed (status_code doesn't equal 30000, which is the success code). However, for each failed client from last night, I need to also know how many jobs have succeeded, and how many jobs total were started/scheduled in the past 30 days. (The time period part is simple, so I haven't included it in the code below, to keep it simple.)

I tried to do this without using a nested query, based on Hobodave's feedback on this similar question but I'm not quite able to nail it.

In the query below, I get the following error:
column "v_activities_2.client_name" must appear in the GROUP BY clause or be used in an aggregate function

Here's my (broken) query. I know the logic is flawed, but I'm coming up empty with how best to accomplish this. Thanks in advance for any guidance!

select
  split_part(client_name,'.',1) as client_name,
  bunchofothercolumnns,
  round(
    100.0 * (
      ((sum(CASE WHEN status_code=30000 THEN 1 ELSE 0 END))) /
      ((sum(CASE WHEN type='Scheduled Backup' THEN 1 ELSE 0 END))))
    as percent_total
from v_activities_2
  where
    status_code<>30000
  order by client_name

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

不忘初心 2024-08-21 12:13:04

如果 SELECT 中的列没有对其执行聚合函数,则需要定义 GROUP BY:

  SELECT SPLIT_PART(t.client_name, '.', 1) AS client_name,
         SUM(CASE WHEN status_code = 30000 THEN 1 ELSE 0 END) as successes
    FROM v_activities_2
GROUP BY SPLIT_PART(t.client_name, '.', 1)
ORDER BY client_name

您期望以下内容如何工作:

      SUM(CASE WHEN status_code = 30000 THEN 1 ELSE 0 END) as successes
 FROM v_activities_2
WHERE status_code <> 30000

您不能期望计算行数重新排除

You need to define a GROUP BY if you have columns in the SELECT that do not have aggregate functions performed on them:

  SELECT SPLIT_PART(t.client_name, '.', 1) AS client_name,
         SUM(CASE WHEN status_code = 30000 THEN 1 ELSE 0 END) as successes
    FROM v_activities_2
GROUP BY SPLIT_PART(t.client_name, '.', 1)
ORDER BY client_name

How do you expect the following to work:

      SUM(CASE WHEN status_code = 30000 THEN 1 ELSE 0 END) as successes
 FROM v_activities_2
WHERE status_code <> 30000

You can't expect to count rows you're excluding.

So尛奶瓶 2024-08-21 12:13:04

为什么要避免嵌套查询?

这似乎是最合乎逻辑/最有效的解决方案。

如果您在没有 sobqueries(仅 group by)的情况下一次性执行此操作,您将以扫描整个表(或连接表)结束 - 这效率不高,因为昨晚只有某些客户端失败。

一般来说,子查询并没有那么糟糕。

Why avoid nested query?

It seems most logical / efficient solution here.

If you do this in one pass with no sobqueries (only group by's), you will end with scanning the whole table (or joined tables) - which is not efficient, because only SOME clients failed last night.

Subqueries are not that bad, in general.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文