DB2 SQL - Median with GROUP BY
First of all, I am running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK() and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.
The actual data set I'm working with is hard to explain, so let's just say I have a weather history table where the columns are (city, temperature, timestamp). I want to compare medians to averages for each group (city).
This was the cleanest way I found to get a median for a whole table aggregation. I adapted it from the IBM Redbook here:
WITH base_t AS
( SELECT temperature, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t
That works well for getting a single row back, but it seems to fall apart for grouping. Conceptually, this is what I want:
SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...
city          | mean_temp | median_temp
========================================
'Minneapolis' | 60        | 64
'Milwaukee'   | 65        | 66
'Muskegon'    | 70        | 61
There could be an answer that makes me look stupid, but I'm having a mental block and this isn't my #1 thing to work on right now. Seems like it could be possible, but I can't use something that's extremely complex since it's a large table and I want the ability to customize which columns are being aggregated.
Comments (1)
In SQL Server, aggregate functions like count(*) can be partitioned and calculated without a GROUP BY. I looked quickly through the referenced Redbook, and it looks like DB2 has the same feature. But if not, then this won't work:
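For illustration, here is a minimal sketch of the partitioned approach described above. It is not the answerer's original query. It assumes the table `t(city, temperature, timestamp)` from the question, and it assumes that both `ROW_NUMBER() OVER (PARTITION BY ...)` and `COUNT(*) OVER (PARTITION BY ...)` are accepted by the DB2 release in use, which may not hold on V5R4. The CTE names `base_t` and `median_t` follow the question; `mean_t` is a made-up name.

```sql
-- Sketch only: per-city median next to the per-city mean.
-- Assumes t(city, temperature, ...) and windowed COUNT(*) support.
WITH base_t AS (
    SELECT city,
           temperature,
           -- position of each reading within its own city, lowest first
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY temperature) AS rownum,
           -- number of readings in the city, plus 1 (same trick as the question)
           COUNT(*) OVER (PARTITION BY city) + 1 AS base_count
    FROM t
    WHERE temperature IS NOT NULL
),
median_t AS (
    -- odd count: FLOOR and CEILING pick the same middle row
    -- even count: they pick the two middle rows, averaged below
    SELECT city, temperature
    FROM base_t
    WHERE rownum IN (FLOOR(base_count / 2e0), CEILING(base_count / 2e0))
),
mean_t AS (
    SELECT city, AVG(temperature) AS mean_temp
    FROM t
    GROUP BY city
)
SELECT m.city,
       DECIMAL(m.mean_temp, 10, 2)        AS mean_temp,
       DECIMAL(AVG(d.temperature), 10, 2) AS median_temp
FROM mean_t m
JOIN median_t d ON d.city = m.city
GROUP BY m.city, m.mean_temp
ORDER BY m.city
```

If the windowed `COUNT(*)` turns out not to be supported, the same count could probably be computed in a separate CTE with `GROUP BY city` and joined to `base_t` on `city`, leaving `ROW_NUMBER()` as the only OLAP function. Also note that if `temperature` is an integer column, `AVG` may return an integer, so a cast before averaging may be needed to keep the fractional part.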