DB2 SQL - 使用 GROUP BY 的中值

发布于 2024-08-02 03:27:57 字数 1201 浏览 7 评论 0原文

首先，我在 DB2 for i5/OS V5R4 上运行。我有 ROW_NUMBER()、RANK() 和公用表表达式。我没有有 TOP n PERCENT 或 LIMIT OFFSET。

我正在使用的实际数据集很难解释，所以我们假设我有一个天气历史表，其中的列是（城市、温度、时间戳）。我想比较每个组（城市）的中位数和平均值。

这是我发现获得整个表聚合的中位数的最干净的方法。我从 IBM 红皮书此处改编而来：

WITH base_t AS
( SELECT temp, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
  WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t

这对于获取后面有一排，但似乎因为分组而分崩离析。从概念上讲，这就是我想要的：


SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...

city           | mean_temp       | median_temp       
===================================================
'Minneapolis'  | 60              | 64
'Milwaukee'    | 65              | 66
'Muskegon'     | 70              | 61

可能有一个让我看起来很愚蠢的答案，但我有一个心理障碍，这不是我现在要做的第一件事。似乎这是可能的，但我不能使用极其复杂的东西，因为它是一个大表，而且我希望能够自定义聚合哪些列。

原文

First of all, I am running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK() and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.

The actual data set I'm working with is hard to explain, so let's just say I have a weather history table where the columns are (city, temperature, timestamp). I want to compare medians to averages for each group (city).

This was the cleanest way I found to get a median for a whole table aggregation. I adapted it from the IBM Redbook here:

WITH base_t AS
( SELECT temp, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
  WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t

That works well for getting a single row back, but it seems to fall apart for grouping. Conceptually, this is what I want:


SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...

city           | mean_temp       | median_temp       
===================================================
'Minneapolis'  | 60              | 64
'Milwaukee'    | 65              | 66
'Muskegon'     | 70              | 61

There could be an answer that makes me look stupid, but I'm having a mental block and this isn't my #1 thing to work on right now. Seems like it could be possible, but I can't use something that's extremely complex since it's a large table and I want the ability to customize which columns are being aggregated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

潦草背影 2024-08-09 03:27:57

在SQL Server中，像count(*)这样的聚合函数可以在没有group by的情况下进行分区和计算。我快速浏览了参考的红皮书，看起来 DB2 也有相同的功能。但如果没有，那么这将不起作用：

create table TemperatureHistory 
    (City varchar(20)
    , Temperature decimal(5, 2)
    , DateTaken datetime)

insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')

insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')

insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')

; with base_t as
    (select city
        , Temperature
        , row_number() over (partition by city order by temperature) as RowNum
        , (count(*) over (partition by city)) + 1 as CountPlusOne 
    from TemperatureHistory)
select City
    , avg(Temperature) as MeanTemp
    , avg(case 
        when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0)) 
            then Temperature
            else null end) as MedianTemp
from base_t 
group by City

In SQL Server, agreagate functions like count(*) can be partitioned and calculated without a group by. I looked quickly through the referenced redbook, and it looks like DB2 has the same feature. But if not, then this won't work:

create table TemperatureHistory 
    (City varchar(20)
    , Temperature decimal(5, 2)
    , DateTaken datetime)

insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')

insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')

insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')

; with base_t as
    (select city
        , Temperature
        , row_number() over (partition by city order by temperature) as RowNum
        , (count(*) over (partition by city)) + 1 as CountPlusOne 
    from TemperatureHistory)
select City
    , avg(Temperature) as MeanTemp
    , avg(case 
        when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0)) 
            then Temperature
            else null end) as MedianTemp
from base_t 
group by City

回复收藏 0 原文

~没有更多了~