Oracle:如何“分组依据”超过一个范围?

发布于 2024-08-26 06:54:24 字数 401 浏览 8 评论 0原文

如果我有一个这样的表:

pkey   age
----   ---
   1     8
   2     5
   3    12
   4    12
   5    22

我可以“分组”来获取每个年龄的计数。

select age,count(*) n from tbl group by age;
age  n
---  -
  5  1
  8  1
 12  2
 22  1

我可以使用什么查询来按年龄范围进行分组?

  age  n
-----  -
 1-10  2
11-20  2
20+    1

我使用 10gR2,但我也对任何特定于 11g 的方法感兴趣。

If I have a table like this:

pkey   age
----   ---
   1     8
   2     5
   3    12
   4    12
   5    22

I can "group by" to get a count of each age.

select age,count(*) n from tbl group by age;
age  n
---  -
  5  1
  8  1
 12  2
 22  1

What query can I use to group by age ranges?

  age  n
-----  -
 1-10  2
11-20  2
20+    1

I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

轮廓§ 2024-09-02 06:54:24
SELECT CASE 
         WHEN age <= 10 THEN '1-10' 
         WHEN age <= 20 THEN '11-20' 
         ELSE '21+' 
       END AS age, 
       COUNT(*) AS n
FROM age
GROUP BY CASE 
           WHEN age <= 10 THEN '1-10' 
           WHEN age <= 20 THEN '11-20' 
           ELSE '21+' 
         END
SELECT CASE 
         WHEN age <= 10 THEN '1-10' 
         WHEN age <= 20 THEN '11-20' 
         ELSE '21+' 
       END AS age, 
       COUNT(*) AS n
FROM age
GROUP BY CASE 
           WHEN age <= 10 THEN '1-10' 
           WHEN age <= 20 THEN '11-20' 
           ELSE '21+' 
         END
樱花落人离去 2024-09-02 06:54:24

尝试:

select to_char(floor(age/10) * 10) || '-' 
|| to_char(ceil(age/10) * 10 - 1)) as age, 
count(*) as n from tbl group by floor(age/10);

Try:

select to_char(floor(age/10) * 10) || '-' 
|| to_char(ceil(age/10) * 10 - 1)) as age, 
count(*) as n from tbl group by floor(age/10);
微凉 2024-09-02 06:54:24

您正在寻找的基本上是直方图的数据。

x 轴上有年龄(或年龄范围),y 轴上有计数 n(或频率)。

在最简单的形式中,人们可以简单地计算每个不同年龄值的数量,就像您已经描述的那样:

SELECT age, count(*)
FROM tbl
GROUP BY age

但是,当 x 轴有太多不同的值时,人们可能想要创建组(或簇或桶)。在您的例子中,您按照 10 的恒定范围进行分组。

我们可以避免为每个范围编写 WHEN ... THEN 行 - 如果与年龄无关,则可能有数百个。相反,由于 @NitinMidha 提到的原因,@MatthewFlaschen 的方法更可取。

现在让我们构建 SQL...

首先,我们需要将年龄分为 10 个范围组,如下所示:

  • 0-9
  • 10-19
  • 20 - 29
  • 等。

这可以通过将年龄列除以 10 然后计算来实现结果的 FLOOR:

FLOOR(age/10)

“FLOOR 返回等于或小于 n 的最大整数”
http://docs.oracle.com/cd/E11882_01/server.112/ e26088/functions067.htm#SQLRF00643

然后我们采用原始 SQL 并将 age 替换为该表达式:

SELECT FLOOR(age/10), count(*)
FROM tbl
GROUP BY FLOOR(age/10)

这没问题,但我们还看不到范围。相反,我们只能看到计算出的下限值,即 0, 1, 2 ... n

为了获得实际的下限,我们需要再次将其乘以 10,这样我们就得到了 0, 10, 20 ... n

FLOOR(age/10) * 10

我们还需要每个范围的上限,即下限 bound + 10 - 1

FLOOR(age/10) * 10 + 10 - 1

最后,我们将两者连接成一个字符串,如下所示:

TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)

这会创建 '0-9'、'10-19'、'20-29'

。我们的 SQL 看起来像这样:

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
COUNT(*)
FROM tbl
GROUP BY FLOOR(age/10)

最后,应用一个顺序和漂亮的列别名:

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
COUNT(*) AS frequency
FROM tbl
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)

但是,在更复杂的场景中,这些范围可能不会被分组为大小为 10 的常量块,而是需要动态集群。
Oracle 包含更高级的直方图函数,请参阅 http://docs.oracle.com /cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366

感谢@MatthewFlaschen 的方法;我只解释了细节。

What you are looking for, is basically the data for a histogram.

You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.

In the simplest form, one could simply count the number of each distinct age value like you already described:

SELECT age, count(*)
FROM tbl
GROUP BY age

When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.

We can avoid writing a WHEN ... THEN line for each range - there could be hundreds if it were not about age. Instead, the approach by @MatthewFlaschen is preferable for the reasons mentioned by @NitinMidha.

Now let's build the SQL...

First, we need to split the ages into range-groups of 10 like so:

  • 0-9
  • 10-19
  • 20 - 29
  • etc.

This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:

FLOOR(age/10)

"FLOOR returns the largest integer equal to or less than n"
http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643

Then we take the original SQL and replace age with that expression:

SELECT FLOOR(age/10), count(*)
FROM tbl
GROUP BY FLOOR(age/10)

This is OK, but we cannot see the range, yet. Instead we only see the calculated floor values which are 0, 1, 2 ... n.

To get the actual lower bound, we need to multiply it with 10 again so we get 0, 10, 20 ... n:

FLOOR(age/10) * 10

We also need the upper bound of each range which is lower bound + 10 - 1 or

FLOOR(age/10) * 10 + 10 - 1

Finally, we concatenate both into a string like this:

TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)

This creates '0-9', '10-19', '20-29' etc.

Now our SQL looks like this:

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
COUNT(*)
FROM tbl
GROUP BY FLOOR(age/10)

Finally, apply an order and nice column aliases:

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
COUNT(*) AS frequency
FROM tbl
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)

However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but need dynamical clustering.
Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366

Credits to @MatthewFlaschen for his approach; I only explained the details.

飘落散花 2024-09-02 06:54:24

这是一个在子查询中创建“范围”表,然后使用它对主表中的数据进行分区的解决方案:

SELECT DISTINCT descr
  , COUNT(*) OVER (PARTITION BY descr) n
FROM age_table INNER JOIN (
  select '1-10' descr, 1 rng_start, 10 rng_stop from dual
  union (
  select '11-20', 11, 20 from dual
  ) union (
  select '20+', 21, null from dual
)) ON age BETWEEN nvl(rng_start, age) AND nvl(rng_stop, age)
ORDER BY descr;

Here is a solution which creates a "range" table in a sub-query and then uses this to partition the data from the main table:

SELECT DISTINCT descr
  , COUNT(*) OVER (PARTITION BY descr) n
FROM age_table INNER JOIN (
  select '1-10' descr, 1 rng_start, 10 rng_stop from dual
  union (
  select '11-20', 11, 20 from dual
  ) union (
  select '20+', 21, null from dual
)) ON age BETWEEN nvl(rng_start, age) AND nvl(rng_stop, age)
ORDER BY descr;
下雨或天晴 2024-09-02 06:54:24

我必须按照一小时内出现的交易数量对数据进行分组。我通过从时间戳中提取小时来做到这一点:

select extract(hour from transaction_time) as hour
      ,count(*)
from   table
where  transaction_date='01-jan-2000'
group by
       extract(hour from transaction_time)
order by
       extract(hour from transaction_time) asc
;

给出输出:

HOUR COUNT(*)
---- --------
   1     9199 
   2     9167 
   3     9997 
   4     7218

如您所见,这提供了一种对每小时记录数进行分组的简单方法。

I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:

select extract(hour from transaction_time) as hour
      ,count(*)
from   table
where  transaction_date='01-jan-2000'
group by
       extract(hour from transaction_time)
order by
       extract(hour from transaction_time) asc
;

Giving output:

HOUR COUNT(*)
---- --------
   1     9199 
   2     9167 
   3     9997 
   4     7218

As you can see this gives a nice easy way of grouping the number of records per hour.

空城旧梦 2024-09-02 06:54:24

将age_range表和age_range_id字段添加到您的表中,并按其分组。

// 原谅 DDL,但你应该明白这个想法

create table age_range(
age_range_id tinyint unsigned not null primary key,
name varchar(255) not null);

insert into age_range values 
(1, '18-24'),(2, '25-34'),(3, '35-44'),(4, '45-54'),(5, '55-64');

// 再次原谅 DML,但你应该明白这个想法

select
 count(*) as counter, p.age_range_id, ar.name
from
  person p
inner join age_range ar on p.age_range_id = ar.age_range_id
group by
  p.age_range_id, ar.name order by counter desc;

如果你愿意,你可以完善这个想法 - 在age_range 表中添加 from_age to_age 列等 - 但我会把它留给你。

希望这有帮助:)

add an age_range table and an age_range_id field to your table and group by that instead.

// excuse the DDL but you should get the idea

create table age_range(
age_range_id tinyint unsigned not null primary key,
name varchar(255) not null);

insert into age_range values 
(1, '18-24'),(2, '25-34'),(3, '35-44'),(4, '45-54'),(5, '55-64');

// again excuse the DML but you should get the idea

select
 count(*) as counter, p.age_range_id, ar.name
from
  person p
inner join age_range ar on p.age_range_id = ar.age_range_id
group by
  p.age_range_id, ar.name order by counter desc;

You can refine this idea if you like - add from_age to_age columns in the age_range table etc - but i'll leave that to you.

hope this helps :)

讽刺将军 2024-09-02 06:54:24

如果使用 Oracle 9i+,您可能能够使用NTILE 分析函数

WITH tiles AS (
  SELECT t.age,
         NTILE(3) OVER (ORDER BY t.age) AS tile
    FROM TABLE t)
  SELECT MIN(t.age) AS min_age,
         MAX(t.age) AS max_age,
         COUNT(t.tile) As n
    FROM tiles t
GROUP BY t.tile

NTILE 的警告是您只能指定分区数量,不是断点本身。因此,您需要指定一个合适的数字。 IE:对于 100 行,NTILE(4) 将为四个存储桶/分区中的每一个分配 25 行。您不能嵌套分析函数,因此您必须使用子查询/子查询分解来对它们进行分层以获得所需的粒度。否则,使用:

  SELECT CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END AS age, 
         COUNT(*) AS n
    FROM TABLE t
GROUP BY CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END

If using Oracle 9i+, you might be able to use the NTILE analytic function:

WITH tiles AS (
  SELECT t.age,
         NTILE(3) OVER (ORDER BY t.age) AS tile
    FROM TABLE t)
  SELECT MIN(t.age) AS min_age,
         MAX(t.age) AS max_age,
         COUNT(t.tile) As n
    FROM tiles t
GROUP BY t.tile

The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves. So you need to specify a number that is appropriate. IE: With 100 rows, NTILE(4) will allot 25 rows to each of the four buckets/partitions. You can not nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get desired granularity. Otherwise, use:

  SELECT CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END AS age, 
         COUNT(*) AS n
    FROM TABLE t
GROUP BY CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END
月朦胧 2024-09-02 06:54:24

我必须每天清点样本数量。受@Clarkey的启发,我使用 TO_CHAR 将样本日期从时间戳提取为 ISO-8601 日期格式,并在 GROUP BY 和 ORDER BY 子句中使用它。 (进一步的启发,我也将其发布在这里,以防对其他人有用。)

SELECT 
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') AS TS_DAY, 
  COUNT(*) 
FROM   
  TABLE X
GROUP BY
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD')
ORDER BY
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') ASC
/

I had to get a count of samples by day. Inspired by @Clarkey I used TO_CHAR to extract the date of sample from the timestamp to an ISO-8601 date format and used that in the GROUP BY and ORDER BY clauses. (Further inspired, I also post it here in case it is useful to others.)

SELECT 
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') AS TS_DAY, 
  COUNT(*) 
FROM   
  TABLE X
GROUP BY
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD')
ORDER BY
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') ASC
/
不喜欢何必死缠烂打 2024-09-02 06:54:24

您可以尝试以下解决方案吗:

SELECT count (1), '1-10'  where age between 1 and 10
union all 
SELECT count (1), '11-20'  where age between 11 and 20
union all
select count (1), '21+' where age >20
from age 

Can you try the below solution:

SELECT count (1), '1-10'  where age between 1 and 10
union all 
SELECT count (1), '11-20'  where age between 11 and 20
union all
select count (1), '21+' where age >20
from age 
人生百味 2024-09-02 06:54:24

我的方法:

select range, count(1) from (
select case 
  when age < 5 then '0-4' 
  when age < 10 then '5-9' 
  when age < 15 then '10-14' 
  when age < 20 then '15-20' 
  when age < 30 then '21-30' 
  when age < 40 then '31-40' 
  when age < 50 then '41-50' 
  else                '51+' 
end 
as range from
(select round(extract(day from feedback_update_time - feedback_time), 1) as age
from txn_history
) ) group by range  
  • 我可以灵活地定义范围,
  • 我不会在 select 和 group 子句中重复范围
  • ,但请有人告诉我,如何按大小对它们进行排序!

My approach:

select range, count(1) from (
select case 
  when age < 5 then '0-4' 
  when age < 10 then '5-9' 
  when age < 15 then '10-14' 
  when age < 20 then '15-20' 
  when age < 30 then '21-30' 
  when age < 40 then '31-40' 
  when age < 50 then '41-50' 
  else                '51+' 
end 
as range from
(select round(extract(day from feedback_update_time - feedback_time), 1) as age
from txn_history
) ) group by range  
  • I have flexibility in defining the ranges
  • I do not repeat the ranges in select and group clauses
  • but some one please tell me, how to order them by magnitude!
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文