Oracle: how to "group by" over a range?
If I have a table like this:
pkey age
---- ---
1 8
2 5
3 12
4 12
5 22
I can "group by" to get a count of each age.
select age,count(*) n from tbl group by age;
age n
--- -
5 1
8 1
12 2
22 1
What query can I use to group by age ranges?
age n
----- -
1-10 2
11-20 2
20+ 1
I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.
Try:
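A minimal sketch of one such query, assuming the table tbl from the question and a CASE expression to label the ranges. The break points and labels are illustrative; the last label is written 21+ here because an age of 20 already falls into 11-20.

```sql
-- Label each row with its age range, then count the rows per label.
SELECT CASE
         WHEN age <= 10 THEN '1-10'
         WHEN age <= 20 THEN '11-20'
         ELSE '21+'
       END AS age_range,
       COUNT(*) AS n
FROM tbl
GROUP BY CASE
           WHEN age <= 10 THEN '1-10'
           WHEN age <= 20 THEN '11-20'
           ELSE '21+'
         END
ORDER BY MIN(age);  -- keeps the ranges in ascending age order
```

The CASE expression has to be repeated in the GROUP BY clause because Oracle does not allow a select-list alias there.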
What you are looking for is basically the data for a histogram.
You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.
In the simplest form, one could simply count the number of each distinct age value like you already described:
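For reference, a sketch of that simplest form, assuming the table tbl and column age from the question (the ORDER BY is added here only for readable output):

```sql
-- One output row per distinct age value.
SELECT age, COUNT(*) AS n
FROM tbl
GROUP BY age
ORDER BY age;
```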
When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.
We can avoid writing a WHEN ... THEN line for each range; there could be hundreds if it were not about age. Instead, the approach by @MatthewFlaschen is preferable for the reasons mentioned by @NitinMidha.
Now let's build the SQL...
First, we need to split the ages into range-groups of 10 like so: 0-9, 10-19, 20-29, and so on.
This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:
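A small sketch of that expression applied to each row, assuming the table tbl from the question (the alias bucket is illustrative):

```sql
-- Integer bucket index per row: ages 0-9 give 0, 10-19 give 1, 20-29 give 2, ...
SELECT pkey, age, FLOOR(age / 10) AS bucket
FROM tbl
ORDER BY pkey;
```

For the sample ages 8, 5, 12, 12 and 22 this yields the bucket values 0, 0, 1, 1 and 2.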
"FLOOR returns the largest integer equal to or less than n"
http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643
Then we take the original SQL and replace age with that expression:
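A sketch of this step, again assuming tbl and age from the question:

```sql
-- Same shape as the original query, but grouped by the bucket index
-- instead of the raw age.
SELECT COUNT(*), FLOOR(age / 10)
FROM tbl
GROUP BY FLOOR(age / 10);
```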
This is OK, but we cannot see the range yet. Instead we only see the calculated floor values, which are 0, 1, 2 ... n. To get the actual lower bound, we need to multiply by 10 again, so we get 0, 10, 20 ... n, that is FLOOR(age/10) * 10. We also need the upper bound of each range, which is lower bound + 10 - 1, or FLOOR(age/10) * 10 + 9.
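To see both bounds side by side, a minimal sketch assuming tbl and age from the question (the aliases lower_bound and upper_bound are illustrative):

```sql
-- Lower and upper bound of every range that actually occurs in the data.
SELECT DISTINCT
       FLOOR(age / 10) * 10     AS lower_bound,
       FLOOR(age / 10) * 10 + 9 AS upper_bound
FROM tbl
ORDER BY lower_bound;
```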
Finally, we concatenate both into a string like this: TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 9).
This creates '0-9', '10-19', '20-29' etc.
Now our SQL looks like this:
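A sketch of the query at this stage, assuming tbl and age from the question:

```sql
-- The range label is built only from the GROUP BY expression FLOOR(age / 10),
-- so it can appear in the select list without being repeated in GROUP BY.
SELECT COUNT(*),
       TO_CHAR(FLOOR(age / 10) * 10) || '-' || TO_CHAR(FLOOR(age / 10) * 10 + 9)
FROM tbl
GROUP BY FLOOR(age / 10);
```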
Finally, apply an order and nice column aliases:
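A sketch of the finished query, assuming tbl and age from the question. The aliases total and age_range are illustrative choices.

```sql
SELECT COUNT(*) AS total,
       TO_CHAR(FLOOR(age / 10) * 10) || '-' || TO_CHAR(FLOOR(age / 10) * 10 + 9) AS age_range
FROM tbl
GROUP BY FLOOR(age / 10)
ORDER BY FLOOR(age / 10);  -- numeric order, so '100-109' sorts after '20-29'
```

With the sample ages 8, 5, 12, 12 and 22 this returns 2 for '0-9', 2 for '10-19' and 1 for '20-29'. Ordering by the numeric bucket rather than by the label avoids the string sort that would put '100-109' before '20-29'.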
However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but might need dynamic clustering instead.
Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366
Credits to @MatthewFlaschen for his approach; I only explained the details.
Here is a solution which creates a "range" table in a sub-query and then uses this to partition the data from the main table:
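A minimal sketch of that idea, assuming the table tbl from the question. The inline view ranges, its columns lo, hi and label, and the break points are all illustrative.

```sql
-- Build the ranges on the fly from DUAL, then join the data to them.
SELECT r.label AS age_range, COUNT(t.pkey) AS n
FROM (
    SELECT 1 AS lo, 10 AS hi, '1-10' AS label FROM dual UNION ALL
    SELECT 11, 20, '11-20' FROM dual UNION ALL
    SELECT 21, NULL, '21+' FROM dual          -- NULL hi means no upper limit
) r
LEFT JOIN tbl t
       ON t.age >= r.lo
      AND (r.hi IS NULL OR t.age <= r.hi)
GROUP BY r.label, r.lo
ORDER BY r.lo;
```

The LEFT JOIN together with COUNT(t.pkey) keeps ranges that contain no rows in the output with a count of 0, which a plain CASE-based GROUP BY would omit.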
I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:
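A sketch of what such a query could look like. The table name transactions and the column txn_time are hypothetical, and TO_CHAR with the HH24 format is one way to extract the hour (it works for both DATE and TIMESTAMP columns).

```sql
-- Count rows per hour of day (00 to 23), regardless of the date.
SELECT TO_CHAR(txn_time, 'HH24') AS hour_of_day,
       COUNT(*) AS n
FROM transactions
GROUP BY TO_CHAR(txn_time, 'HH24')
ORDER BY TO_CHAR(txn_time, 'HH24');
```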
The output has one row per hour with its record count, which gives a nice easy way of grouping the number of records per hour.
Add an age_range table and an age_range_id field to your table, and group by that instead.
// excuse the DDL but you should get the idea
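A rough sketch of such DDL in Oracle syntax. The column types, the three sample ranges and the CASE break points are assumptions made for illustration.

```sql
-- Lookup table holding one row per age range.
CREATE TABLE age_range (
    age_range_id NUMBER(3)    PRIMARY KEY,
    name         VARCHAR2(20) NOT NULL
);

INSERT INTO age_range VALUES (1, '1-10');
INSERT INTO age_range VALUES (2, '11-20');
INSERT INTO age_range VALUES (3, '21+');

-- Add the foreign key column to the main table and populate it once.
ALTER TABLE tbl ADD (age_range_id NUMBER(3) REFERENCES age_range (age_range_id));

UPDATE tbl
SET age_range_id = CASE
                     WHEN age <= 10 THEN 1
                     WHEN age <= 20 THEN 2
                     ELSE 3
                   END;
```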
// again excuse the DML but you should get the idea
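A sketch of the grouping query, assuming a lookup table age_range(age_range_id, name) and an age_range_id column on tbl as described in this answer:

```sql
-- Group by the stored range id and show the range name.
SELECT ar.name AS age_range, COUNT(*) AS n
FROM tbl t
JOIN age_range ar ON ar.age_range_id = t.age_range_id
GROUP BY ar.age_range_id, ar.name
ORDER BY ar.age_range_id;
```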
You can refine this idea if you like, for example by adding from_age and to_age columns to the age_range table, but I'll leave that to you.
Hope this helps :)
If using Oracle 9i+, you might be able to use the NTILE analytic function.
The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves. So you need to specify a number that is appropriate. For example, with 100 rows, NTILE(4) will allot 25 rows to each of the four buckets/partitions. You cannot nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get the desired granularity. Otherwise, define the break points explicitly, for example with a CASE expression.
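A minimal sketch of the NTILE approach described above, assuming the table tbl from the question. The bucket count of 3 and the aliases are illustrative.

```sql
-- NTILE(3) splits the rows, ordered by age, into 3 buckets of (nearly) equal
-- row count. The bucket boundaries follow the data, not fixed age values.
SELECT bucket,
       MIN(age) AS min_age,
       MAX(age) AS max_age,
       COUNT(*) AS n
FROM (
    SELECT age, NTILE(3) OVER (ORDER BY age) AS bucket
    FROM tbl
)
GROUP BY bucket
ORDER BY bucket;
```

The analytic function is evaluated in the inline view and the aggregation happens one level up, which is the layering the answer refers to.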
I had to get a count of samples by day. Inspired by @Clarkey I used TO_CHAR to extract the date of sample from the timestamp to an ISO-8601 date format and used that in the GROUP BY and ORDER BY clauses. (Further inspired, I also post it here in case it is useful to others.)
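A sketch of such a query. The table name samples and the column sample_time are hypothetical; the format mask YYYY-MM-DD produces the ISO-8601 date.

```sql
-- Count samples per calendar day.
SELECT TO_CHAR(sample_time, 'YYYY-MM-DD') AS sample_day,
       COUNT(*) AS n
FROM samples
GROUP BY TO_CHAR(sample_time, 'YYYY-MM-DD')
ORDER BY TO_CHAR(sample_time, 'YYYY-MM-DD');
```

Because the ISO-8601 format sorts lexically in date order, ordering by the formatted string also gives chronological order.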
Can you try the below solution:
My approach: