Oracle“分区依据” 关键词

发布于 2024-07-13 15:55:23 字数 251 浏览 13 评论 0原文

有人可以解释一下 partition by 关键字的作用并给出一个简单的操作示例,以及为什么要使用它? 我有一个别人写的 SQL 查询,我想弄清楚它的作用。

分区依据的示例:

SELECT empno, deptno, COUNT(*) 
OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp

我在网上看到的示例似乎有点太深入了。

Can someone please explain what the partition by keyword does and give a simple example of it in action, as well as why one would want to use it? I have a SQL query written by someone else and I'm trying to figure out what it does.

An example of partition by:

SELECT empno, deptno, COUNT(*) 
OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp

The examples I've seen online seem a bit too in-depth.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

掀纱窥君容 2024-07-20 15:55:23

PARTITION BY 子句设置将用于 OVER 子句中每个“GROUP”的记录范围。

在您的示例 SQL 中,DEPT_COUNT 将为每个员工记录返回该部门内的员工人数。 (就好像您对 emp 表进行去规范化;您仍然返回 emp 表中的每条记录。)

emp_no  dept_no  DEPT_COUNT
1       10       3
2       10       3
3       10       3 <- three because there are three "dept_no = 10" records
4       20       2
5       20       2 <- two because there are two "dept_no = 20" records

如果还有另一列(例如,>州),那么您可以计算该州有多少个部门。

这就像获取GROUP BYSUMAVG等)的结果而无需聚合结果集(即删除匹配记录) )。

例如,当您使用 LAST OVERMIN OVER 函数获取部门中的最低和最高工资,然后将其用于计算时,此功能非常有用记录工资没有子选择,这要快得多。

阅读链接的 AskTom 文章 了解更多详情。

The PARTITION BY clause sets the range of records that will be used for each "GROUP" within the OVER clause.

In your example SQL, DEPT_COUNT will return the number of employees within that department for every employee record. (It is as if you're de-nomalising the emp table; you still return every record in the emp table.)

emp_no  dept_no  DEPT_COUNT
1       10       3
2       10       3
3       10       3 <- three because there are three "dept_no = 10" records
4       20       2
5       20       2 <- two because there are two "dept_no = 20" records

If there was another column (e.g., state) then you could count how many departments in that State.

It is like getting the results of a GROUP BY (SUM, AVG, etc.) without the aggregating the result set (i.e. removing matching records).

It is useful when you use the LAST OVER or MIN OVER functions to get, for example, the lowest and highest salary in the department and then use that in a calculation against this records salary without a sub select, which is much faster.

Read the linked AskTom article for further details.

冰之心 2024-07-20 15:55:23

这个概念已经被接受的答案很好地解释了,但我发现看到的例子越多,理解得越好。这是一个增量示例:

  1. 老板说“让我知道我们拥有的物品数量”库存按品牌分组”

你说:“没问题”

SELECT 
      BRAND
      ,COUNT(ITEM_ID) 
FROM 
      ITEMS
GROUP BY 
      BRAND;

结果:

+--------------+---------------+
|  Brand       |   Count       | 
+--------------+---------------+
| H&M          |     50        |
+--------------+---------------+
| Hugo Boss    |     100       |
+--------------+---------------+
| No brand     |     22        |
+--------------+---------------+
  1. 老板说“现在给我一份所有商品的清单,包括它们的品牌和商品数量各自的品牌有”

您可以尝试:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) 
 FROM 
      ITEMS
 GROUP BY 
      BRAND;

但是您得到:

ORA-00979: not a GROUP BY expression 

这就是 OVER (PARTITION BY BRAND) 出现的地方:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) OVER (PARTITION BY BRAND) 
 FROM 
      ITEMS;

这意味着:

  • COUNT(ITEM_ID) - 获取号码项目
  • OVER - 在行集
  • (按品牌划分) - 具有相同品牌的

结果是:

+--------------+---------------+----------+
|  Items       |  Brand        | Count()  |
+--------------+---------------+----------+
|  Item 1      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 2      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 3      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 4      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 5      |  H&M          |   50     | 
+--------------+---------------+----------+

等等...

The concept is very well explained by the accepted answer, but I find that the more example one sees, the better it sinks in. Here's an incremental example:

  1. Boss says "get me number of items we have in stock grouped by brand"

You say: "no problem"

SELECT 
      BRAND
      ,COUNT(ITEM_ID) 
FROM 
      ITEMS
GROUP BY 
      BRAND;

Result:

+--------------+---------------+
|  Brand       |   Count       | 
+--------------+---------------+
| H&M          |     50        |
+--------------+---------------+
| Hugo Boss    |     100       |
+--------------+---------------+
| No brand     |     22        |
+--------------+---------------+
  1. The boss says "Now get me a list of all items, with their brand AND number of items that the respective brand has"

You may try:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) 
 FROM 
      ITEMS
 GROUP BY 
      BRAND;

But you get:

ORA-00979: not a GROUP BY expression 

This is where the OVER (PARTITION BY BRAND) comes in:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) OVER (PARTITION BY BRAND) 
 FROM 
      ITEMS;

Which means:

  • COUNT(ITEM_ID) - get the number of items
  • OVER - Over the set of rows
  • (PARTITION BY BRAND) - that have the same brand

And the result is:

+--------------+---------------+----------+
|  Items       |  Brand        | Count()  |
+--------------+---------------+----------+
|  Item 1      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 2      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 3      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 4      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 5      |  H&M          |   50     | 
+--------------+---------------+----------+

etc...

罪#恶を代价 2024-07-20 15:55:23

它是称为分析的 SQL 扩展。 select语句中的“over”告诉oracle该函数是一个分析函数,而不是一个group by函数。 使用分析的优点是,您只需一次传递数据即可收集总和、计数等,而不是使用子选择或更糟糕的 PL/SQL 循环数据。

乍一看确实令人困惑,但这很快就会成为第二天性。 没有人能比汤姆·凯特更好地解释这一点。 所以上面的链接很棒。

当然,阅读文档是必须的。

It is the SQL extension called analytics. The "over" in the select statement tells oracle that the function is a analytical function, not a group by function. The advantage to using analytics is that you can collect sums, counts, and a lot more with just one pass through of the data instead of looping through the data with sub selects or worse, PL/SQL.

It does look confusing at first but this will be second nature quickly. No one explains it better then Tom Kyte. So the link above is great.

Of course, reading the documentation is a must.

云裳 2024-07-20 15:55:23
EMPNO     DEPTNO DEPT_COUNT

 7839         10          4
 5555         10          4
 7934         10          4
 7782         10          4 --- 4 records in table for dept 10
 7902         20          4
 7566         20          4
 7876         20          4
 7369         20          4 --- 4 records in table for dept 20
 7900         30          6
 7844         30          6
 7654         30          6
 7521         30          6
 7499         30          6
 7698         30          6 --- 6 records in table for dept 30

在这里我们得到了相应部门编号的计数。
至于 deptno 10,我们在表 emp 中有 4 条记录,deptno 20 和 30 也有类似的结果。

EMPNO     DEPTNO DEPT_COUNT

 7839         10          4
 5555         10          4
 7934         10          4
 7782         10          4 --- 4 records in table for dept 10
 7902         20          4
 7566         20          4
 7876         20          4
 7369         20          4 --- 4 records in table for dept 20
 7900         30          6
 7844         30          6
 7654         30          6
 7521         30          6
 7499         30          6
 7698         30          6 --- 6 records in table for dept 30

Here we are getting count for respective deptno.
As for deptno 10 we have 4 records in table emp similar results for deptno 20 and 30 also.

十秒萌定你 2024-07-20 15:55:23

over partition 关键字就好像我们按 client_id 对数据进行分区
创建每个客户端 id 的子集

select client_id, operation_date,
       row_number() count(*) over (partition by client_id order by client_id ) as operationctrbyclient
from client_operations e
order by e.client_id;

此查询将返回 client_id 完成的操作数

the over partition keyword is as if we are partitioning the data by client_id
creation a subset of each client id

select client_id, operation_date,
       row_number() count(*) over (partition by client_id order by client_id ) as operationctrbyclient
from client_operations e
order by e.client_id;

this query will return the number of operations done by the client_id

暖风昔人 2024-07-20 15:55:23

我认为,这个例子暗示了分区如何工作以及分组如何工作的细微差别。 我的例子是来自Oracle 12,如果我的例子恰好是一个编译bug。

我尝试过:

SELECT t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1) total_rows
from mytable t
group by t.data_key  ---- This does not compile as the compiler feels that t.state isn't in the group by and doesn't recognize the aggregation I'm looking for

但这按预期工作:

SELECT distinct t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1) total_rows
from mytable t;

根据外部键“data_key”生成每个状态下的元素数量。 因此,如果 data_key = 'APPLE' 有 3 行状态为 'A'、2 行状态为 'B'、1 行状态为 'C',则 'APPLE' 的相应行将为 'APPLE', 3, 2 , 1, 6.

I think, this example suggests a small nuance on how the partitioning works and how group by works. My example is from Oracle 12, if my example happens to be a compiling bug.

I tried :

SELECT t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1) total_rows
from mytable t
group by t.data_key  ---- This does not compile as the compiler feels that t.state isn't in the group by and doesn't recognize the aggregation I'm looking for

This however works as expected :

SELECT distinct t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1) total_rows
from mytable t;

Producing the number of elements in each state based on the external key "data_key". So, if, data_key = 'APPLE' had 3 rows with state 'A', 2 rows with state 'B', a row with state 'C', the corresponding row for 'APPLE' would be 'APPLE', 3, 2, 1, 6.

一向肩并 2024-07-20 15:55:23

您可以将分析函数视为添加派生查询和联接之类的分析函数。

SELECT e.empno, e.deptno, A.DEPT_COUNT
FROM emp e
INNER JOIN (
    SELECT deptno, COUNT(*) as DEPT_COUNT
    FROM emp
    GROUP BY deptno
) A
ON e.deptno = A.deptno

在 SQLServer 支持窗口函数(分析函数)之前,我的代码中经常包含这些内容。

请注意,它是 2 个查询和一个联接,而不是选择子查询,后者将为 emp 表中的每一行执行子查询。 根据返回的行数,其中一个比另一个更有效。 对于大型行集,“联合派生分析”方法速度更快。

如果“OVER PARTITION BY”使用相同的机制,则同样适用。

You can think of the analytic functions like adding a derived query and joining.

SELECT e.empno, e.deptno, A.DEPT_COUNT
FROM emp e
INNER JOIN (
    SELECT deptno, COUNT(*) as DEPT_COUNT
    FROM emp
    GROUP BY deptno
) A
ON e.deptno = A.deptno

I used to have these all over my code until windowing functions (Analytic functions) were supported in SQLServer.

Notice that it is 2 queries, and one join, as opposed to a select subquery which would execute the subquery for every row in the emp table. Depending on how many rows are returned, one is more efficient than the other. For large rowsets, the "Joined Derived Analytics" method is faster.

If "OVER PARTITION BY" uses the same mechanism, the same applies.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文