Oracle“分区依据” 关键词
有人可以解释一下 partition by
关键字的作用并给出一个简单的操作示例,以及为什么要使用它? 我有一个别人写的 SQL 查询,我想弄清楚它的作用。
分区依据的示例:
SELECT empno, deptno, COUNT(*)
OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp
我在网上看到的示例似乎有点太深入了。
Can someone please explain what the partition by
keyword does and give a simple example of it in action, as well as why one would want to use it? I have a SQL query written by someone else and I'm trying to figure out what it does.
An example of partition by:
SELECT empno, deptno, COUNT(*)
OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp
The examples I've seen online seem a bit too in-depth.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
PARTITION BY 子句设置将用于 OVER 子句中每个“GROUP”的记录范围。
在您的示例 SQL 中,
DEPT_COUNT
将为每个员工记录返回该部门内的员工人数。 (就好像您对emp
表进行去规范化;您仍然返回emp
表中的每条记录。)如果还有另一列(例如,
>州
),那么您可以计算该州有多少个部门。这就像获取
GROUP BY
(SUM
、AVG
等)的结果而无需聚合结果集(即删除匹配记录) )。例如,当您使用
LAST OVER
或MIN OVER
函数获取部门中的最低和最高工资,然后将其用于计算时,此功能非常有用记录工资没有子选择,这要快得多。阅读链接的 AskTom 文章 了解更多详情。
The
PARTITION BY
clause sets the range of records that will be used for each "GROUP" within theOVER
clause.In your example SQL,
DEPT_COUNT
will return the number of employees within that department for every employee record. (It is as if you're de-nomalising theemp
table; you still return every record in theemp
table.)If there was another column (e.g.,
state
) then you could count how many departments in that State.It is like getting the results of a
GROUP BY
(SUM
,AVG
, etc.) without the aggregating the result set (i.e. removing matching records).It is useful when you use the
LAST OVER
orMIN OVER
functions to get, for example, the lowest and highest salary in the department and then use that in a calculation against this records salary without a sub select, which is much faster.Read the linked AskTom article for further details.
这个概念已经被接受的答案很好地解释了,但我发现看到的例子越多,理解得越好。这是一个增量示例:
你说:“没问题”
结果:
您可以尝试:
但是您得到:
这就是
OVER (PARTITION BY BRAND)
出现的地方:这意味着:
COUNT(ITEM_ID)
- 获取号码项目OVER
- 在行集(按品牌划分)
- 具有相同品牌的结果是:
等等...
The concept is very well explained by the accepted answer, but I find that the more example one sees, the better it sinks in. Here's an incremental example:
You say: "no problem"
Result:
You may try:
But you get:
This is where the
OVER (PARTITION BY BRAND)
comes in:Which means:
COUNT(ITEM_ID)
- get the number of itemsOVER
- Over the set of rows(PARTITION BY BRAND)
- that have the same brandAnd the result is:
etc...
它是称为分析的 SQL 扩展。 select语句中的“over”告诉oracle该函数是一个分析函数,而不是一个group by函数。 使用分析的优点是,您只需一次传递数据即可收集总和、计数等,而不是使用子选择或更糟糕的 PL/SQL 循环数据。
乍一看确实令人困惑,但这很快就会成为第二天性。 没有人能比汤姆·凯特更好地解释这一点。 所以上面的链接很棒。
当然,阅读文档是必须的。
It is the SQL extension called analytics. The "over" in the select statement tells oracle that the function is a analytical function, not a group by function. The advantage to using analytics is that you can collect sums, counts, and a lot more with just one pass through of the data instead of looping through the data with sub selects or worse, PL/SQL.
It does look confusing at first but this will be second nature quickly. No one explains it better then Tom Kyte. So the link above is great.
Of course, reading the documentation is a must.
在这里我们得到了相应部门编号的计数。
至于 deptno 10,我们在表 emp 中有 4 条记录,deptno 20 和 30 也有类似的结果。
Here we are getting count for respective deptno.
As for deptno 10 we have 4 records in table emp similar results for deptno 20 and 30 also.
over partition 关键字就好像我们按 client_id 对数据进行分区
创建每个客户端 id 的子集
此查询将返回 client_id 完成的操作数
the over partition keyword is as if we are partitioning the data by client_id
creation a subset of each client id
this query will return the number of operations done by the client_id
我认为,这个例子暗示了分区如何工作以及分组如何工作的细微差别。 我的例子是来自Oracle 12,如果我的例子恰好是一个编译bug。
我尝试过:
但这按预期工作:
根据外部键“data_key”生成每个状态下的元素数量。 因此,如果 data_key = 'APPLE' 有 3 行状态为 'A'、2 行状态为 'B'、1 行状态为 'C',则 'APPLE' 的相应行将为 'APPLE', 3, 2 , 1, 6.
I think, this example suggests a small nuance on how the partitioning works and how group by works. My example is from Oracle 12, if my example happens to be a compiling bug.
I tried :
This however works as expected :
Producing the number of elements in each state based on the external key "data_key". So, if, data_key = 'APPLE' had 3 rows with state 'A', 2 rows with state 'B', a row with state 'C', the corresponding row for 'APPLE' would be 'APPLE', 3, 2, 1, 6.
您可以将分析函数视为添加派生查询和联接之类的分析函数。
在 SQLServer 支持窗口函数(分析函数)之前,我的代码中经常包含这些内容。
请注意,它是 2 个查询和一个联接,而不是选择子查询,后者将为 emp 表中的每一行执行子查询。 根据返回的行数,其中一个比另一个更有效。 对于大型行集,“联合派生分析”方法速度更快。
如果“OVER PARTITION BY”使用相同的机制,则同样适用。
You can think of the analytic functions like adding a derived query and joining.
I used to have these all over my code until windowing functions (Analytic functions) were supported in SQLServer.
Notice that it is 2 queries, and one join, as opposed to a select subquery which would execute the subquery for every row in the emp table. Depending on how many rows are returned, one is more efficient than the other. For large rowsets, the "Joined Derived Analytics" method is faster.
If "OVER PARTITION BY" uses the same mechanism, the same applies.