SQL OVER() 子句 - 何时以及为何有用?
USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN(43659,43664);
我读到了该条款,但不明白为什么我需要它。 函数Over
有什么作用? 分区依据
有什么作用? 为什么我无法通过编写 Group By SalesOrderID
进行查询?
USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN(43659,43664);
I read about that clause and I don't understand why I need it.
What does the function Over
do? What does Partition By
do?
Why can't I make a query with writing Group By SalesOrderID
?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
您可以使用
GROUP BY SalesOrderID
。不同之处在于,使用 GROUP BY 只能获得未包含在 GROUP BY 中的列的聚合值。相反,使用窗口聚合函数而不是 GROUP BY,您可以检索聚合值和非聚合值。也就是说,虽然您没有在示例查询中执行此操作,但您可以检索相同
SalesOrderID
组中的各个OrderQty
值及其总和、计数、平均值等。下面是一个实际示例,说明窗口聚合为何如此出色。假设您需要计算每个值占总数的百分比。如果没有窗口聚合,您必须首先派生聚合值列表,然后将其连接回原始行集,即像这样:
现在看看如何使用窗口聚合执行相同操作:
更容易和更清晰,不是吗它?
You can use
GROUP BY SalesOrderID
. The difference is, with GROUP BY you can only have the aggregated values for the columns that are not included in GROUP BY.In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing that in your example query, you could retrieve both individual
OrderQty
values and their sums, counts, averages etc. over groups of sameSalesOrderID
s.Here's a practical example of why windowed aggregates are great. Suppose you need to calculate what percent of a total every value is. Without windowed aggregates you'd have to first derive a list of aggregated values and then join it back to the original rowset, i.e. like this:
Now look how you can do the same with a windowed aggregate:
Much easier and cleaner, isn't it?
OVER
子句的强大之处在于,无论您是否使用GROUP BY
,您都可以在不同范围内进行聚合(“窗口化”)示例:获取每个
的计数SalesOrderID
和所有Get 不同
COUNT
的计数,无GROUP BY
The
OVER
clause is powerful in that you can have aggregates over different ranges ("windowing"), whether you use aGROUP BY
or notExample: get count per
SalesOrderID
and count of allGet different
COUNT
s, noGROUP BY
让我用一个例子来解释,您将能够看到它是如何工作的。
假设您有下表 DIM_EQUIPMENT:
在 SQL 下运行
结果将如下所示
看看发生了什么。
您可以在不使用 YEAR 分组和与 ROW 匹配的情况下进行计数。
另一种有趣的方法是使用WITH子句获得相同的结果,WITH作为内联视图工作,可以简化查询,特别是复杂的查询,但这里的情况并非如此,因为我只是想展示用法
Let me explain with an example and you would be able to see how it works.
Assuming you have the following table DIM_EQUIPMENT:
Run below SQL
The result would be as below
See what happened.
You are able to count without Group By on YEAR and Match with ROW.
Another Interesting WAY to get same result if as below using WITH Clause, WITH works as in-line VIEW and can simplify the query especially complex ones, which is not the case here though since I am just trying to show usage
如果您只想按 SalesOrderID 进行 GROUP,那么您将无法在 SELECT 子句中包含 ProductID 和 OrderQty 列。
PARTITION BY 子句可让您分解聚合函数。一个明显且有用的示例是,如果您想为订单上的订单行生成行号:(
我的语法可能略有偏差)
然后您会得到类似以下内容的信息:
If you only wanted to GROUP BY the SalesOrderID then you wouldn't be able to include the ProductID and OrderQty columns in the SELECT clause.
The PARTITION BY clause let's you break up your aggregate functions. One obvious and useful example would be if you wanted to generate line numbers for order lines on an order:
(My syntax might be off slightly)
You would then get back something like:
OVER 子句与 PARTITION BY 结合使用时,声明前面的函数调用必须通过评估查询返回的行来分析完成。将其视为内联 GROUP BY 语句。
OVER (PARTITION BY SalesOrderID) 表示对于 SUM、AVG 等函数,返回值 OVER 从查询返回的记录的子集,并 PARTITION 该子集 BY 外键 SalesOrderID 。
因此,我们将对每个唯一 SalesOrderID 的每个 OrderQty 记录进行求和,并且该列名称将称为“总计”。
这是比使用多个内联视图查找相同信息更有效的方法。您可以将此查询放在内联视图中,然后根据“总计”进行筛选。
The OVER clause when combined with PARTITION BY state that the preceding function call must be done analytically by evaluating the returned rows of the query. Think of it as an inline GROUP BY statement.
OVER (PARTITION BY SalesOrderID)
is stating that for SUM, AVG, etc... function, return the value OVER a subset of the returned records from the query, and PARTITION that subset BY the foreign key SalesOrderID.So we will SUM every OrderQty record for EACH UNIQUE SalesOrderID, and that column name will be called 'Total'.
It is a MUCH more efficient means than using multiple inline views to find out the same information. You can put this query within an inline view and filter on Total then.
简而言之:
Over 子句可用于选择非聚合值和聚合值。
Partition BY、内部的ORDER BY以及ROWS或RANGE是OVER() by子句的一部分。
Partition by 用于对数据进行分区,然后执行这些窗口、聚合函数,如果我们没有 Partition by 则整个结果集将被视为单个分区。
OVER 子句可与排名函数(Rank、Row_Number、Dense_Rank..)、聚合函数(如(AVG、Max、Min、SUM...等))和分析函数(如(First_Value、Last_Value 等))一起使用。
让我们看看 OVER 子句
PARTITION BY 的基本语法:
它用于对数据进行分区并对具有相同数据的组进行操作。
订购方式:
它用于定义分区中数据的逻辑顺序。当我们不指定 Partition 时,整个结果集被视为单个分区
:
这可用于指定执行操作时应考虑分区中的哪些行。
让我们举个例子:
这是我的数据集:
因此,让我执行不同的场景,看看数据如何受到影响,我将从困难的语法变为简单的语法,
只需观察 sum_sal 部分即可。在这里,我使用按薪水排序并使用“无界前行和当前行之间的范围”。
在这种情况下,我们不使用分区,因此整个数据将被视为一个分区,并且我们按工资订购。
这里重要的是无界前行和当前行。这意味着当我们计算总和时,每行从起始行到当前行。
但是,如果我们看到工资为 5000 且名称=“Pavan”的行,理想情况下它应该是 17000,而对于工资=5000 和名称=Mark,它应该是 22000。但是当我们使用 RANGE 并在在这种情况下,如果它找到任何相似的元素,那么它会将它们视为相同的逻辑组并对它们执行操作并为该组中的每个项目分配值。这就是为什么我们的工资值相同= 5000。引擎上升到salary=5000和Name=Ron并计算总和,然后将其分配给所有salary=5000。
因此,对于ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,区别在于相同值的项目而不是将它们分组在一起,它计算从起始行到当前行的SUM,并且不会以不同的方式对待具有相同值的项目,例如RANGE
这些结果与 That 相同
,因为 Over(order by salaries) 只是 Over(order by salaries RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT行)
因此,只要我们简单地指定 Order by 而没有 ROWS 或 RANGE,它就会将 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 作为默认值。
注意:这仅适用于实际接受 RANGE/ROW 的函数。例如,ROW_NUMBER 和其他少数不接受 RANGE/ROW,在这种情况下,这不会出现在图中。
到目前为止,我们看到带有 order by 的 Over 子句采用 Range/ROWS,语法看起来像这样 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
它实际上是从第一行开始计算到当前行。但是,如果它想要计算整个数据分区的值并为每一列(即从第一行到最后一行)计算值,该怎么办?这是该查询,
而不是 CURRENT ROW,我指定 UNBOUNDED FOLLOWING ,它指示引擎计算每行的分区的最后一个记录。
现在来谈谈什么是带有空括号的 OVER() 吗?
这只是 Over(order by salaries ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) 的捷径
这里我们间接指定将所有结果集视为单个分区,然后从每个分区的第一条记录到最后一条记录执行计算。
我确实为此制作了一个视频,如果您有兴趣可以访问它。
https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s谢谢
,
帕万·库马尔·阿里亚索马亚朱鲁
HTTP://xyzcoder.github.io
So in simple words:
Over clause can be used to select non aggregated values along with Aggregated ones.
Partition BY, ORDER BY inside, and ROWS or RANGE are part of OVER() by clause.
partition by is used to partition data and then perform these window, aggregated functions, and if we don't have partition by the then entire result set is considered as a single partition.
OVER clause can be used with Ranking Functions(Rank, Row_Number, Dense_Rank..), Aggregate Functions like (AVG, Max, Min, SUM...etc) and Analytics Functions like (First_Value, Last_Value, and few others).
Let's See basic syntax of OVER clause
PARTITION BY:
It is used to partition data and perform operations on groups with the same data.
ORDER BY:
It is used to define the logical order of data in Partitions. When we don't specify Partition, entire resultset is considered as a single partition
:
This can be used to specify what rows are supposed to be considered in a partition when performing the operation.
Let's take an example:
Here is my dataset:
So let me execute different scenarios and see how data is impacted and I'll come from difficult syntax to simple one
Just observe the sum_sal part. Here I am using order by Salary and using "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
In this case, we are not using partition so entire data will be treated as one partition and we are ordering on salary.
And the important thing here is UNBOUNDED PRECEDING AND CURRENT ROW. This means when we are calculating the sum, from starting row to the current row for each row.
But if we see rows with salary 5000 and name="Pavan", ideally it should be 17000 and for salary=5000 and name=Mark, it should be 22000. But as we are using RANGE and in this case, if it finds any similar elements then it considers them as the same logical group and performs an operation on them and assigns value to each item in that group. That is the reason why we have the same value for salary=5000. The engine went up to salary=5000 and Name=Ron and calculated sum and then assigned it to all salary=5000.
So with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW The difference is for same value items instead of grouping them together, It calculates SUM from starting row to current row and it doesn't treat items with same value differently like RANGE
These results are the same as
That is because Over(order by salary) is just a short cut of Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
So wherever we simply specify Order by without ROWS or RANGE it is taking RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as default.
Note: This is applicable only to Functions that actually accept RANGE/ROW. For example, ROW_NUMBER and few others don't accept RANGE/ROW and in that case, this doesn't come into the picture.
Till now we saw that Over clause with an order by is taking Range/ROWS and syntax looks something like this RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
And it is actually calculating up to the current row from the first row. But what If it wants to calculate values for the entire partition of data and have it for each column (that is from 1st row to last row). Here is the query for that
Instead of CURRENT ROW, I am specifying UNBOUNDED FOLLOWING which instructs the engine to calculate till the last record of partition for each row.
Now coming to your point on what is OVER() with empty braces?
It is just a short cut for Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
Here we are indirectly specifying to treat all my resultset as a single partition and then perform calculations from the first record to the last record of each partition.
I did create a video on this and if you are interested you can visit it.
https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s
Thanks,
Pavan Kumar Aryasomayajulu
HTTP://xyzcoder.github.io
查询请求
子句。类似于
Group By
子句语法:
函数 (...) OVER (PARTITION BY col1 col3,...)
函数
COUNT()
、SUM()
、MIN()
、MAX()
等等ROW_NUMBER()
、RATION_TO_REOIRT()
等)更多信息示例: http://msdn.microsoft.com/en-us/库/ms189461.aspx
Query Petition
Clause.Similar to the
Group By
ClauseSyntax:
function (...) OVER (PARTITION BY col1 col3,...)
Functions
COUNT()
,SUM()
,MIN()
,MAX()
, etcROW_NUMBER()
,RATION_TO_REOIRT()
, etc.)More info with example : http://msdn.microsoft.com/en-us/library/ms189461.aspx
这是查询的结果。用作源的表是相同的,只是它没有最后一列。此列是第三列的移动总和。
查询:(
表为 public.iuk)
有点超过 dbase(1986) 的水平,我不知道为什么需要 25 年以上才能完成。
That's a result of query. Table used as source is the same exept that it has no last column. This column is a moving sum of third one.
Query:
(table goes as public.iuk)
It's a little over dbase(1986) level, I don't know why 25+ years has been needed to finish it up.