SQL OVER() 子句 - 何时以及为何有用?

发布于 2024-11-13 09:08:37 字数 599 浏览 4 评论 0原文

    USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail 
WHERE SalesOrderID IN(43659,43664);

我读到了该条款,但不明白为什么我需要它。 函数Over有什么作用? 分区依据有什么作用? 为什么我无法通过编写 Group By SalesOrderID 进行查询?

    USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail 
WHERE SalesOrderID IN(43659,43664);

I read about that clause and I don't understand why I need it.
What does the function Over do? What does Partition By do?
Why can't I make a query with writing Group By SalesOrderID?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

倾城泪 2024-11-20 09:08:37

可以使用GROUP BY SalesOrderID。不同之处在于,使用 GROUP BY 只能获得未包含在 GROUP BY 中的列的聚合值。

相反,使用窗口聚合函数而不是 GROUP BY,您可以检索聚合值和非聚合值。也就是说,虽然您没有在示例查询中执行此操作,但您可以检索相同 SalesOrderID 组中的各个 OrderQty 值及其总和、计数、平均值等。

下面是一个实际示例,说明窗口聚合为何如此出色。假设您需要计算每个值占总数的百分比。如果没有窗口聚合,您必须首先派生聚合值列表,然后将其连接回原始行集,即像这样:

SELECT
  orig.[Partition],
  orig.Value,
  orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT
      [Partition],
      SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
  ) agg ON orig.[Partition] = agg.[Partition]

现在看看如何使用窗口聚合执行相同操作:

SELECT
  [Partition],
  Value,
  Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig

更容易和更清晰,不是吗它?

You can use GROUP BY SalesOrderID. The difference is, with GROUP BY you can only have the aggregated values for the columns that are not included in GROUP BY.

In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing that in your example query, you could retrieve both individual OrderQty values and their sums, counts, averages etc. over groups of same SalesOrderIDs.

Here's a practical example of why windowed aggregates are great. Suppose you need to calculate what percent of a total every value is. Without windowed aggregates you'd have to first derive a list of aggregated values and then join it back to the original rowset, i.e. like this:

SELECT
  orig.[Partition],
  orig.Value,
  orig.Value * 100.0 / agg.TotalValue AS ValuePercent
FROM OriginalRowset orig
  INNER JOIN (
    SELECT
      [Partition],
      SUM(Value) AS TotalValue
    FROM OriginalRowset
    GROUP BY [Partition]
  ) agg ON orig.[Partition] = agg.[Partition]

Now look how you can do the same with a windowed aggregate:

SELECT
  [Partition],
  Value,
  Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercent
FROM OriginalRowset orig

Much easier and cleaner, isn't it?

寄风 2024-11-20 09:08:37

OVER 子句的强大之处在于,无论您是否使用 GROUP BY,您都可以在不同范围内进行聚合(“窗口化”)

示例:获取每个 的计数SalesOrderID 和所有

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) AS 'Count'
    ,COUNT(*) OVER () AS 'CountAll'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
GROUP BY
     SalesOrderID, ProductID, OrderQty

Get 不同 COUNT 的计数,无 GROUP BY

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder'
    ,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct',
    ,COUNT(*) OVER () AS 'CountAllAgain'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)

The OVER clause is powerful in that you can have aggregates over different ranges ("windowing"), whether you use a GROUP BY or not

Example: get count per SalesOrderID and count of all

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) AS 'Count'
    ,COUNT(*) OVER () AS 'CountAll'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
GROUP BY
     SalesOrderID, ProductID, OrderQty

Get different COUNTs, no GROUP BY

SELECT
    SalesOrderID, ProductID, OrderQty
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder'
    ,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct',
    ,COUNT(*) OVER () AS 'CountAllAgain'
FROM Sales.SalesOrderDetail 
WHERE
     SalesOrderID IN(43659,43664)
安稳善良 2024-11-20 09:08:37

让我用一个例子来解释,您将能够看到它是如何工作的。

假设您有下表 DIM_EQUIPMENT:

VIN         MAKE    MODEL   YEAR    COLOR
-----------------------------------------
1234ASDF    Ford    Taurus  2008    White
1234JKLM    Chevy   Truck   2005    Green
5678ASDF    Ford    Mustang 2008    Yellow

在 SQL 下运行

SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR ,
  COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT

结果将如下所示

VIN         MAKE    MODEL   YEAR    COLOR     COUNT2
 ----------------------------------------------  
1234JKLM    Chevy   Truck   2005    Green     1
5678ASDF    Ford    Mustang 2008    Yellow    2
1234ASDF    Ford    Taurus  2008    White     2

看看发生了什么。

您可以在不使用 YEAR 分组和与 ROW 匹配的情况下进行计数。

另一种有趣的方法是使用WITH子句获得相同的结果,WITH作为内联视图工作,可以简化查询,特别是复杂的查询,但这里的情况并非如此,因为我只是想展示用法

 WITH EQ AS
  ( SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2 FROM DIM_EQUIPMENT GROUP BY YEAR
  )
SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR,
  COUNT2
FROM DIM_EQUIPMENT,
  EQ
WHERE EQ.YEAR2=DIM_EQUIPMENT.YEAR;

Let me explain with an example and you would be able to see how it works.

Assuming you have the following table DIM_EQUIPMENT:

VIN         MAKE    MODEL   YEAR    COLOR
-----------------------------------------
1234ASDF    Ford    Taurus  2008    White
1234JKLM    Chevy   Truck   2005    Green
5678ASDF    Ford    Mustang 2008    Yellow

Run below SQL

SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR ,
  COUNT(*) OVER (PARTITION BY YEAR) AS COUNT2
FROM DIM_EQUIPMENT

The result would be as below

VIN         MAKE    MODEL   YEAR    COLOR     COUNT2
 ----------------------------------------------  
1234JKLM    Chevy   Truck   2005    Green     1
5678ASDF    Ford    Mustang 2008    Yellow    2
1234ASDF    Ford    Taurus  2008    White     2

See what happened.

You are able to count without Group By on YEAR and Match with ROW.

Another Interesting WAY to get same result if as below using WITH Clause, WITH works as in-line VIEW and can simplify the query especially complex ones, which is not the case here though since I am just trying to show usage

 WITH EQ AS
  ( SELECT YEAR AS YEAR2, COUNT(*) AS COUNT2 FROM DIM_EQUIPMENT GROUP BY YEAR
  )
SELECT VIN,
  MAKE,
  MODEL,
  YEAR,
  COLOR,
  COUNT2
FROM DIM_EQUIPMENT,
  EQ
WHERE EQ.YEAR2=DIM_EQUIPMENT.YEAR;
还不是爱你 2024-11-20 09:08:37

如果您只想按 SalesOrderID 进行 GROUP,那么您将无法在 SELECT 子句中包含 ProductID 和 OrderQty 列。

PARTITION BY 子句可让您分解聚合函数。一个明显且有用的示例是,如果您想为订单上的订单行生成行号:(

SELECT
    O.order_id,
    O.order_date,
    ROW_NUMBER() OVER(PARTITION BY O.order_id) AS line_item_no,
    OL.product_id
FROM
    Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id

我的语法可能略有偏差)

然后您会得到类似以下内容的信息:

order_id    order_date    line_item_no    product_id
--------    ----------    ------------    ----------
    1       2011-05-02         1              5
    1       2011-05-02         2              4
    1       2011-05-02         3              7
    2       2011-05-12         1              8
    2       2011-05-12         2              1

If you only wanted to GROUP BY the SalesOrderID then you wouldn't be able to include the ProductID and OrderQty columns in the SELECT clause.

The PARTITION BY clause let's you break up your aggregate functions. One obvious and useful example would be if you wanted to generate line numbers for order lines on an order:

SELECT
    O.order_id,
    O.order_date,
    ROW_NUMBER() OVER(PARTITION BY O.order_id) AS line_item_no,
    OL.product_id
FROM
    Orders O
INNER JOIN Order_Lines OL ON OL.order_id = O.order_id

(My syntax might be off slightly)

You would then get back something like:

order_id    order_date    line_item_no    product_id
--------    ----------    ------------    ----------
    1       2011-05-02         1              5
    1       2011-05-02         2              4
    1       2011-05-02         3              7
    2       2011-05-12         1              8
    2       2011-05-12         2              1
北方的巷 2024-11-20 09:08:37

OVER 子句与 PARTITION BY 结合使用时,声明前面的函数调用必须通过评估查询返回的行来分析完成。将其视为内联 GROUP BY 语句。

OVER (PARTITION BY SalesOrderID) 表示对于 SUM、AVG 等函数,返回值 OVER 从查询返回的记录的子集,并 PARTITION 该子集 BY 外键 SalesOrderID 。

因此,我们将对每个唯一 SalesOrderID 的每个 OrderQty 记录进行求和,并且该列名称将称为“总计”。

这是比使用多个内联视图查找相同信息更有效的方法。您可以将此查询放在内联视图中,然后根据“总计”进行筛选。

SELECT ...,
FROM (your query) inlineview
WHERE Total < 200

The OVER clause when combined with PARTITION BY state that the preceding function call must be done analytically by evaluating the returned rows of the query. Think of it as an inline GROUP BY statement.

OVER (PARTITION BY SalesOrderID) is stating that for SUM, AVG, etc... function, return the value OVER a subset of the returned records from the query, and PARTITION that subset BY the foreign key SalesOrderID.

So we will SUM every OrderQty record for EACH UNIQUE SalesOrderID, and that column name will be called 'Total'.

It is a MUCH more efficient means than using multiple inline views to find out the same information. You can put this query within an inline view and filter on Total then.

SELECT ...,
FROM (your query) inlineview
WHERE Total < 200
慵挽 2024-11-20 09:08:37

简而言之:
Over 子句可用于选择非聚合值和聚合值。

Partition BY、内部的ORDER BY以及ROWS或RANGE是OVER() by子句的一部分。

Partition by 用于对数据进行分区,然后执行这些窗口、聚合函数,如果我们没有 Partition by 则整个结果集将被视为单个分区。

OVER 子句可与排名函数(Rank、Row_Number、Dense_Rank..)、聚合函数(如(AVG、Max、Min、SUM...等))和分析函数(如(First_Value、Last_Value 等))一起使用。

让我们看看 OVER 子句

OVER (   
       [ <PARTITION BY clause> ]  
       [ <ORDER BY clause> ]   
       [ <ROW or RANGE clause> ]  
      )  

PARTITION BY 的基本语法:
它用于对数据进行分区并对具有相同数据的组进行操作。

订购方式:
它用于定义分区中数据的逻辑顺序。当我们不指定 Partition 时,整个结果集被视为单个分区


这可用于指定执行操作时应考虑分区中的哪些行。

让我们举个例子:

这是我的数据集:

Id          Name                                               Gender     Salary
----------- -------------------------------------------------- ---------- -----------
1           Mark                                               Male       5000
2           John                                               Male       4500
3           Pavan                                              Male       5000
4           Pam                                                Female     5500
5           Sara                                               Female     4000
6           Aradhya                                            Female     3500
7           Tom                                                Male       5500
8           Mary                                               Female     5000
9           Ben                                                Male       6500
10          Jodi                                               Female     7000
11          Tom                                                Male       5500
12          Ron                                                Male       5000

因此,让我执行不同的场景,看看数据如何受到影响,我将从困难的语法变为简单的语法,

Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

只需观察 sum_sal 部分即可。在这里,我使用按薪水排序并使用“无界前行和当前行之间的范围”
在这种情况下,我们不使用分区,因此整个数据将被视为一个分区,并且我们按工资订购。
这里重要的是无界前行和当前行。这意味着当我们计算总和时,每行从起始行到当前行。
但是,如果我们看到工资为 5000 且名称=“Pavan”的行,理想情况下它应该是 17000,而对于工资=5000 和名称=Mark,它应该是 22000。但是当我们使用 RANGE 并在在这种情况下,如果它找到任何相似的元素,那么它会将它们视为相同的逻辑组并对它们执行操作并为该组中的每个项目分配值。这就是为什么我们的工资值相同= 5000。引擎上升到salary=5000和Name=Ron并计算总和,然后将其分配给所有salary=5000。

Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees


   Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        17000
1           Mark                                               Male       5000        22000
8           Mary                                               Female     5000        27000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        37500
7           Tom                                                Male       5500        43000
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

因此,对于ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,区别在于相同值的项目而不是将它们分组在一起,它计算从起始行到当前行的SUM,并且不会以不同的方式对待具有相同值的项目,例如RANGE

Select *,SUM(salary) Over(order by salary) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

这些结果与 That 相同

Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

,因为 Over(order by salaries) 只是 Over(order by salaries RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT行)
因此,只要我们简单地指定 Order by 而没有 ROWS 或 RANGE,它就会将 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 作为默认值。

注意:这仅适用于实际接受 RANGE/ROW 的函数。例如,ROW_NUMBER 和其他少数不接受 RANGE/ROW,在这种情况下,这不会出现在图中。

到目前为止,我们看到带有 order by 的 Over 子句采用 Range/ROWS,语法看起来像这样 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
它实际上是从第一行开始计算到当前行。但是,如果它想要计算整个数据分区的值并为每一列(即从第一行到最后一行)计算值,该怎么办?这是该查询,

Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

而不是 CURRENT ROW,我指定 UNBOUNDED FOLLOWING ,它指示引擎计算每行的分区的最后一个记录。

现在来谈谈什么是带有空括号的 OVER() 吗?

这只是 Over(order by salaries ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) 的捷径

这里我们间接指定将所有结果集视为单个分区,然后从每个分区的第一条记录到最后一条记录执行计算。

Select *,Sum(salary) Over() as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

我确实为此制作了一个视频,如果您有兴趣可以访问它。
https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s谢谢


帕万·库马尔·阿里亚索马亚朱鲁
HTTP://xyzcoder.github.io

So in simple words:
Over clause can be used to select non aggregated values along with Aggregated ones.

Partition BY, ORDER BY inside, and ROWS or RANGE are part of OVER() by clause.

partition by is used to partition data and then perform these window, aggregated functions, and if we don't have partition by the then entire result set is considered as a single partition.

OVER clause can be used with Ranking Functions(Rank, Row_Number, Dense_Rank..), Aggregate Functions like (AVG, Max, Min, SUM...etc) and Analytics Functions like (First_Value, Last_Value, and few others).

Let's See basic syntax of OVER clause

OVER (   
       [ <PARTITION BY clause> ]  
       [ <ORDER BY clause> ]   
       [ <ROW or RANGE clause> ]  
      )  

PARTITION BY:
It is used to partition data and perform operations on groups with the same data.

ORDER BY:
It is used to define the logical order of data in Partitions. When we don't specify Partition, entire resultset is considered as a single partition

:
This can be used to specify what rows are supposed to be considered in a partition when performing the operation.

Let's take an example:

Here is my dataset:

Id          Name                                               Gender     Salary
----------- -------------------------------------------------- ---------- -----------
1           Mark                                               Male       5000
2           John                                               Male       4500
3           Pavan                                              Male       5000
4           Pam                                                Female     5500
5           Sara                                               Female     4000
6           Aradhya                                            Female     3500
7           Tom                                                Male       5500
8           Mary                                               Female     5000
9           Ben                                                Male       6500
10          Jodi                                               Female     7000
11          Tom                                                Male       5500
12          Ron                                                Male       5000

So let me execute different scenarios and see how data is impacted and I'll come from difficult syntax to simple one

Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

Just observe the sum_sal part. Here I am using order by Salary and using "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".
In this case, we are not using partition so entire data will be treated as one partition and we are ordering on salary.
And the important thing here is UNBOUNDED PRECEDING AND CURRENT ROW. This means when we are calculating the sum, from starting row to the current row for each row.
But if we see rows with salary 5000 and name="Pavan", ideally it should be 17000 and for salary=5000 and name=Mark, it should be 22000. But as we are using RANGE and in this case, if it finds any similar elements then it considers them as the same logical group and performs an operation on them and assigns value to each item in that group. That is the reason why we have the same value for salary=5000. The engine went up to salary=5000 and Name=Ron and calculated sum and then assigned it to all salary=5000.

Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees


   Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        17000
1           Mark                                               Male       5000        22000
8           Mary                                               Female     5000        27000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        37500
7           Tom                                                Male       5500        43000
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

So with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW The difference is for same value items instead of grouping them together, It calculates SUM from starting row to current row and it doesn't treat items with same value differently like RANGE

Select *,SUM(salary) Over(order by salary) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
6           Aradhya                                            Female     3500        3500
5           Sara                                               Female     4000        7500
2           John                                               Male       4500        12000
3           Pavan                                              Male       5000        32000
1           Mark                                               Male       5000        32000
8           Mary                                               Female     5000        32000
12          Ron                                                Male       5000        32000
11          Tom                                                Male       5500        48500
7           Tom                                                Male       5500        48500
4           Pam                                                Female     5500        48500
9           Ben                                                Male       6500        55000
10          Jodi                                               Female     7000        62000

These results are the same as

Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

That is because Over(order by salary) is just a short cut of Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
So wherever we simply specify Order by without ROWS or RANGE it is taking RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as default.

Note: This is applicable only to Functions that actually accept RANGE/ROW. For example, ROW_NUMBER and few others don't accept RANGE/ROW and in that case, this doesn't come into the picture.

Till now we saw that Over clause with an order by is taking Range/ROWS and syntax looks something like this RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
And it is actually calculating up to the current row from the first row. But what If it wants to calculate values for the entire partition of data and have it for each column (that is from 1st row to last row). Here is the query for that

Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

Instead of CURRENT ROW, I am specifying UNBOUNDED FOLLOWING which instructs the engine to calculate till the last record of partition for each row.

Now coming to your point on what is OVER() with empty braces?

It is just a short cut for Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

Here we are indirectly specifying to treat all my resultset as a single partition and then perform calculations from the first record to the last record of each partition.

Select *,Sum(salary) Over() as sum_sal from employees

Id          Name                                               Gender     Salary      sum_sal
----------- -------------------------------------------------- ---------- ----------- -----------
1           Mark                                               Male       5000        62000
2           John                                               Male       4500        62000
3           Pavan                                              Male       5000        62000
4           Pam                                                Female     5500        62000
5           Sara                                               Female     4000        62000
6           Aradhya                                            Female     3500        62000
7           Tom                                                Male       5500        62000
8           Mary                                               Female     5000        62000
9           Ben                                                Male       6500        62000
10          Jodi                                               Female     7000        62000
11          Tom                                                Male       5500        62000
12          Ron                                                Male       5000        62000

I did create a video on this and if you are interested you can visit it.
https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s

Thanks,
Pavan Kumar Aryasomayajulu
HTTP://xyzcoder.github.io

jJeQQOZ5 2024-11-20 09:08:37
  • 也称为查询请求子句。
  • 类似于Group By子句

    • 将数据分解为块(或分区)
    • 按分区边界分隔
    • 函数在分区内执行
    • 跨越分离边界时重新初始化

语法:
函数 (...) OVER (PARTITION BY col1 col3,...)

  • 函数

    • 熟悉的函数,例如 COUNT()SUM()MIN()MAX()等等
    • 还有新函数(例如 ROW_NUMBER()RATION_TO_REOIRT() 等)

更多信息示例: http://msdn.microsoft.com/en-us/库/ms189461.aspx

  • Also Called Query Petition Clause.
  • Similar to the Group By Clause

    • break up data into chunks (or partitions)
    • separate by partition bounds
    • function performs within partitions
    • re-initialised when crossing parting boundary

Syntax:
function (...) OVER (PARTITION BY col1 col3,...)

  • Functions

    • Familiar functions such as COUNT(), SUM(), MIN(), MAX(), etc
    • New Functions as well (eg ROW_NUMBER(), RATION_TO_REOIRT(), etc.)

More info with example : http://msdn.microsoft.com/en-us/library/ms189461.aspx

这样的小城市 2024-11-20 09:08:37
prkey   whatsthat               cash   
890    "abb                "   32  32
43     "abbz               "   2   34
4      "bttu               "   1   35
45     "gasstuff           "   2   37
545    "gasz               "   5   42
80009  "hoo                "   9   51
2321   "ibm                "   1   52
998    "krk                "   2   54
42     "kx-5010            "   2   56
32     "lto                "   4   60
543    "mp                 "   5   65
465    "multipower         "   2   67
455    "O.N.               "   1   68
7887   "prem               "   7   75
434    "puma               "   3   78
23     "retractble         "   3   81
242    "Trujillo's stuff   "   4   85

这是查询的结果。用作源的表是相同的,只是它没有最后一列。此列是第三列的移动总和。

查询:(

SELECT prkey,whatsthat,cash,SUM(cash) over (order by whatsthat)
    FROM public.iuk order by whatsthat,prkey
    ;

表为 public.iuk)

sql version:  2012

有点超过 dbase(1986) 的水平,我不知道为什么需要 25 年以上才能完成。

prkey   whatsthat               cash   
890    "abb                "   32  32
43     "abbz               "   2   34
4      "bttu               "   1   35
45     "gasstuff           "   2   37
545    "gasz               "   5   42
80009  "hoo                "   9   51
2321   "ibm                "   1   52
998    "krk                "   2   54
42     "kx-5010            "   2   56
32     "lto                "   4   60
543    "mp                 "   5   65
465    "multipower         "   2   67
455    "O.N.               "   1   68
7887   "prem               "   7   75
434    "puma               "   3   78
23     "retractble         "   3   81
242    "Trujillo's stuff   "   4   85

That's a result of query. Table used as source is the same exept that it has no last column. This column is a moving sum of third one.

Query:

SELECT prkey,whatsthat,cash,SUM(cash) over (order by whatsthat)
    FROM public.iuk order by whatsthat,prkey
    ;

(table goes as public.iuk)

sql version:  2012

It's a little over dbase(1986) level, I don't know why 25+ years has been needed to finish it up.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文