计算 SQL 公用表表达式中列的中位数

发布于 2024-09-09 19:25:40 字数 801 浏览 9 评论 0 原文

在 MSSQL2008 中,我尝试使用经典中值查询从公共表表达式计算一列数字的中值,如下所示:

WITH cte AS
(
   SELECT number
   FROM table
) 

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 cte.number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY cte.number DESC)  
    +   
  (SELECT TOP 1 cte.number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY cte.number ASC) ) / 2)) AS median

FROM cte
ORDER BY cte.number

我得到的结果集如下:

NUMBER    MEDIAN
x1        x1
x1        x1
x1        x1
x2        x2
x3        x3

换句话说,“中值”列是相同的当我期望中位数列一直向下为“x1”时,作为“数字”列。我使用类似的表达式来计算模式,并且它在相同的公用表表达式上运行良好。

In MSSQL2008, I am trying to compute the median of a column of numbers from a common table expression using the classic median query as follows:

WITH cte AS
(
   SELECT number
   FROM table
) 

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 cte.number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY cte.number DESC)  
    +   
  (SELECT TOP 1 cte.number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY cte.number ASC) ) / 2)) AS median

FROM cte
ORDER BY cte.number

The result set that I get is the following:

NUMBER    MEDIAN
x1        x1
x1        x1
x1        x1
x2        x2
x3        x3

In other words, the "median" column is the same as the "number" column when I would expect the median column to be "x1" all the way down. I use a similar expression to compute the mode and it works fine over the same common table expression.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

冬天旳寂寞 2024-09-16 19:25:41

这是一种稍微不同的方法:

WITH cte AS
(
   SELECT number
   FROM table1
)
SELECT T1.number, T3.median
FROM cte T1, 
(
    SELECT AVG(number) AS median
    FROM
    (
        SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
        FROM cte
    ) T2
    WHERE T2.rn = ((SELECT COUNT(*) FROM table1) + 1) / 2
    OR T2.rn = ((SELECT COUNT(*) FROM table1) + 2) / 2
) T3

Here's a slightly different way to do it:

WITH cte AS
(
   SELECT number
   FROM table1
)
SELECT T1.number, T3.median
FROM cte T1, 
(
    SELECT AVG(number) AS median
    FROM
    (
        SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
        FROM cte
    ) T2
    WHERE T2.rn = ((SELECT COUNT(*) FROM table1) + 1) / 2
    OR T2.rn = ((SELECT COUNT(*) FROM table1) + 2) / 2
) T3
伤痕我心 2024-09-16 19:25:41

您的查询的问题是您正在执行

SELECT TOP 1 cte.number FROM... ,

但它与子查询不相关,它与外部查询相关,因此子查询不相关。这就解释了为什么你最终会得到相同的值。删除cte.(如下)得出 CTE 的中值。这是一个恒定值。你想做什么?

WITH cte AS
    ( SELECT NUMBER
    FROM master.dbo.spt_values
    WHERE TYPE='p'
    )

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY number DESC)  
    +   
  (SELECT TOP 1 number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number

退货

NUMBER      median
----------- -----------
0           1023
1           1023
2           1023
3           1023
4           1023
5           1023
6           1023
7           1023

The problem with your query is that you are doing

SELECT TOP 1 cte.number FROM...

but it isn't correlated with the sub query it is correlated with the Outer query so the subquery is irrelevant. Which explains why you simply end up with the same value all the way down. Removing the cte. (as below) gives the median of the CTE. Which is a constant value. What are you trying to do?

WITH cte AS
    ( SELECT NUMBER
    FROM master.dbo.spt_values
    WHERE TYPE='p'
    )

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY number DESC)  
    +   
  (SELECT TOP 1 number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number

Returns

NUMBER      median
----------- -----------
0           1023
1           1023
2           1023
3           1023
4           1023
5           1023
6           1023
7           1023
泛泛之交 2024-09-16 19:25:41

这并不是一个全新的答案,因为它主要扩展了 Mark Byer 的答案,但有几个选项可以进一步简化查询。

首先是真正利用 CTE。您不仅可以有多个 CTE,而且它们可以互相引用。考虑到这一点,我们可以创建一个额外的 CTE 来根据第一个结果计算中值。这封装了中值计算,并让实际的 SELECT 只做它需要做的事情。请注意,ROW_NUMBER() 必须移至第一个 CTE。

;WITH cte AS
(
   SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
   FROM table1
),
med AS
(
    SELECT AVG(number) AS median
    FROM cte
    WHERE cte.rn = ((SELECT COUNT(*) FROM cte) + 1) / 2
    OR cte.rn = ((SELECT COUNT(*) FROM cte) + 2) / 2
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med

为了进一步降低复杂性,您“可以”使用自定义 CLR 聚合来处理中位数(例如 http://www.SQLsharp.com/ [我是其作者])。

;WITH cte AS
(
   SELECT number
   FROM table1
),
med AS
(
    SELECT  SQL#.Agg_Median(cte.number) AS median
    FROM    cte
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med

This is not an entirely new answer as it mostly expands on Mark Byer's answer, but there are a couple of options for simplifying the query even further.

The first thing is to really make use of CTE's. Not only can you have multiple CTE's, but they can refer to each other. With this in mind, we can create an additional CTE to compute the median based on the results of the first. This encapsulates the median computation and leaves the actual SELECT to do only what it needs to do. Note that the ROW_NUMBER() had to be moved into the first CTE.

;WITH cte AS
(
   SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
   FROM table1
),
med AS
(
    SELECT AVG(number) AS median
    FROM cte
    WHERE cte.rn = ((SELECT COUNT(*) FROM cte) + 1) / 2
    OR cte.rn = ((SELECT COUNT(*) FROM cte) + 2) / 2
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med

And to further reduce complexity, you "could" use a custom CLR Aggregate to handle the Median (such as the one provided in the free SQL# library at http://www.SQLsharp.com/ [which I am the author of]).

;WITH cte AS
(
   SELECT number
   FROM table1
),
med AS
(
    SELECT  SQL#.Agg_Median(cte.number) AS median
    FROM    cte
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文