Simple way to calculate median with MySQL

Posted on 2024-08-01 15:52:12 · 407 words · 11 views · 0 comments

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.

Example data:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, versus SELECT AVG(val) which == 5.

Comments (30)

如歌彻婉言 2024-08-08 15:52:12

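I found the code below on HackerRank. It is pretty simple and works when the row count is odd and the values are distinct; with an even count of distinct values it returns no row, and duplicated values can make it return no row or the same value several times.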

SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE  
  (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) = 
  (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );
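A small Python sketch (not SQL, and not from the original answer) can mimic the WHERE condition on a plain list to show when the query returns a row. Each candidate value plays the role of M.MEDIAN_COL, and the two counts stand in for the two correlated subqueries; the function name is made up for illustration.

```python
def hackerrank_median_rows(values):
    """Return every value m for which #(values < m) == #(values > m)."""
    rows = []
    for m in values:  # one candidate per row, like the outer query over M
        below = sum(1 for v in values if v < m)  # first correlated COUNT
        above = sum(1 for v in values if v > m)  # second correlated COUNT
        if below == above:
            rows.append(m)
    return rows


print(hackerrank_median_rows([4, 7, 2, 2, 9, 8, 3]))  # question's data: [4]
print(hackerrank_median_rows([1, 2, 3, 4]))           # even count: []
print(hackerrank_median_rows([1, 1, 2]))              # true median is 1: []
print(hackerrank_median_rows([1, 2, 2, 2, 5]))        # [2, 2, 2]
```

On the question's data it returns [4]; the other calls show the even-count and duplicate cases where the condition needs extra handling.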
笑,眼淚并存 2024-08-08 15:52:12

Building off of velcro's answer, for those of you having to do a median off of something that is grouped by another parameter:


SELECT t1.grp_field, t1.val FROM (
SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 1) AS row_num,
@s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
FROM data d, (SELECT @rownum:=0, @s:=NULL) r
ORDER BY grp_field, d.val
) as t1 JOIN (
SELECT grp_field, count(*) as total_rows
FROM data d
GROUP BY grp_field
) as t2
ON t1.grp_field = t2.grp_field
-- row_num restarts at 1 in each group, so for an even count this picks the upper of the two middle values
WHERE t1.row_num=floor(total_rows/2)+1;
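The per-group numbering can be sketched in Python to check which row the final WHERE picks. This is only a model of the query, assuming the rows come out in ORDER BY grp_field, val order and the counter restarts at `first_row` for every new group; grouped_median and the sample rows are hypothetical.

```python
from itertools import groupby


def grouped_median(rows, first_row=1):
    """rows: (grp_field, val) pairs. Returns {grp_field: val picked by the query}."""
    picked = {}
    # ORDER BY grp_field, val
    for grp, members in groupby(sorted(rows), key=lambda r: r[0]):
        vals = [val for _, val in members]
        target = len(vals) // 2 + 1  # floor(total_rows/2)+1
        # the @rownum counter restarts at first_row for every new group
        for row_num, val in enumerate(vals, start=first_row):
            if row_num == target:
                picked[grp] = val
    return picked


rows = [("a", 4), ("a", 7), ("a", 2), ("a", 2), ("a", 9), ("a", 8), ("a", 3),
        ("b", 1), ("b", 5), ("b", 3), ("b", 10)]
print(grouped_median(rows))               # {'a': 4, 'b': 5}
print(grouped_median(rows, first_row=0))  # {'a': 7, 'b': 10}
```

With the counter restarting at 1, group 'a' gets its true median and group 'b' (an even count) gets the upper of its two middle values; restarting at 0 shifts every pick one row too far.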

烟燃烟灭 2024-08-08 15:52:12

You could use the user-defined function that's found here.

心碎无痕… 2024-08-08 15:52:12

Often we need the median not just for the whole table, but per ID: that is, a median for each ID in the table, where each ID has many records. This approach performs well, works in many SQL dialects and handles both even and odd counts; for more on the performance of different median methods, see https://sqlperformance.com/2012/08/t-sql-queries/median

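-- window functions need MySQL 8.0+; MySQL's / returns a decimal (9/2 = 4.5000),
-- so FLOOR keeps both targets whole row numbers
SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val, 
  COUNT(*) OVER (PARTITION BY our_id) AS cnt,
  ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
  FROM our_table
) AS x
WHERE rn IN (FLOOR((cnt + 1)/2), FLOOR((cnt + 2)/2)) GROUP BY our_id;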

Hope it helps
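The `rn IN ((cnt + 1)/2, (cnt + 2)/2)` pattern comes from T-SQL, where `/` between integers is integer division. In MySQL `/` returns an exact decimal (for example 9/2 gives 4.5000), so for an even count only one of the two targets is a whole row number. The Python sketch below models both behaviours, using Fraction for the decimal case; median_positions is a hypothetical helper.

```python
from fractions import Fraction


def median_positions(cnt, integer_division=True):
    """1-based row numbers kept by: rn IN ((cnt + 1)/2, (cnt + 2)/2)."""
    if integer_division:
        # T-SQL integer '/', or FLOOR(...) in MySQL
        targets = {(cnt + 1) // 2, (cnt + 2) // 2}
    else:
        # MySQL '/', which yields an exact decimal such as 4.5
        targets = {Fraction(cnt + 1, 2), Fraction(cnt + 2, 2)}
    return [rn for rn in range(1, cnt + 1) if rn in targets]


for cnt in (7, 8):
    print(cnt, median_positions(cnt), median_positions(cnt, integer_division=False))
# 7 [4] [4]
# 8 [4, 5] [5]
```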

牵你的手,一向走下去 2024-08-08 15:52:12

Takes care of an even value count: in that case it gives the average of the two values in the middle.

SELECT AVG(val) FROM
  ( SELECT x.id, x.val from data x, data y
      GROUP BY x.id, x.val
      HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
  ) sq
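Roughly, the HAVING clause gives each x row a unique rank: the rows with a smaller val, plus the row itself, plus rows with an equal val and a larger id. It then keeps the ranks ROUND(n/2) and ROUND((n+1)/2). The Python sketch below models that, assuming MySQL's ROUND rounds an exact .5 upward for positive values; the function names are made up.

```python
import math
from fractions import Fraction


def mysql_round(x):
    """Assumption: models MySQL ROUND() on a positive exact value (.5 goes up)."""
    return math.floor(x + Fraction(1, 2))


def self_join_median(rows):
    """rows: (id, val) pairs; models the x/y self-join grouped by x.id, x.val."""
    n = len(rows)  # COUNT(*) in each x group: x joins every y row
    wanted = {mysql_round(Fraction(n, 2)), mysql_round(Fraction(n + 1, 2))}
    picked = []
    for x_id, x_val in rows:
        # SUM(SIGN(1 - SIGN(...))) counts y rows with a smaller val,
        # x itself, and rows with an equal val and a larger id
        rank = sum(1 for y_id, y_val in rows
                   if y_val < x_val or (y_val == x_val and y_id >= x_id))
        if rank in wanted:
            picked.append(x_val)
    return sum(picked) / len(picked)  # the outer AVG(val)


data = [(1, 4), (2, 7), (3, 2), (4, 2), (5, 9), (6, 8), (7, 3)]
print(self_join_median(data))             # 4.0
print(self_join_median(data + [(8, 5)]))  # 4.5
```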
℉絮湮 2024-08-08 15:52:12

My code, efficient without tables or additional variables:

SELECT
((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
+
(SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
as median
FROM table;
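To see which list positions the nested SUBSTRING_INDEX calls pick, here is a Python model of the query. substring_index only imitates MySQL's SUBSTRING_INDEX for a non-empty delimiter, and the final `+` converts the two strings to numbers, as MySQL does for arithmetic on strings; both helpers are illustrations, not library functions.

```python
import math


def substring_index(s, delim, count):
    """Simplified model of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def group_concat_median(values):
    n = len(values)
    concat = ",".join(str(v) for v in sorted(values))  # GROUP_CONCAT(val ORDER BY val)
    lo_pos = math.floor(1 + (n - 1) / 2)
    hi_pos = math.ceil(1 + (n - 1) / 2)
    lo = substring_index(substring_index(concat, ",", lo_pos), ",", -1)
    hi = substring_index(substring_index(concat, ",", hi_pos), ",", -1)
    return (float(lo) + float(hi)) / 2  # '+' turns both strings into numbers


print(group_concat_median([4, 7, 2, 2, 9, 8, 3]))  # 4.0
print(group_concat_median([1, 2, 3, 4]))           # 2.5
```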
无所谓啦 2024-08-08 15:52:12

Single query to achieve the perfect median:

SELECT 
COUNT(*) as total_rows, 
IF(count(*)%2 = 1,
   CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL),
   -- even count: unrounded mean of the two middle values (plain DECIMAL means DECIMAL(10,0))
   (CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
    + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2
) as median, 
AVG(val) as average 
FROM 
data
娜些时光,永不杰束 2024-08-08 15:52:12

Optionally, you could also do this in a stored procedure:

DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
  -- Set default parameters
  IF where_clause IS NULL OR where_clause = '' THEN
    SET where_clause = 1;
  END IF;

  -- Prepare statement
  SET @sql = CONCAT(
    "SELECT AVG(middle_values) AS 'median' FROM (
      SELECT t1.", column_name, " AS 'middle_values' FROM
        (
          SELECT @row:=@row+1 as `row`, x.", column_name, "
          FROM ", table_name," AS x, (SELECT @row:=0) AS r
          WHERE ", where_clause, " ORDER BY x.", column_name, "
        ) AS t1,
        (
          SELECT COUNT(*) as 'count'
          FROM ", table_name, " x
          WHERE ", where_clause, "
        ) AS t2
        -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
        WHERE t1.row >= t2.count/2
          AND t1.row <= ((t2.count/2)+1)) AS t3
    ");

  -- Execute statement
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END//
DELIMITER ;


-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);
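The WHERE condition at the end of the prepared statement keeps one or two rows. The Python sketch below evaluates the same inequality for a given count and averages the kept values, assuming `row` is the 1-based position produced by @row in sorted order; middle_rows and procedure_median are hypothetical names.

```python
def middle_rows(count):
    """1-based rows kept by: row >= count/2 AND row <= count/2 + 1."""
    return [row for row in range(1, count + 1)
            if count / 2 <= row <= count / 2 + 1]


def procedure_median(values):
    ordered = sorted(values)          # ORDER BY x.column_name
    rows = middle_rows(len(ordered))  # t2.count is the number of rows
    return sum(ordered[r - 1] for r in rows) / len(rows)  # AVG(middle_values)


print(middle_rows(7), middle_rows(8))           # [4] [4, 5]
print(procedure_median([4, 7, 2, 2, 9, 8, 3]))  # 4.0
```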
滥情空心 2024-08-08 15:52:12

My solution presented below works in just one query without creation of table, variable or even sub-query.
Plus, it allows you to get the median for each group in GROUP BY queries (this is what I needed!):

SELECT `columnA`, 
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(`columnB` ORDER BY `columnB`), ',', CEILING((COUNT(`columnB`)/2))), ',', -1) medianOfColumnB
FROM `tableC`
-- some where clause if you want
GROUP BY `columnA`;

It works because of a smart use of group_concat and substring_index.

But to allow a big GROUP_CONCAT result, you have to set group_concat_max_len to a higher value (1024 characters by default).
You can set it like this (for the current SQL session):

SET SESSION group_concat_max_len = 10000; 
-- up to 4294967295 on 32-bit platforms.

More info on group_concat_max_len: https://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_group_concat_max_len
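Two properties of this query can be modelled in Python. First, CEILING(COUNT/2) picks the lower of the two middle values when a group has an even count. Second, GROUP_CONCAT output is cut at group_concat_max_len, so the picked item can be a wrong or partial number when the limit is too small (MySQL reports a warning in that case). The sketch below is a simplified model in which max_len counts characters rather than bytes; concat_median is a made-up name.

```python
import math


def concat_median(values, max_len=1024):
    """Model of the GROUP_CONCAT / SUBSTRING_INDEX query for one group."""
    # GROUP_CONCAT(columnB ORDER BY columnB), cut at group_concat_max_len
    concat = ",".join(str(v) for v in sorted(values))[:max_len]
    pos = math.ceil(len(values) / 2)    # CEILING(COUNT(columnB)/2)
    return concat.split(",")[:pos][-1]  # inner and outer SUBSTRING_INDEX


print(concat_median([4, 7, 2, 2, 9, 8, 3]))              # 4
print(concat_median([1, 2, 3, 4]))                       # 2 (the lower middle value)
print(concat_median(list(range(100, 200))))              # 149
print(concat_median(list(range(100, 200)), max_len=30))  # 10 (a cut-off number)
```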

弄潮 2024-08-08 15:52:12

Another riff on Velcrow's answer, but uses a single intermediate table and takes advantage of the variable used for row numbering to get the count, rather than performing an extra query to calculate it. Also starts the count so that the first row is row 0 to allow simply using Floor and Ceil to select the median row(s).

SELECT Avg(tmp.val) as median_val
    FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
              FROM data as inTab,  (SELECT @rows := -1) as init
              -- Replace with better where clause or delete
              WHERE 2 > 1
              ORDER BY inTab.val) as tmp
    WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
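The trick is that once the derived table has been built, @rows holds COUNT(*) - 1, so FLOOR and CEIL of @rows/2 are the zero-based indices of the middle row(s). Here is a Python model of that, assuming the derived table is fully materialised before the outer WHERE reads @rows (riff_median is a hypothetical name):

```python
import math


def riff_median(values):
    rows = -1                         # (SELECT @rows := -1) as init
    numbered = []
    for v in sorted(values):          # ORDER BY inTab.val
        rows += 1                     # @rows := @rows + 1, so rowNum starts at 0
        numbered.append((v, rows))
    # the outer WHERE sees the final @rows, which is COUNT(*) - 1
    wanted = {math.floor(rows / 2), math.ceil(rows / 2)}
    picked = [v for v, row_num in numbered if row_num in wanted]
    return sum(picked) / len(picked)  # Avg(tmp.val)


print(riff_median([4, 7, 2, 2, 9, 8, 3]))  # 4.0
print(riff_median([1, 2, 3, 4]))           # 2.5
```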
甜尕妞 2024-08-08 15:52:12

Knowing the exact row count, you can use this query:

SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>

Where <half> = ceiling(<size> / 2.0) - 1

情未る 2024-08-08 15:52:12

SELECT 
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            GROUP_CONCAT(field ORDER BY field),
            ',',
            ((
                ROUND(
                    LENGTH(GROUP_CONCAT(field)) - 
                    LENGTH(
                        REPLACE(
                            GROUP_CONCAT(field),
                            ',',
                            ''
                        )
                    )
                ) / 2) + 1
            )),
            ',',
            -1
        )
FROM
    table

The above seems to work for me.

意犹 2024-08-08 15:52:12

Simple Solution For ORACLE:

SELECT ROUND(MEDIAN(Lat_N), 4) FROM Station;

Easy Solution to Understand For MySQL:

select case MOD(count(lat_n),2) 
when 1 then (select round(S.LAT_N,4) from station S where (select count(Lat_N) from station where Lat_N < S.LAT_N ) = (select count(Lat_N) from station where Lat_N > S.LAT_N))
else (select round(AVG(S.LAT_N),4) from station S where 1 = ABS((select count(Lat_N) from station where Lat_N < S.LAT_N ) - (select count(Lat_N) from station where Lat_N > S.LAT_N)))
end from station;

Explanation

STATION is the table name; LAT_N is the numeric column. The explanation assumes the LAT_N values are distinct.

Suppose there are 101 records (an odd number) in the station table. Then the median is the 51st record when the table is sorted in either ascending or descending order.

In the query above, for every S.LAT_N of table S, I compute two counts: the number of LAT_N values less than S.LAT_N and the number of LAT_N values greater than S.LAT_N. If the two counts are equal, I select that S.LAT_N value. For the 51st record there are 50 values below it and 50 values above it, so the counts match and this is our answer. Every other record has two different counts, so only the 51st record meets the condition.

Now suppose there are 100 records (an even number) in the station table. Then the median is the average of the 50th and 51st records when the table is sorted in either ascending or descending order.

As in the odd case, I compute the two counts, but now I select every S.LAT_N whose counts differ by exactly 1 in either direction (hence the ABS) and take the average. For the 50th record there are 49 values below it and 50 above it, a difference of 1, so it is the first record for the average. For the 51st record there are 50 values below it and 49 above it, again a difference of 1, so it is the second record for the average. Every other record has a larger difference, so only the 50th and 51st records meet the condition.
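The counting argument can be checked in Python, with 1..101 and 1..100 standing in for 101 and 100 distinct LAT_N values (the data is made up, and the ROUND(..., 4) is ignored). use_abs=False models a signed `1 = below - above` test, which only matches the 51st of 100 records; the absolute difference matches both the 50th and the 51st. counts_median is a hypothetical name.

```python
def counts_median(values, use_abs=True):
    """Model of the CASE query; assumes distinct values, as the explanation does."""
    def below(x):
        return sum(1 for v in values if v < x)  # LAT_N values less than S.LAT_N

    def above(x):
        return sum(1 for v in values if v > x)  # LAT_N values greater than S.LAT_N

    if len(values) % 2 == 1:
        return [x for x in values if below(x) == above(x)][0]
    if use_abs:
        picked = [x for x in values if abs(below(x) - above(x)) == 1]
    else:  # a signed "1 = below - above" test
        picked = [x for x in values if below(x) - above(x) == 1]
    return sum(picked) / len(picked)


print(counts_median(list(range(1, 102))))                 # 51 (101 values)
print(counts_median(list(range(1, 101))))                 # 50.5 (100 values)
print(counts_median(list(range(1, 101)), use_abs=False))  # 51.0, only the 51st record
```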

薄荷港 2024-08-08 15:52:12

I am using the below table for the solution in MySQL:

CREATE TABLE transactions (
  transaction_id int , user_id int , merchant_name varchar(255), transaction_date date , amount int
);

INSERT INTO transactions (transaction_id, user_id, merchant_name, transaction_date, amount)  
VALUES (1, 1 ,'abc', '2015-08-17', 100),(2, 2, 'ced', '2015-2-17', 100),(3, 1, 'def', '2015-2-16', 121),
(4, 1 ,'ced', '2015-3-17', 110),(5, 1, 'ced', '2015-3-17', 150),(6, 2 ,'abc', '2015-4-17', 130), 
(7, 3 ,'ced', '2015-12-17', 10),(8, 3 ,'abc', '2015-8-17', 100),(9, 2 ,'abc', '2015-12-17', 140),(10, 1,'abc', '2015-9-17', 100),
(11, 1 ,'abc', '2015-08-17', 121),(12, 2 ,'ced', '2015-12-23', 130),(13, 1 ,'def', '2015-12-23', 13),(3, 4, 'abc', '2015-2-16', 120),(3, 4, 'def', '2015-2-16', 121),(3, 4, 'ced', '2015-2-16', 121);

Calculating Median for 'amount' column:

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS TotalRecords,
    ROW_NUMBER() OVER (ORDER BY amount) AS RowNum
FROM transactions
)
SELECT Avg(amount)
FROM Numbered
WHERE RowNum IN ( FLOOR((TotalRecords+1)/2), FLOOR((TotalRecords+2)/2) )
;

TotalRecords = 16 and Median = 120.5000

This query works for both even and odd record counts.
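As a quick cross-check of the figures above, the same row-number logic can be replayed in Python on the amount values from the INSERT statement; window_median is a made-up helper that mimics ROW_NUMBER() and COUNT(*) OVER () on a sorted list.

```python
import math

# the amount column from the INSERT statement above
amounts = [100, 100, 121, 110, 150, 130, 10, 100, 140, 100,
           121, 130, 13, 120, 121, 121]


def window_median(values):
    ordered = sorted(values)  # ROW_NUMBER() OVER (ORDER BY amount)
    total = len(ordered)      # COUNT(*) OVER ()
    wanted = {math.floor((total + 1) / 2), math.floor((total + 2) / 2)}
    picked = [v for row_num, v in enumerate(ordered, start=1) if row_num in wanted]
    return sum(picked) / len(picked)  # Avg(amount)


print(len(amounts), window_median(amounts))  # 16 120.5
```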

女中豪杰 2024-08-08 15:52:12

If you need the median per group, use "PARTITION BY" in the OVER (...) clauses:

WITH Numbered AS 
(
  SELECT groupingColumn, 
  val,
  COUNT(*) OVER (partition by groupingColumn) AS Cnt,
  ROW_NUMBER() OVER (partition by groupingColumn ORDER BY val) AS RowNum
  FROM yourtable
)
-- AVG collapses the one or two middle rows of each group into a single median
SELECT groupingColumn, AVG(val) AS median
FROM Numbered
WHERE RowNum IN (FLOOR((Cnt+1)/2), FLOOR((Cnt+2)/2))
GROUP BY groupingColumn
ORDER BY groupingColumn
;
眼波传意 2024-08-08 15:52:12

A different way to calculate the Median is using JSON functions. JSON_TABLE needs MySQL 8.0.4+ (or MariaDB 10.6+), so this function does not run on MySQL 5.7.

This is my stored function tested in MySQL 8.0:

DELIMITER //
CREATE FUNCTION JSON_MEDIAN(input_json JSON)
RETURNS FLOAT NO SQL
BEGIN
    DECLARE median FLOAT;
    DECLARE middle INT;
    DECLARE arr_length INT;
    DECLARE peek_count INT;
    
    -- count non-empty items
    SELECT COUNT(*) INTO arr_length
    FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
    WHERE item IS NOT NULL;

    -- peek 1 item if length is odd or 2 items if length is even
    SET peek_count = 2 - arr_length % 2;
    SET middle = CEIL(arr_length / 2) - 1;

    SELECT AVG(item) INTO median FROM (
        SELECT item FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
        WHERE item IS NOT NULL
        ORDER BY item
        LIMIT middle, peek_count
    ) s2;

    RETURN median;
END//
DELIMITER ;

You can now use this function with JSON arrays containing numeric items, or create the input using the JSON_ARRAYAGG function like this:

SELECT JSON_MEDIAN(JSON_ARRAYAGG(`val`))
FROM `my_table`

This method does not have GROUP_CONCAT limits.
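The LIMIT arithmetic in JSON_MEDIAN can be followed in Python: items are filtered for NULL and sorted, `middle` is the zero-based offset CEIL(arr_length / 2) - 1, and `peek_count` is 1 or 2. This sketch only models the function's logic; json_median is not a MySQL function, and None stands in for JSON null items.

```python
import math


def json_median(items):
    """Model of JSON_MEDIAN's logic; None stands in for JSON null items."""
    values = sorted(x for x in items if x is not None)  # WHERE item IS NOT NULL ORDER BY item
    arr_length = len(values)
    peek_count = 2 - arr_length % 2              # 1 item if odd, 2 if even
    middle = math.ceil(arr_length / 2) - 1       # zero-based LIMIT offset
    window = values[middle:middle + peek_count]  # LIMIT middle, peek_count
    return sum(window) / len(window)             # AVG(item)


print(json_median([4, 7, 2, 2, 9, 8, 3]))  # 4.0
print(json_median([4, None, 1, 3, 2]))     # 2.5
```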

╭⌒浅淡时光〆 2024-08-08 15:52:12

I used a two-query approach:

  • first one to get count, min, max and avg
  • second one (prepared statement) with a "LIMIT @count/2, 1" and "ORDER BY .." clauses to get the median value

These are wrapped in a function definition, so all values can be returned from one call.

If your ranges are static and your data does not change often, it might be more efficient to precompute/store these values and use the stored values instead of querying from scratch every time.
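A minimal sketch of the two-query idea, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; in MySQL the second query would be the prepared statement). The first query collects count, min, max and avg; the second reads only the middle row(s). Unlike the single-row `LIMIT @count/2, 1` described above, this variant fetches two rows for an even count and averages them; the table and function names are hypothetical.

```python
import sqlite3


def two_query_stats(conn):
    # query 1: count, min, max and avg in one pass
    count, lo, hi, avg = conn.execute(
        "SELECT COUNT(val), MIN(val), MAX(val), AVG(val) FROM data").fetchone()
    # query 2: read only the middle row(s); LIMIT and OFFSET are bound parameters
    offset = (count - 1) // 2  # zero-based index of the (lower) middle row
    limit = 2 - count % 2      # two rows when the count is even
    middle = [row[0] for row in conn.execute(
        "SELECT val FROM data ORDER BY val LIMIT ? OFFSET ?", (limit, offset))]
    return {"count": count, "min": lo, "max": hi, "avg": avg,
            "median": sum(middle) / len(middle)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO data (id, val) VALUES (?, ?)",
                 [(1, 4), (2, 7), (3, 2), (4, 2), (5, 9), (6, 8), (7, 3)])
print(two_query_stats(conn))
```

Keeping the integer offset arithmetic in the application avoids passing a fractional value such as 3.5 to LIMIT, which MySQL would reject.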

朕就是辣么酷 2024-08-08 15:52:12

As I just needed a median AND percentile solution, I made a simple and quite flexible function based on the findings in this thread. I know I am happy myself when I find "readymade" functions that are easy to include in my projects, so I decided to quickly share it:

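// $table, $column and $where are pasted into the SQL verbatim: pass only trusted values.
// sql() is the author's own query helper, not a built-in PHP function.
function mysql_percentile($table, $column, $where, $percentile = 0.5) {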

    $sql = "
            SELECT `t1`.`".$column."` as `percentile` FROM (
            SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
              FROM `".$table."` `d`,  (SELECT @rownum:=0) `r`
              ".$where."
              ORDER BY `d`.`".$column."`
            ) as `t1`, 
            (
              SELECT count(*) as `total_rows`
              FROM `".$table."` `d`
              ".$where."
            ) as `t2`
            WHERE 1
            AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
        ";

    $result = sql($sql, 1);

    if (!empty($result)) {
        return $result['percentile'];       
    } else {
        return 0;
    }

}

Usage is very easy, example from my current project:

...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";

    $reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
    $reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
    $reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...
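The row the query selects is floor(total_rows * percentile) + 1 in sorted order. Here is a Python model of that choice, mirroring the PHP fallback of returning 0 when no row matches (percentile_row and the sample values are hypothetical):

```python
import math


def percentile_row(values, percentile=0.5):
    """Value at 1-based row floor(total_rows * percentile) + 1 of the sorted values."""
    ordered = sorted(values)
    total_rows = len(ordered)
    target = math.floor(total_rows * percentile) + 1
    if target > total_rows:
        return 0  # no row matches; the PHP wrapper then returns 0
    return ordered[target - 1]


data = list(range(1, 9))  # 8 hypothetical values
print(percentile_row(data, 0.25), percentile_row(data, 0.5), percentile_row(data, 0.75))  # 3 5 7
print(percentile_row([4, 7, 2, 2, 9, 8, 3]))  # 4
```

For an even count, a percentile of 0.5 selects the upper of the two middle values rather than their average.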
尘曦 2024-08-08 15:52:12

Here is my way. Of course, you could put it into a procedure :-)

SET @median_counter = (SELECT FLOOR((COUNT(*) - 1)/2) AS `median_counter` FROM `data`);

SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');

PREPARE median FROM @median;

EXECUTE median;

You could avoid the variable @median_counter if you substitute it:

SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
                      (SELECT FLOOR((COUNT(*) - 1)/2) AS `median_counter` FROM `data`),
                      ', 1'
                    );

PREPARE median FROM @median;

EXECUTE median;
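A quick Python check of the LIMIT offset on the question's sample data: FLOOR(COUNT(*)/2) - 1 lands one row before the middle when the count is odd, while FLOOR((COUNT(*) - 1)/2) hits the middle row (and the lower middle row for an even count). limit_offset_value is a made-up helper standing in for `ORDER BY val LIMIT offset, 1`.

```python
import math


def limit_offset_value(values, offset):
    """Stand-in for: SELECT val FROM data ORDER BY val LIMIT offset, 1"""
    return sorted(values)[offset]


data = [4, 7, 2, 2, 9, 8, 3]  # the question's sample values
n = len(data)                 # COUNT(*) = 7
print(limit_offset_value(data, math.floor(n / 2) - 1))    # 3, one row early
print(limit_offset_value(data, math.floor((n - 1) / 2)))  # 4, the median
```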
蹲在坟头点根烟 2024-08-08 15:52:12

Based on @bob's answer, this generalizes the query to have the ability to return multiple medians, grouped by some criteria.

Think, e.g., median sale price for used cars in a car lot, grouped by year-month.

SELECT 
    period, 
    AVG(middle_values) AS 'median' 
FROM (
    SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
    FROM (
        SELECT 
            @last_period:=@period AS 'last_period',
            @period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
            IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`, 
            x.sale_price
          FROM listings AS x, (SELECT @row:=0) AS r
          WHERE 1
            -- where criteria goes here
          ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
        ) AS t1
    LEFT JOIN (  
          SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
          FROM listings x
          WHERE 1
            -- same where criteria goes here
          GROUP BY DATE_FORMAT(sale_date, '%Y-%m')
        ) AS t2
        ON t1.period = t2.period
    ) AS t3
WHERE 
    row_num >= (count/2) 
    AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;
一紙繁鸢 2024-08-08 15:52:12

After reading all the previous answers, none of them matched my actual requirement, so I implemented my own, which doesn't need any procedure or complicated statements: I GROUP_CONCAT all values of the column I want the MEDIAN of and, using COUNT(*)/2 as the position, extract the value from the middle of the list, as the following query does:

(pos is the name of the column whose median I want)

SELECT
SUBSTRING_INDEX ( 
   SUBSTRING_INDEX ( 
       GROUP_CONCAT(pos ORDER BY CAST(pos AS SIGNED INTEGER) desc SEPARATOR ';') 
    , ';', COUNT(*)/2 ) 
, ';', -1 ) AS `pos_med`
FROM table_name
GROUP BY any_criteria

I hope this is useful to someone, the way many other answers on this site were useful to me.

柠檬 2024-08-08 15:52:12

Most of the solutions above work only for one field of the table; you might need the median (50th percentile) of many fields in the same query.

I use this:

-- 50/100 * COUNT(*) is rounded to a whole list position; adding 1 to it would pick the value after the middle one for odd counts
SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
 GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
  ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;

You can replace the "50" in the example above with any percentile; it is very efficient.

Just make sure you have enough memory for the GROUP_CONCAT, you can change it with:

SET group_concat_max_len = 10485760; #10MB max length

More details: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/

辞取 2024-08-08 15:52:12

A comment on this page in the MySQL documentation has the following suggestion:

-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:

DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;


INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);



-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
  id,
  AVG(val) AS MEDIAN
FROM (
  SELECT
    id,
    val
  FROM (
    SELECT
      -- Create an index n for every id
      @a := (@a + 1) mod o.c AS shifted_n,
      IF(@a mod o.c=0, o.c, @a) AS n,
      o.id,
      o.val,
      -- the number of elements for every id
      o.c
    FROM (
      SELECT
        t_o.id,
        val,
        c
      FROM
        table_median t_o INNER JOIN
        (SELECT
           id,
           COUNT(1) AS c
         FROM
           table_median
         GROUP BY
           id
        ) t2
        ON (t2.id = t_o.id)
      ORDER BY
        t_o.id,val
    ) o
  ) a
  WHERE
    IF(
      -- if there is an even number of elements
      -- take the lower and the upper median
      -- and use AVG(lower,upper)
      c MOD 2 = 0,
      n = c DIV 2 OR n = (c DIV 2)+1,

      -- if it's an odd number of elements
      -- take the first if it's only one element
      -- or take the one in the middle
      IF(
        c = 1,
        n = 1,
        n = c DIV 2 + 1
      )
    )
) a
GROUP BY
  id;

-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1

-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4


-- from there we can select the n-th element on the position: count div 2 + 1 
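The per-group logic can be checked without a MySQL server. This Python sketch is an illustration, not part of Peter Hlavac's comment: it replays the same steps on the example rows (sort within each id, number the rows from 1, keep the one or two middle positions, average them), using ordinary per-group lists instead of the @a counter trick.

```python
from collections import defaultdict

# The same rows that the example inserts into table_median: (id, val)
TABLE_MEDIAN = [
    (1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
    (2, 4),
    (3, 5), (3, 2),
    (4, 5), (4, 12), (4, 1), (4, 7),
]


def median_per_group(rows):
    groups = defaultdict(list)
    for gid, val in rows:
        groups[gid].append(val)

    medians = {}
    for gid in sorted(groups):
        vals = sorted(groups[gid])  # ORDER BY t_o.id, val
        c = len(vals)               # COUNT(1) AS c for this id
        if c % 2 == 0:
            wanted_n = (c // 2, c // 2 + 1)  # lower and upper median
        else:
            wanted_n = (c // 2 + 1,)         # middle element (n = 1 when c = 1)
        picked = [vals[n - 1] for n in wanted_n]  # n is 1-based, as in the helper table
        medians[gid] = sum(picked) / len(picked)  # AVG(val) ... GROUP BY id
    return medians


print(median_per_group(TABLE_MEDIAN))  # {1: 5.0, 2: 4.0, 3: 3.5, 4: 6.0}
```

The output agrees with the helper table in the explanation: for id 4 the sorted values are 1, 5, 7, 12, so positions 2 and 3 are averaged to 6.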
囚我心虐我身 2024-08-08 15:52:12

In MariaDB / MySQL:

SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
  FROM data d, (SELECT @rownum:=0) r
  WHERE d.val is NOT NULL
  -- put some where clause here
  ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.

Also, AVG(dd.val) and dd.row_number IN(...) are used to correctly produce a median when there is an even number of records. Reasoning:

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3

Finally, MariaDB 10.3.3+ provides a built-in MEDIAN window function.
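The row-selection arithmetic can be replayed in plain Python. This is a sketch that mirrors the query's steps on a list rather than a table; returning None for an empty input is an assumption chosen to match SQL's AVG over zero rows.

```python
def median_single_pass(values):
    # WHERE d.val IS NOT NULL ... ORDER BY d.val
    ordered = sorted(v for v in values if v is not None)
    total_rows = len(ordered)  # what @total_rows holds once the inner query has run
    if total_rows == 0:
        return None  # AVG over no rows is NULL in SQL
    # FLOOR((@total_rows+1)/2) and FLOOR((@total_rows+2)/2): the same row when the count is odd
    wanted = {(total_rows + 1) // 2, (total_rows + 2) // 2}
    picked = [v for row_number, v in enumerate(ordered, start=1) if row_number in wanted]
    return sum(picked) / len(picked)  # AVG(dd.val)


print(median_single_pass([4, 7, 2, 2, 9, 8, 3]))  # 4.0
print(median_single_pass([1, 2, 3, 4]))           # 2.5
```

Using a set for the two positions means an odd count contributes a single row to the average, which is exactly why the IN list works for both cases.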

银河中√捞星星 2024-08-08 15:52:12

I just found another answer online in the comments:

For medians in almost any SQL:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2

Make sure your columns are well indexed and the index is used for filtering and sorting. Verify with the explain plans.

select count(*) from table -- find the number of rows

Calculate the "median" row number, for example median_row = floor(count / 2). LIMIT offsets start at 0, so with 7 rows this is offset 3, i.e. the 4th value.

Then pick it out of the list:

select val from table order by val asc limit median_row,1

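This should return one row containing just the value you want. LIMIT accepts only integer constants (or ? placeholders in a prepared statement), so substitute the computed number rather than an expression or a user variable.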
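The COUNT-then-LIMIT approach boils down to indexing a sorted list at floor(count / 2). This Python sketch only mirrors that arithmetic; note that with an even count it returns the upper of the two middle values instead of their average.

```python
def median_via_limit(values):
    count = len(values)          # SELECT COUNT(*) ...
    median_row = count // 2      # floor(count / 2): a 0-based LIMIT offset
    ordered = sorted(values)     # ORDER BY val ASC
    return ordered[median_row]   # LIMIT median_row, 1 -> exactly one row


print(median_via_limit([4, 7, 2, 2, 9, 8, 3]))  # 4
print(median_via_limit([1, 2, 3, 4]))           # 3
```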

逆流 2024-08-08 15:52:12

I found the accepted solution didn't work on my MySQL install, returning an empty set, but this query worked for me in all situations that I tested it on:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
ORDER BY x.val
LIMIT 1
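The HAVING condition keeps each distinct value whose share of values at or below it exceeds one half, and LIMIT 1 takes the smallest such value (assuming the groups are ordered by x.val). The Python sketch below mirrors that reasoning on a list; it is an illustration of the logic, not a MySQL execution.

```python
def median_via_self_join(values):
    n = len(values)
    # GROUP BY x.val, visited in ascending order so the first hit matches ORDER BY ... LIMIT 1
    for v in sorted(set(values)):
        # SIGN(1 - SIGN(y.val - x.val)) is 1 when y.val <= x.val and 0 otherwise
        at_or_below = sum(1 for y in values if y <= v)
        # in SQL, SUM and COUNT(*) are both multiplied by the number of x rows holding v,
        # so the ratio reduces to the fraction of values <= v
        if at_or_below / n > 0.5:
            return v
    return None  # empty input: no group satisfies HAVING


print(median_via_self_join([4, 7, 2, 2, 9, 8, 3]))  # 4
print(median_via_self_join([1, 2, 3, 4]))           # 3
```

Because the multiplicity of x cancels out of the ratio, duplicates such as the two 2s in the sample data do not break this version, unlike the `= (COUNT(*)+1)/2` equality test.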
梦年海沫深 2024-08-08 15:52:12

Unfortunately, neither TheJacobTaylor's nor velcrow's answers return accurate results for current versions of MySQL.

The answer by velcrow above is close, but it does not calculate correctly for result sets with an even number of rows. The median is defined as either 1) the middle number of a set with an odd number of values, or 2) the average of the two middle numbers of a set with an even number of values.

So, here's velcrow's solution patched to handle both odd and even number sets:

SELECT AVG(middle_values) AS 'median' FROM (
  SELECT t1.median_column AS 'middle_values' FROM
    (
      SELECT @row:=@row+1 as `row`, x.median_column
      FROM median_table AS x, (SELECT @row:=0) AS r
      WHERE 1
      -- put some where clause here
      ORDER BY x.median_column
    ) AS t1,
    (
      SELECT COUNT(*) as 'count'
      FROM median_table x
      WHERE 1
      -- put same where clause here
    ) AS t2
    -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
    WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

To use this, follow these 3 easy steps:

  1. Replace "median_table" (2 occurrences) in the above code with the name of your table
  2. Replace "median_column" (3 occurrences) with the column name you'd like to find a median for
  3. If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition
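The key line is the range condition on t1.row, which admits one row for an odd count and two for an even count. A Python sketch that mirrors the query on a list (returning None for empty input is an assumption, matching AVG over no rows):

```python
def median_patched(values):
    ordered = sorted(values)  # ORDER BY x.median_column, numbered 1, 2, ... by @row
    count = len(ordered)      # t2.count
    middle_values = [
        v
        for row, v in enumerate(ordered, start=1)
        # t1.row >= t2.count/2 AND t1.row <= (t2.count/2) + 1
        if count / 2 <= row <= count / 2 + 1
    ]
    if not middle_values:
        return None
    return sum(middle_values) / len(middle_values)  # AVG(middle_values)


print(median_patched([4, 7, 2, 2, 9, 8, 3]))  # 4.0
print(median_patched([1, 2, 3, 4]))           # 2.5
```

With 7 rows the window is 3.5 to 4.5, so only row 4 qualifies; with 4 rows it is 2 to 3, so rows 2 and 3 are averaged.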
维持三分热 2024-08-08 15:52:12

I propose a faster way.

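Get half the row count, rounded up:

SET @middlevalue = (SELECT CEIL(COUNT(*)/2) FROM data);

Then take the middle value in a sorted subquery. LIMIT does not accept a user variable directly, so pass it through a prepared statement:

PREPARE stmt FROM 'SELECT MAX(val) FROM (SELECT val FROM data ORDER BY val LIMIT ?) x';
EXECUTE stmt USING @middlevalue;
DEALLOCATE PREPARE stmt;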

I tested this with a 5x10e6 dataset of random numbers, and it found the median in under 10 seconds.
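The idea is that the largest of the first ceil(n/2) sorted values is the middle one. This Python sketch mirrors it on a list, computing ceil(n/2) as (n + 1) // 2 to stay in integer arithmetic; with an even count it yields the lower of the two middle values.

```python
def median_via_max_of_prefix(values):
    if not values:
        return None                        # MAX over no rows is NULL in SQL
    middlevalue = (len(values) + 1) // 2   # same integer as CEIL(COUNT(*)/2)
    prefix = sorted(values)[:middlevalue]  # ORDER BY val LIMIT middlevalue
    return max(prefix)                     # SELECT MAX(val) FROM (...) x


print(median_via_max_of_prefix([4, 7, 2, 2, 9, 8, 3]))  # 4
print(median_via_max_of_prefix([1, 2, 3, 4]))           # 2
```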

云巢 2024-08-08 15:52:12

Install and use these MySQL statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/

After that, calculating the median is easy:

SELECT median(val) FROM data;
夢归不見 2024-08-08 15:52:12

MySQL 8.0 and later support ROW_NUMBER() and common table expressions, so the median can be computed like this (inspired by this SQL Server query):

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS Cnt,
    ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1) DIV 2, (Cnt+2) DIV 2)
;

The IN handles an even number of entries: it returns both middle rows, and the median is the average of their val.

If you want to find the median per group, then just PARTITION BY group in your OVER clauses.
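When porting the SQL Server query, the division in the IN list matters: SQL Server divides two integers with truncation, while MySQL's `/` returns an exact decimal (so Cnt = 4 gives 2.5, which matches no row number) and MySQL's `DIV` truncates. This Python sketch models both behaviours with a hypothetical `integer_division` flag to show the effect on an even-sized list.

```python
def middle_row_numbers(cnt, integer_division=True):
    """1-based RowNum values selected by WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)."""
    if integer_division:
        # SQL Server's int / int, or MySQL's DIV operator: truncates
        return {(cnt + 1) // 2, (cnt + 2) // 2}
    # MySQL's "/" operator: exact result, e.g. 2.5 for Cnt = 4
    return {(cnt + 1) / 2, (cnt + 2) / 2}


def median_with_row_number(values, integer_division=True):
    ordered = sorted(values)                                     # ROW_NUMBER() OVER (ORDER BY val)
    wanted = middle_row_numbers(len(ordered), integer_division)  # Cnt = COUNT(*) OVER ()
    picked = [v for row_num, v in enumerate(ordered, start=1) if row_num in wanted]
    return sum(picked) / len(picked)  # average of the one or two selected rows


print(median_with_row_number([1, 2, 3, 4]))                          # 2.5
print(median_with_row_number([1, 2, 3, 4], integer_division=False))  # 3.0
```

With decimal division only row 3 survives, so an even-sized set silently yields the upper middle value instead of the median.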

Rob
