Efficient assignment of percentile/rank in MySQL

Posted 2024-11-30 07:59:57

I have a couple of very large tables (over 400,000 rows) that look like the following:

+---------+--------+---------------+
| ID      | M1     | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | NULL          |
| 3684515 | 3.0476 | NULL          |
| 3684516 | 2.6499 | NULL          |
| 3684517 | 0.3585 | NULL          |
| 3684518 | 1.6919 | NULL          |
| 3684519 | 2.8515 | NULL          |
| 3684520 | 4.0728 | NULL          |
| 3684521 | 4.0224 | NULL          |
| 3684522 | 5.8207 | NULL          |
| 3684523 | 6.8291 | NULL          |
+---------+--------+---------------+...about 400,000 more

I need to assign each row in the M1_Percentile column a value that represents "the percent of rows with M1 values equal to or lower than the current row's M1 value".

In other words, I need:

M1_Percentile = (number of rows with M1 <= this row's M1) / (total number of rows) * 100

I implemented this successfully, but it is FAR FAR too slow. If anyone could create a more efficient version of the following code, I would really appreciate it!

UPDATE myTable AS X JOIN (
SELECT
  s1.ID, COUNT(s2.ID)/ (SELECT COUNT(*) FROM myTable) * 100 AS percentile
FROM
  myTable s1 JOIN myTable s2 on (s2.M1 <= s1.M1)
GROUP BY s1.ID
ORDER BY s1.ID) AS Z 
ON (X.ID = Z.ID) 
SET X.M1_Percentile = Z.percentile;

This is the (correct but slow) result from the above query if the number of rows is limited to the ones you see (10 rows):

+---------+--------+---------------+
| ID      | M1     | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 |            60 |
| 3684515 | 3.0476 |            50 |
| 3684516 | 2.6499 |            30 |
| 3684517 | 0.3585 |            10 |
| 3684518 | 1.6919 |            20 |
| 3684519 | 2.8515 |            40 |
| 3684520 | 4.0728 |            80 |
| 3684521 | 4.0224 |            70 |
| 3684522 | 5.8207 |            90 |
| 3684523 | 6.8291 |           100 |
+---------+--------+---------------+

Producing the same results for the entire 400,000 rows takes orders of magnitude longer.
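
A rough count shows why, assuming all M1 values are distinct (ties only add more pairs). The join condition s2.M1 <= s1.M1 pairs the row with the k-th smallest M1 with exactly k rows (itself and the k-1 rows below it), so the join produces this many row pairs before grouping:

```latex
\text{pairs}(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad
\text{pairs}(10) = 55, \qquad
\text{pairs}(400{,}000) = 80{,}000{,}200{,}000 \approx 8 \times 10^{10}
```

The work grows quadratically with the row count, whereas an approach based on sorting M1 once grows roughly as n log n.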

瑾夏年华 2024-12-07 07:59:57

I cannot test this, but you could try something like:

update myTable t
set M1_Percentile = (
    select count(*)          -- rows at or below this row's M1
    from myTable t1
    where t1.M1 <= t.M1) / (
    select count(*)          -- total number of rows
    from myTable);

UPDATE:

update test t
set m1_pc = (
    -- <= also counts rows with an equal M1, as the question asks
    (select count(*) from test t1 where t1.M1 <= t.M1) * 100 /
    ( select count(*) from test));

This works in Oracle (the only database I have available). I do remember getting that error in MySQL. It is very annoying.
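
The MySQL error mentioned here is presumably error 1093 ("You can't specify target table ... for update in FROM clause"), which MySQL raises when a subquery inside an UPDATE reads the table being updated. Below is a minimal sketch that sidesteps both that restriction and the quadratic self-join. It assumes MySQL 8.0 or later (window functions are not available in 5.7) and reuses the table and column names from the question. CUME_DIST() returns the fraction of rows whose ordering value is less than or equal to the current row's, which is the quantity being asked for.

```sql
-- Assumes MySQL 8.0+ and the myTable / ID / M1 / M1_Percentile names from the question.
UPDATE myTable AS X
JOIN (
  -- CUME_DIST() = (rows with M1 <= this row's M1) / (total rows), in the range (0, 1]
  SELECT ID,
         CUME_DIST() OVER (ORDER BY M1) * 100 AS percentile
  FROM myTable
) AS Z
ON (X.ID = Z.ID)
SET X.M1_Percentile = Z.percentile;
```

Rows with the same M1 all receive the same value, matching the <= count in the question. The derived table is computed separately from the update target, so the target-table restriction should not apply. An index on M1 may let the server avoid a separate sort, though that depends on the optimizer.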

梦年海沫深 2024-12-07 07:59:57

Fair warning: MySQL isn't my native environment. However, after a little research, I think the following query should be workable:

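UPDATE myTable AS X
JOIN (
  SELECT  S.ID, (
      SELECT  COUNT(*)
      FROM    myTable X1
      -- the row comparison breaks ties on ID, so every row gets a distinct rank
      WHERE   (S.M1, S.ID) >= (X1.M1, X1.ID)) AS M1Rank  -- RANK is a reserved word in MySQL 8.0
  FROM myTable AS S
) AS RowRank
ON (X.ID = RowRank.ID)
CROSS JOIN (
  SELECT COUNT(*) AS TotalCount
  FROM myTable
) AS TotalCount
-- multiply by 100 to store a percentage, as in the question's expected output
SET X.M1_Percentile = RowRank.M1Rank / TotalCount.TotalCount * 100;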
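
The tie-breaking rank used in this answer (order by M1, then by ID) can also be sketched with ROW_NUMBER(), which avoids running a correlated count for every row. This is an alternative that assumes MySQL 8.0 or later, not what the answer itself proposes. The aliases R, RowNum and TotalCount are illustrative names.

```sql
-- Assumes MySQL 8.0+; R, RowNum and TotalCount are illustrative aliases.
UPDATE myTable AS X
JOIN (
  SELECT ID,
         ROW_NUMBER() OVER (ORDER BY M1, ID) AS RowNum,  -- 1..n, ties broken by ID
         COUNT(*) OVER ()                    AS TotalCount  -- n, repeated on every row
  FROM myTable
) AS R
ON (X.ID = R.ID)
SET X.M1_Percentile = R.RowNum / R.TotalCount * 100;
```

Unlike a <= count, this gives rows with equal M1 values different percentiles, so it only matches the question's definition when M1 values are distinct.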
