MYSQL中百分位数/排名的高效分配
我有几个非常大的表(超过 400,000 行),如下所示:
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | NULL |
| 3684515 | 3.0476 | NULL |
| 3684516 | 2.6499 | NULL |
| 3684517 | 0.3585 | NULL |
| 3684518 | 1.6919 | NULL |
| 3684519 | 2.8515 | NULL |
| 3684520 | 4.0728 | NULL |
| 3684521 | 4.0224 | NULL |
| 3684522 | 5.8207 | NULL |
| 3684523 | 6.8291 | NULL |
+---------+--------+---------------+...about 400,000 more
我需要为 M1_Percentile
列中的每一行分配一个值,该值表示“M1 值等于或的行的百分比”低于当前行的 M1
值”
换句话说,我需要:
我成功地实现了这一点,但它太慢了。如果有人可以创建以下代码的更有效版本,我将非常感激!
UPDATE myTable AS X JOIN (
SELECT
s1.ID, COUNT(s2.ID)/ (SELECT COUNT(*) FROM myTable) * 100 AS percentile
FROM
myTable s1 JOIN myTable s2 on (s2.M1 <= s1.M1)
GROUP BY s1.ID
ORDER BY s1.ID) AS Z
ON (X.ID = Z.ID)
SET X.M1_Percentile = Z.percentile;
如果行数限制为您看到的行数(10 行),这是上述查询的(正确但缓慢)结果:
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | 60 |
| 3684515 | 3.0476 | 50 |
| 3684516 | 2.6499 | 30 |
| 3684517 | 0.3585 | 10 |
| 3684518 | 1.6919 | 20 |
| 3684519 | 2.8515 | 40 |
| 3684520 | 4.0728 | 80 |
| 3684521 | 4.0224 | 70 |
| 3684522 | 5.8207 | 90 |
| 3684523 | 6.8291 | 100 |
+---------+--------+---------------+
为整个 400,000 行生成相同的结果需要更长的时间。
I have a couple of very large tables (over 400,000 rows) that look like the following:
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | NULL |
| 3684515 | 3.0476 | NULL |
| 3684516 | 2.6499 | NULL |
| 3684517 | 0.3585 | NULL |
| 3684518 | 1.6919 | NULL |
| 3684519 | 2.8515 | NULL |
| 3684520 | 4.0728 | NULL |
| 3684521 | 4.0224 | NULL |
| 3684522 | 5.8207 | NULL |
| 3684523 | 6.8291 | NULL |
+---------+--------+---------------+...about 400,000 more
I need to assign each row in the M1_Percentile
column a value that represents "the percent of rows with M1 values equal or lower to the current row's M1
value"
In other words, I need:
I implemented this sucessfully, but it is FAR FAR too slow. If anyone could create a more efficient version of the following code, I would really appreciate it!
UPDATE myTable AS X JOIN (
SELECT
s1.ID, COUNT(s2.ID)/ (SELECT COUNT(*) FROM myTable) * 100 AS percentile
FROM
myTable s1 JOIN myTable s2 on (s2.M1 <= s1.M1)
GROUP BY s1.ID
ORDER BY s1.ID) AS Z
ON (X.ID = Z.ID)
SET X.M1_Percentile = Z.percentile;
This is the (correct but slow) result from the above query if the number of rows is limited to the ones you see (10 rows):
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | 60 |
| 3684515 | 3.0476 | 50 |
| 3684516 | 2.6499 | 30 |
| 3684517 | 0.3585 | 10 |
| 3684518 | 1.6919 | 20 |
| 3684519 | 2.8515 | 40 |
| 3684520 | 4.0728 | 80 |
| 3684521 | 4.0224 | 70 |
| 3684522 | 5.8207 | 90 |
| 3684523 | 6.8291 | 100 |
+---------+--------+---------------+
Producing the same results for the entire 400,000 rows takes magnitudes longer.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我无法对此进行测试,但您可以尝试以下操作:
更新:
这适用于 Oracle(我唯一可用的数据库)。我记得在 MySQL 中遇到过这个错误。这很烦人。
I cannot test this, but you could try something like:
UPDATE:
This works in Oracle (the only database I have available). I do remember getting that error in MySQL. It is very annoying.
公平警告:mysql 不是我的本机环境。然而,经过一些研究,我认为以下查询应该是可行的:
Fair warning: mysql isn't my native environment. However, after a little research, I think the following query should be workable: