从 A 更新中选择 count(*) B 超慢

发布于 2024-11-18 14:51:35 字数 1904 浏览 2 评论 0原文

我有一个查询需要 2 分钟从表 A 中计数并使用计数结果更新表 B。

每当 Table_B 列 Start 中的数字与 Table_A（readstart/readend）中的范围匹配时，我应该更新 Table_B 中的 read_count。

   id | readstart | readend | read_count
    1 | 2999997   | 3000097  | 0   
    2 | 3000097   | 3000197  | 0   
    3 | 3000497   | 3000597  | 0   
    4 | 3001597   | 3001697  | 0   
    5 | 3001897   | 3001997  | 0   
    6 | 3005397   | 3005497  | 0   
    7 | 3005997   | 3006097  | 0  
    8 | 3006397   | 3006497  | 0   
    9 | 3006797   | 3006897  | 0  
    10| 3007497   | 3007597  | 0

这是我应该用计数结果更新的表：

    CREATE TABLE `rdc_test` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `readstart` int(11) DEFAULT NULL,
    `readend` int(11) DEFAULT NULL,
    `read_count` int(11) DEFAULT NULL,
     PRIMARY KEY (`id`),
     KEY `readstart` (`readstart`),
     KEY `readend` (`readend`)
    ) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;

这是我想要计算匹配行的表：

   CREATE TABLE `1ips_chr1` (
   `strand` char(1) DEFAULT NULL,
   `chr` varchar(10) DEFAULT NULL,
   `start` int(11) DEFAULT NULL,
   `end` int(11) DEFAULT NULL,
   `name` varchar(255) DEFAULT NULL,
   `name2` varchar(255) DEFAULT NULL,
   `id` int(11) NOT NULL AUTO_INCREMENT,
   PRIMARY KEY (`id`),
   KEY `start` (`start`),
   KEY `end` (`end`)
  ) ENGINE=MyISAM AUTO_INCREMENT=34994289 DEFAULT CHARSET=latin1;

我对 10 行进行了测试，结果很糟糕..2 分钟选择 count(*) 并更新 10 行。我在 Table_A 中有大约 350,000 行需要更新，在 table_B 中有 35,000,000 行。我知道平均每个计数应该返回 30~40 个结果。

这是我的超慢查询：

UPDATE rdc_test
SET rdc_test.read_count =
    ( 
        SELECT COUNT(start) as read_count 
        FROM 1ips_chr1
        WHERE 1ips_chr1.start >= rdc_test.readstart 
        AND 1ips_chr1.start <= rdc_test.readend
    )

Query OK, 10 rows affected (2 min 22.20 sec)
Rows matched: 10  Changed: 10  Warnings: 0

原文

I have a query that takes 2minutes to count from table A and update Table B with the count result.

Everytime that a number in Table_B column Start matches the range in Table_A (readstart/readend) I should update read_count in Table_B.

   id | readstart | readend | read_count
    1 | 2999997   | 3000097  | 0   
    2 | 3000097   | 3000197  | 0   
    3 | 3000497   | 3000597  | 0   
    4 | 3001597   | 3001697  | 0   
    5 | 3001897   | 3001997  | 0   
    6 | 3005397   | 3005497  | 0   
    7 | 3005997   | 3006097  | 0  
    8 | 3006397   | 3006497  | 0   
    9 | 3006797   | 3006897  | 0  
    10| 3007497   | 3007597  | 0

Here is the table I should update with the count result :

    CREATE TABLE `rdc_test` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `readstart` int(11) DEFAULT NULL,
    `readend` int(11) DEFAULT NULL,
    `read_count` int(11) DEFAULT NULL,
     PRIMARY KEY (`id`),
     KEY `readstart` (`readstart`),
     KEY `readend` (`readend`)
    ) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;

Here is the table from where I wanna count matching rows :

   CREATE TABLE `1ips_chr1` (
   `strand` char(1) DEFAULT NULL,
   `chr` varchar(10) DEFAULT NULL,
   `start` int(11) DEFAULT NULL,
   `end` int(11) DEFAULT NULL,
   `name` varchar(255) DEFAULT NULL,
   `name2` varchar(255) DEFAULT NULL,
   `id` int(11) NOT NULL AUTO_INCREMENT,
   PRIMARY KEY (`id`),
   KEY `start` (`start`),
   KEY `end` (`end`)
  ) ENGINE=MyISAM AUTO_INCREMENT=34994289 DEFAULT CHARSET=latin1;

I did a test on 10 rows, the result was horrible..2 minutes to select count(*) and update 10 rows. I have about 350,000 rows in Table_A to update and 35,000,000 in table_B. I know that in average each count should return 30~40 as a result.

Here is my super slow query :

UPDATE rdc_test
SET rdc_test.read_count =
    ( 
        SELECT COUNT(start) as read_count 
        FROM 1ips_chr1
        WHERE 1ips_chr1.start >= rdc_test.readstart 
        AND 1ips_chr1.start <= rdc_test.readend
    )

Query OK, 10 rows affected (2 min 22.20 sec)
Rows matched: 10  Changed: 10  Warnings: 0

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

过潦 2024-11-25 14:51:35

试试这个：

UPDATE rdc_test t1
    INNER JOIN 
    (
        SELECT r.id AS id, 
               COUNT(l.start) AS read_count
        FROM rdc_test r
            LEFT OUTER JOIN start1ips_chr1 l
                ON l.start >= r.readstart 
                AND l.start <= r.readend
        GROUP BY r.id
    ) t2
    ON t1.id = t2.id
SET t1.read_count =  t2.read_count

编辑：

由于需要更新的数据量，最好的方法是重新创建表而不是执行更新：

CREATE TABLE new_rdc_test AS
SELECT r.id AS id, 
       r.readstart AS readstart,
       r.readend AS readend,
       COUNT(l.start) AS read_count
FROM rdc_test r
    LEFT OUTER JOIN start1ips_chr1 l
        ON l.start >= r.readstart 
        AND l.start <= r.readend
GROUP BY r.id, r.readstart, r.readend

这个查询运行得足够快吗？

Try this :

UPDATE rdc_test t1
    INNER JOIN 
    (
        SELECT r.id AS id, 
               COUNT(l.start) AS read_count
        FROM rdc_test r
            LEFT OUTER JOIN start1ips_chr1 l
                ON l.start >= r.readstart 
                AND l.start <= r.readend
        GROUP BY r.id
    ) t2
    ON t1.id = t2.id
SET t1.read_count =  t2.read_count

Edit :

Due to the amount of datas you need to update, the best way is to recreate the table instead of perform an update :

CREATE TABLE new_rdc_test AS
SELECT r.id AS id, 
       r.readstart AS readstart,
       r.readend AS readend,
       COUNT(l.start) AS read_count
FROM rdc_test r
    LEFT OUTER JOIN start1ips_chr1 l
        ON l.start >= r.readstart 
        AND l.start <= r.readend
GROUP BY r.id, r.readstart, r.readend

Does this query run fast enough ?

回复收藏 0 原文