我还能做些什么来优化这个 MySQL 查询吗?

发布于 2024-08-30 17:43:16 字数 3531 浏览 2 评论 0原文

我有两个表,表 A 有 700,000 个条目,表 B 有 600,000 个条目。结构如下:

表 A:

+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| number    | bigint(20) unsigned | YES  |     | NULL    |                | 
+-----------+---------------------+------+-----+---------+----------------+

表 B:

+-------------+---------------------+------+-----+---------+----------------+
| Field       | Type                | Null | Key | Default | Extra          |
+-------------+---------------------+------+-----+---------+----------------+
| id          | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| number_s    | bigint(20) unsigned | YES  | MUL | NULL    |                | 
| number_e    | bigint(20) unsigned | YES  | MUL | NULL    |                | 
| source      | varchar(50)         | YES  |     | NULL    |                |
+-------------+---------------------+------+-----+---------+----------------+

我试图使用以下代码查找表 A 中的任何值是否存在于表 B 中:

$sql = "SELECT number from TableA";
$result = mysql_query($sql) or die(mysql_error());

while($row = mysql_fetch_assoc($result)) {
        $number = $row['number'];
        $sql = "SELECT source, count(source) FROM TableB WHERE number_s < $number AND number_e > $number GROUP BY source";
        $re = mysql_query($sql) or die(mysql_error);
        while($ro = mysql_fetch_array($re)) {
                echo $number."\t".$ro[0]."\t".$ro[1]."\n";
        }
}

我希望查询会很快,但由于某种原因,速度并不快。我对选择(具有特定的“数字”值)的解释给出了以下内容:

mysql> explain SELECT source, count(source) FROM TableB WHERE number_s < 1812194440 AND number_e > 1812194440 GROUP BY source;
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table      | type | possible_keys           | key  | key_len | ref  | rows   | Extra                                        |
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | TableB     | ALL  | number_s,number_e       | NULL | NULL    | NULL | 696325 | Using where; Using temporary; Using filesort | 
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)

我可以从中挤出任何优化吗?

我尝试为同一任务编写一个存储过程,但它似乎一开始就不起作用...它没有给出任何语法错误...我尝试运行它一天,但它仍在运行感觉很奇怪。

CREATE PROCEDURE Filter() 
Begin 
  DECLARE number BIGINT UNSIGNED; 
  DECLARE x INT; 
  DECLARE done INT DEFAULT 0; 
  DECLARE cur1 CURSOR FOR SELECT number FROM TableA; 
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; 
  CREATE TEMPORARY TABLE IF NOT EXISTS Flags(number bigint unsigned, count int(11)); 
  OPEN cur1; 
  hist_loop: LOOP 
    FETCH cur1 INTO number; 
    SELECT count(*) from TableB WHERE number_s < number AND number_e > number INTO x; 
    IF done = 1 THEN 
      LEAVE hist_loop; 
    END IF; 
    IF x IS NOT NULL AND x>0 THEN 
      INSERT INTO Flags(number, count) VALUES(number, x); 
    END IF; 
  END LOOP hist_loop; 
  CLOSE cur1;
END

I have two tables, Table A with 700,000 entries and Table B with 600,000 entries. The structure is as follows:

Table A:

+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| number    | bigint(20) unsigned | YES  |     | NULL    |                | 
+-----------+---------------------+------+-----+---------+----------------+

Table B:

+-------------+---------------------+------+-----+---------+----------------+
| Field       | Type                | Null | Key | Default | Extra          |
+-------------+---------------------+------+-----+---------+----------------+
| id          | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| number_s    | bigint(20) unsigned | YES  | MUL | NULL    |                | 
| number_e    | bigint(20) unsigned | YES  | MUL | NULL    |                | 
| source      | varchar(50)         | YES  |     | NULL    |                |
+-------------+---------------------+------+-----+---------+----------------+

I am trying to find if any of the values in Table A are present in Table B using the following code:

$sql = "SELECT number from TableA";
$result = mysql_query($sql) or die(mysql_error());

while($row = mysql_fetch_assoc($result)) {
        $number = $row['number'];
        $sql = "SELECT source, count(source) FROM TableB WHERE number_s < $number AND number_e > $number GROUP BY source";
        $re = mysql_query($sql) or die(mysql_error);
        while($ro = mysql_fetch_array($re)) {
                echo $number."\t".$ro[0]."\t".$ro[1]."\n";
        }
}

I was hoping that the query would go fast but then for some reason, it isn't terrible fast. My explain on the select (with a particular value of "number") gives me the following:

mysql> explain SELECT source, count(source) FROM TableB WHERE number_s < 1812194440 AND number_e > 1812194440 GROUP BY source;
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
| id | select_type | table      | type | possible_keys           | key  | key_len | ref  | rows   | Extra                                        |
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | TableB     | ALL  | number_s,number_e       | NULL | NULL    | NULL | 696325 | Using where; Using temporary; Using filesort | 
+----+-------------+------------+------+-------------------------+------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)

Is there any optimization that I can squeeze out of this?

I tried writing a stored procedure for the same task but it doesn't even seem to work in the first place... It doesn't give any syntax errors... I tried running it for a day and it was still running which felt odd.

CREATE PROCEDURE Filter() 
Begin 
  DECLARE number BIGINT UNSIGNED; 
  DECLARE x INT; 
  DECLARE done INT DEFAULT 0; 
  DECLARE cur1 CURSOR FOR SELECT number FROM TableA; 
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; 
  CREATE TEMPORARY TABLE IF NOT EXISTS Flags(number bigint unsigned, count int(11)); 
  OPEN cur1; 
  hist_loop: LOOP 
    FETCH cur1 INTO number; 
    SELECT count(*) from TableB WHERE number_s < number AND number_e > number INTO x; 
    IF done = 1 THEN 
      LEAVE hist_loop; 
    END IF; 
    IF x IS NOT NULL AND x>0 THEN 
      INSERT INTO Flags(number, count) VALUES(number, x); 
    END IF; 
  END LOOP hist_loop; 
  CLOSE cur1;
END

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

泪眸﹌ 2024-09-06 17:43:16

您正在尝试查找包含点的间隔。对于 B-tree 索引(大多数情况下的默认索引类型),这并不是那么快数据库),但是 R-tree 索引非常适合此类查询。 MySQL 不允许您直接更改索引的类型,但您可以通过使用 GEOMETRY 列类型强制 MySQL 使用 R-Tree。

Quassnoi这篇关于 MySQL 中嵌套集的文章。虽然不完全相同,但非常相似。引用文章中的话:

还有特定类别的任务
需要搜索所有范围
包含已知值:

* 在 IP 范围禁止列表中搜索 IP 地址
* 搜索指定日期范围内的日期

还有其他几个。这些任务可以是
通过使用 R-Tree 功能进行改进
MySQL的

You are trying to find intervals which contain a point. This is not so fast with a B-tree index (the default index type in most databases), however an R-tree index will work well for this sort of query. MySQL doesn't allow you to change the type of an index directly, but you can force MySQL to use an R-Tree by using the GEOMETRY column type.

Quassnoi covers this in his article on nested sets in MySQL. While it's not quite the same, it's very similar. A quote from the article:

There is also a certain class of tasks
that require searching for all ranges
containing a known value:

* Searching for an IP address in the IP range ban list
* Searching for a given date within a date range

and several others. These tasks can be
improved by using R-Tree capabilities
of MySQL

冷夜 2024-09-06 17:43:16

在我看来,您在 number_enumber_s 列上有单独的索引,可能是使用单独的 ADD INDEX(number_e)创建的添加 INDEX(number_s) 列。

如果您添加一个包含这两列的索引,您可能会获得更好的性能,因为它们都在您的查询中使用,并且 MySQL 显然不会选择使用任何一个单列索引,判断整个表扫描会更快(如果您的查询跨越很大范围的值,这种情况并不罕见)。

ALTER TABLE tblB ADD INDEX(number_s,number_e);

此后您将不再需要单独的 number_s 索引,因为 MySQL 可以使用您刚刚创建的索引仅针对 number_s 进行查询,因此您不妨删除该索引。

It looks to me like you have separate indexes on the number_e and number_s columns, probably created with separate ADD INDEX(number_e) and ADD INDEX(number_s) columns.

You will likely get much better performance if you add an index that encompasses both those columns, as they are both being used in your query, and MySQL is clearly not choosing to use either of the single-column indexes, judging that a whole table scan would be faster (not uncommon if your query spans a large range of values).

ALTER TABLE tblB ADD INDEX(number_s,number_e);

You won't need the individual number_s index after that, as MySQL can use the one you just created for queries against number_s only, so you might as well drop that one.

尹雨沫 2024-09-06 17:43:16

首先,我假设所需的输出是将输入位于之间的所有“源”分组
number_e 和 number_s 及其计数。

我对语法不太了解,但您可能会考虑使用“BETWEEN”子句,而不是使用小于/大于运算符进行显式比较

编辑:Zombat 所说的也适用;索引也会有所帮助。

First, I assume the desired output is to group all 'source' where the input lies between
number_e and number_s, and the count thereof.

I'm scratchy on the syntax, but you might consider using a 'BETWEEN' clause up there instead of an explicit comparison using less-than/greater-than operators

Edit: What Zombat says applies too; indexes will help too.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文