What is a reasonable number of rows and tables to join in MySQL?
I have one table that maps locations to postal codes. For example, New York State has about 2000 postal codes. I have another table that maps mail to the postal codes it was sent to, but this table has about 5 million rows. I want to find all the mail that was sent to New York State, which seems simple enough, but the query is unbelievably slow; I haven't even been able to wait long enough for it to finish. Is the problem that there are 5 million rows? I can't help but think that 5 million shouldn't be such a large number for a computer these days... Oh, and everything is indexed. Is SQL just not designed to handle such large joins?
UPDATE: as people have asked, I've updated this question with the table definitions and the query that I'm using.
-- Roughly 70,000 rows
CREATE TABLE `mail_zip` (
`mail_id` int(11) default NULL,
`zip` int(11) default NULL,
KEY `index_mail_zip_on_mail_id` (`mail_id`),
KEY `index_mail_zip_on_zip` (`zip`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
-- Roughly 5,000,000 rows
CREATE TABLE `geographies` (
`city_id` int(11) default NULL,
`postal_code` int(11) default NULL,
KEY `index_geographies_on_city_id` (`city_id`),
KEY `index_geographies_on_postal_code` (`postal_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
-- Query
select mz.mail_id from mail_zip mz join geographies g on mz.zip = g.postal_code where g.city_id = 36 limit 10;
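For illustration only (the index names below are placeholders, not objects from the schema above), composite indexes along these lines would cover both the city_id filter and the join on zip:
-- Illustrative composite indexes; the names are made up
ALTER TABLE geographies ADD INDEX idx_geographies_city_postal (city_id, postal_code);
ALTER TABLE mail_zip ADD INDEX idx_mail_zip_zip_mail (zip, mail_id);
With these, the optimizer can resolve city_id = 36 and the join on zip entirely from the indexes instead of scanning mail_zip.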
UPDATE 2: okay, I lied. With the proper indices, the above query works fine. The problem is actually the ORDER BY clause. See the two nearly identical queries below: the only difference is "order by m.sent_on desc", which adds an extra 4 minutes and 30 seconds to the query! Also, EXPLAIN shows that adding the ORDER BY brings in a filesort, which must be what's slowing it down. However, sent_on is indexed, so why isn't the index being used? I must not be building the index properly.
-- Roughly 350,000 rows
CREATE TABLE `mail` (
`id` int(11) NOT NULL auto_increment,
`sent_on` datetime default NULL,
`title` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `index_mail_on_sent_on` (`sent_on`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
-- Runs in 0.19 seconds
-- Query
select distinct(m.id), m.title from mail m join mail_zip mz on mz.mail_id = m.id join geographies g on g.postal_code = mz.zip where g.city_id = 36 limit 10;
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+----------------------+---------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+----------------------+---------+-----------------------+
| 1 | SIMPLE | mz | ALL | index_mail_zip_on_com_id,index_mail_zip_on_zip | NULL | NULL | NULL | 5260053 | Using temporary |
| 1 | SIMPLE | m | eq_ref | PRIMARY | PRIMARY | 4 | mz.com_id | 1 | |
| 1 | SIMPLE | g | ref | index_geographies_on_city_id,zip | zip | 5 | mz.zip | 1 | Using where; Distinct |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+----------------------+---------+-----------------------+
-- Runs in 4 minutes and 30 seconds
-- Query
select distinct(m.id), m.title from mail m join mail_zip mz on mz.mail_id = m.id join geographies g on g.postal_code = mz.zip where g.city_id = 36 order by m.sent_on desc limit 10;
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+----------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+----------------------+---------+---------------------------------+
| 1 | SIMPLE | mz | ALL | index_mail_zip_on_com_id,index_mail_zip_on_zip | NULL | NULL | NULL | 5260053 | Using temporary; Using filesort |
| 1 | SIMPLE | m | eq_ref | PRIMARY | PRIMARY | 4 | mz.com_id | 1 | |
| 1 | SIMPLE | g | ref | index_geographies_on_city_id,zip | zip | 5 | mz.zip | 1 | Using where; Distinct |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+----------------------+---------+---------------------------------+
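For what it's worth, the problem is unlikely to be the index itself: on mail alone, an ORDER BY sent_on DESC LIMIT 10 should be served straight from index_mail_on_sent_on with no filesort, which a check like this can confirm:
-- Sanity check: single-table ORDER BY ... LIMIT on the indexed column
EXPLAIN
SELECT m.id, m.title
FROM mail m
ORDER BY m.sent_on DESC
LIMIT 10;
If that plan uses the index cleanly, the index is built correctly; the slow query sorts late because the join is driven from mz, so the ORDER BY can only be applied to the joined result.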
Comments (2)
MySQL is perfectly capable of handling joins involving 5 million rows or even many more than this.
Your problem is probably one of two things:
1. Your tables aren't indexed in a way that supports this query, or
2. The query is written in a way that prevents the indexes from being used effectively.
Since you claim that "everything is indexed" I'd guess that it is the second. Post your table information and your query and we should be able to help you fix it.
You can also run EXPLAIN on your query to see which indexes it is using.
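A minimal example of that, using the first query from the question (the columns worth reading are type, key, rows and Extra):
-- Prepend EXPLAIN to the query exactly as it is normally run
EXPLAIN
SELECT mz.mail_id
FROM mail_zip mz
JOIN geographies g ON mz.zip = g.postal_code
WHERE g.city_id = 36
LIMIT 10;
type = ALL means a full table scan, key = NULL means no index was chosen for that table, and "Using temporary" / "Using filesort" under Extra usually mark the expensive steps.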
You should be able to join, for the sake of argument, 10 tables, with the biggest table(s) having rows in the multiple millions and upwards, and you should still be able to get results fast.
Assume that there is something up with the indexing strategy or the query operation or the query plan.
It has nothing to do with SQL per se; it might be related to MySQL, or the particular storage engine you are using in MySQL.
Did you know that the SQL standard doesn't define anything related to indexes? You could argue that anything related to indexes is non-standard, though 'extra to the standard' might be a better way to look at it.
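To make the query-plan point concrete with the tables from the question: the ORDER BY is applied after the joins and the DISTINCT, so MySQL materializes every matching row and filesorts the lot before taking ten. Here is a sketch of a rewrite that drives the scan from mail in sent_on order instead (whether the optimizer actually stops early depends on the MySQL version and the table statistics):
-- Sketch only: walk index_mail_on_sent_on newest-first and probe the join per row
SELECT m.id, m.title
FROM mail m
WHERE EXISTS (
    SELECT 1
    FROM mail_zip mz
    JOIN geographies g ON g.postal_code = mz.zip
    WHERE mz.mail_id = m.id
      AND g.city_id = 36
)
ORDER BY m.sent_on DESC
LIMIT 10;
The EXISTS also makes the DISTINCT unnecessary, since each mail row can be returned at most once.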