对两个整数列的查询花费了荒谬的时间
我有一个(由 Django )生成的查询,如下所示:
SELECT `geo_ip`.`id`, `geo_ip`.`start_ip`,
`geo_ip`.`end_ip`, `geo_ip`.`start`,
`geo_ip`.`end`, `geo_ip`.`cc`, `geo_ip`.`cn`
FROM `geo_ip`
WHERE (`geo_ip`.`start` <= 2084738290 AND `geo_ip`.`end` >= 2084738290 )
LIMIT 1
它查询包含 134189 个条目的 GeoLocating 表。添加索引后,每个查询需要 100 毫秒以上的时间来执行,这使得它无法用于一次性的事情。我将缓存响应,这样我只需进行一次 IP 查找,但我很好奇我是否错过了一些使其速度更快的明显方法。我的表:
CREATE TABLE `geo_ip` (
`start_ip` char(15) NOT NULL,
`end_ip` char(15) NOT NULL,
`start` bigint(20) NOT NULL,
`end` bigint(20) NOT NULL,
`cc` varchar(6) NOT NULL,
`cn` varchar(150) NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=134190 DEFAULT CHARSET=latin1
像这样在两列上创建索引:
ALTER TABLE geo_ip ADD INDEX (start, end);
给出以下解释:
EXPLAIN SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290)
LIMIT 1;
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | geo_ip | range | start | start | 8 | NULL | 67005 | 100.00 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
完成选择需要超过 100 毫秒:
SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
geo_ip.start, geo_ip.end, geo_ip.cc,
geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 and geo_ip.start < 2084738290)
LIMIT 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id | start_ip | end_ip | start | end | cc | cn |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.18 sec)
比拥有单个索引更昂贵:
ALTER TABLE geo_ip ADD INDEX (`start`);
ALTER TABLE geo_ip ADD INDEX (`end`);
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| 1 | SIMPLE | geo_ip | range | start,end | start | 8 | NULL | 68017 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
完成这些请求大约需要 100 毫秒:
SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip, geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290) limit 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id | start_ip | end_ip | start | end | cc | cn |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.11 sec)
但这两种方法时间太长了,有什么办法可以解决这个问题吗?
I have a query that gets generated (by Django) like this:
SELECT `geo_ip`.`id`, `geo_ip`.`start_ip`,
`geo_ip`.`end_ip`, `geo_ip`.`start`,
`geo_ip`.`end`, `geo_ip`.`cc`, `geo_ip`.`cn`
FROM `geo_ip`
WHERE (`geo_ip`.`start` <= 2084738290 AND `geo_ip`.`end` >= 2084738290 )
LIMIT 1
It queries a GeoLocating table with 134189 entries in it. Each query takes >100ms to perform when indexes are added, which makes it unusable for more than one-off things. I'm going to cache the response so I only have to do the IP lookup once, but I'm curious if I'm missing some obvious way of making it a magnitude faster. My table:
CREATE TABLE `geo_ip` (
`start_ip` char(15) NOT NULL,
`end_ip` char(15) NOT NULL,
`start` bigint(20) NOT NULL,
`end` bigint(20) NOT NULL,
`cc` varchar(6) NOT NULL,
`cn` varchar(150) NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=134190 DEFAULT CHARSET=latin1
Creating an index on both columns like so:
ALTER TABLE geo_ip ADD INDEX (start, end);
Gives the following explain:
EXPLAIN SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290)
LIMIT 1;
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | geo_ip | range | start | start | 8 | NULL | 67005 | 100.00 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
It takes well over 100ms to complete selects:
SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
geo_ip.start, geo_ip.end, geo_ip.cc,
geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 and geo_ip.start < 2084738290)
LIMIT 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id | start_ip | end_ip | start | end | cc | cn |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.18 sec)
Is more expensive than having a single individual index:
ALTER TABLE geo_ip ADD INDEX (`start`);
ALTER TABLE geo_ip ADD INDEX (`end`);
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| 1 | SIMPLE | geo_ip | range | start,end | start | 8 | NULL | 68017 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
It takes around 100ms to complete these requests:
SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip, geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290) limit 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id | start_ip | end_ip | start | end | cc | cn |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.11 sec)
But both of these methods take way too long, is it possible to do anything about this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
时间总是消耗在“where”子句中。
而且由于您正在处理“低于”或“大于”的两个不同字段,因此它必须读取大量索引才能找出您想要的记录。
我应该这样完成我的表格:
这样我就可以选择这个:
在地理上有一个索引。
应该快得多。但抱歉,我没有时间尝试。
Time is always consumed in the "where" clause.
And because you are working on two different fields with "lower than" or "greater than", it has to read a lot of indexes to find out which record is the one you want.
I should have done my table this way :
so that I can select this :
with an index on geo.
Should be much, much faster. But sorry, I have no time to try.