对两个整数列的查询花费了荒谬的时间

发布于 2024-12-05 04:34:12 字数 3745 浏览 1 评论 0原文

我有一个(由 Django )生成的查询,如下所示:

SELECT `geo_ip`.`id`, `geo_ip`.`start_ip`,
       `geo_ip`.`end_ip`, `geo_ip`.`start`,
       `geo_ip`.`end`, `geo_ip`.`cc`, `geo_ip`.`cn`
FROM `geo_ip`
WHERE (`geo_ip`.`start` <= 2084738290 AND `geo_ip`.`end` >= 2084738290 )
LIMIT 1

它查询包含 134189 个条目的 GeoLocating 表。添加索引后,每个查询需要 100 毫秒以上的时间来执行,这使得它无法用于一次性的事情。我将缓存响应,这样我只需进行一次 IP 查找,但我很好奇我是否错过了一些使其速度更快的明显方法。我的表:

CREATE TABLE `geo_ip` (
  `start_ip` char(15) NOT NULL,
  `end_ip` char(15) NOT NULL,
  `start` bigint(20) NOT NULL,
  `end` bigint(20) NOT NULL,
  `cc` varchar(6) NOT NULL,
  `cn` varchar(150) NOT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=134190 DEFAULT CHARSET=latin1

像这样在两列上创建索引:

ALTER TABLE geo_ip ADD INDEX (start, end);

给出以下解释:

EXPLAIN SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
               geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290)
LIMIT 1;
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key   | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | geo_ip | range | start         | start | 8       | NULL | 67005 |   100.00 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+

完成选择需要超过 100 毫秒:

SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
       geo_ip.start, geo_ip.end, geo_ip.cc,
       geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 and geo_ip.start < 2084738290)
LIMIT 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id    | start_ip     | end_ip         | start      | end        | cc | cn        |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.18 sec)

比拥有单个索引更昂贵:

ALTER TABLE geo_ip ADD INDEX (`start`);
ALTER TABLE geo_ip ADD INDEX (`end`);
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| id | select_type | table  | type  | possible_keys | key   | key_len | ref  | rows  | Extra       |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
|  1 | SIMPLE      | geo_ip | range | start,end     | start | 8       | NULL | 68017 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+

完成这些请求大约需要 100 毫秒:

SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip, geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290) limit 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id    | start_ip     | end_ip         | start      | end        | cc | cn        |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.11 sec)

但这两种方法时间太长了,有什么办法可以解决这个问题吗?

I have a query that gets generated (by Django) like this:

SELECT `geo_ip`.`id`, `geo_ip`.`start_ip`,
       `geo_ip`.`end_ip`, `geo_ip`.`start`,
       `geo_ip`.`end`, `geo_ip`.`cc`, `geo_ip`.`cn`
FROM `geo_ip`
WHERE (`geo_ip`.`start` <= 2084738290 AND `geo_ip`.`end` >= 2084738290 )
LIMIT 1

It queries a GeoLocating table with 134189 entries in it. Each query takes >100ms to perform when indexes are added, which makes it unusable for more than one-off things. I'm going to cache the response so I only have to do the IP lookup once, but I'm curious if I'm missing some obvious way of making it a magnitude faster. My table:

CREATE TABLE `geo_ip` (
  `start_ip` char(15) NOT NULL,
  `end_ip` char(15) NOT NULL,
  `start` bigint(20) NOT NULL,
  `end` bigint(20) NOT NULL,
  `cc` varchar(6) NOT NULL,
  `cn` varchar(150) NOT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=134190 DEFAULT CHARSET=latin1

Creating an index on both columns like so:

ALTER TABLE geo_ip ADD INDEX (start, end);

Gives the following explain:

EXPLAIN SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
               geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290)
LIMIT 1;
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key   | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | geo_ip | range | start         | start | 8       | NULL | 67005 |   100.00 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+

It takes well over 100ms to complete selects:

SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
       geo_ip.start, geo_ip.end, geo_ip.cc,
       geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 and geo_ip.start < 2084738290)
LIMIT 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id    | start_ip     | end_ip         | start      | end        | cc | cn        |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.18 sec)

Is more expensive than having a single individual index:

ALTER TABLE geo_ip ADD INDEX (`start`);
ALTER TABLE geo_ip ADD INDEX (`end`);
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| id | select_type | table  | type  | possible_keys | key   | key_len | ref  | rows  | Extra       |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
|  1 | SIMPLE      | geo_ip | range | start,end     | start | 8       | NULL | 68017 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+

It takes around 100ms to complete these requests:

SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip, geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290) limit 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id    | start_ip     | end_ip         | start      | end        | cc | cn        |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.11 sec)

But both of these methods take way too long, is it possible to do anything about this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

乖乖公主 2024-12-12 04:34:12

时间总是消耗在“where”子句中。

而且由于您正在处理“低于”或“大于”的两个不同字段,因此它必须读取大量索引才能找出您想要的记录。

我应该这样完成我的表格:

+-------+-------+----------------+------------+----+-----------+
| id    | type  | ip             | geo        | cc | cn        |
+-------+-------+----------------+------------+----+-----------+
| 51725 | start | 124.66.159.255 | 2084732928 | SG | Singapore |
+-------+-------+----------------+------------+----+-----------+
| 51726 | end   | 124.66.159.255 | 2084732928 | SG | Singapore |
+-------+-------+----------------+------------+----+-----------+

这样我就可以选择这个:

select * from table where geo between '2084732927' and '2084732928'

在地理上有一个索引。
应该快得多。但抱歉,我没有时间尝试。

Time is always consumed in the "where" clause.

And because you are working on two different fields with "lower than" or "greater than", it has to read a lot of indexes to find out which record is the one you want.

I should have done my table this way :

+-------+-------+----------------+------------+----+-----------+
| id    | type  | ip             | geo        | cc | cn        |
+-------+-------+----------------+------------+----+-----------+
| 51725 | start | 124.66.159.255 | 2084732928 | SG | Singapore |
+-------+-------+----------------+------------+----+-----------+
| 51726 | end   | 124.66.159.255 | 2084732928 | SG | Singapore |
+-------+-------+----------------+------------+----+-----------+

so that I can select this :

select * from table where geo between '2084732927' and '2084732928'

with an index on geo.
Should be much, much faster. But sorry, I have no time to try.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文