mysql 子查询性能

发布于 2024-11-25 23:40:06 字数 1835 浏览 2 评论 0原文

我的 SQL(带有子查询)花了很长时间(接近 24 小时)。使用子查询对性能不利吗?

我的表如下

mysql> show create table eventnew;
CREATE TABLE `eventnew` (
  `id` int(50) NOT NULL AUTO_INCREMENT,
  `date` datetime DEFAULT NULL,
  `src_ip` int(10) unsigned DEFAULT NULL,
  `src_port` int(10) unsigned DEFAULT NULL,
  `dst_ip` int(10) unsigned DEFAULT NULL,
  `dst_port` int(10) unsigned DEFAULT NULL,
  `repo_ip` varchar(50) DEFAULT NULL,
  `link` varchar(50) DEFAULT NULL,
  `binary_hash` varchar(50) DEFAULT NULL,
  `sensor_id` varchar(50) DEFAULT NULL,
  `repox_ip` int(10) unsigned DEFAULT NULL,
  `flags` varchar(50) DEFAULT NULL,
  `shellcode` varchar(1000) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `date` (`date`),
  KEY `sensor_id` (`sensor_id`),
  KEY `src_ip` (`src_ip`)
) ENGINE=MyISAM AUTO_INCREMENT=883278 DEFAULT CHARSET=latin1

我的 SQL 如下:

SELECT COUNT( DISTINCT binary_hash ) AS cnt
FROM eventnew
WHERE DATE >=  '2010-10-16'
AND DATE <  '2010-10-17'
AND binary_hash NOT 
IN (

SELECT DISTINCT binary_hash
FROM eventnew
WHERE DATE <  '2010-10-16'
AND binary_hash IS NOT NULL
)

下面是运行结果解释:

+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type        | table    | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
|  1 | PRIMARY            | eventnew | range | date          | date | 9       | NULL |  14296 | Using where |
|  2 | DEPENDENT SUBQUERY | eventnew | range | date          | date | 9       | NULL | 384974 | Using where |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+

my SQL(with sub-queries) take so long(nearly 24hour). Is using sub-queries is not good for performance?

My table as below

mysql> show create table eventnew;
CREATE TABLE `eventnew` (
  `id` int(50) NOT NULL AUTO_INCREMENT,
  `date` datetime DEFAULT NULL,
  `src_ip` int(10) unsigned DEFAULT NULL,
  `src_port` int(10) unsigned DEFAULT NULL,
  `dst_ip` int(10) unsigned DEFAULT NULL,
  `dst_port` int(10) unsigned DEFAULT NULL,
  `repo_ip` varchar(50) DEFAULT NULL,
  `link` varchar(50) DEFAULT NULL,
  `binary_hash` varchar(50) DEFAULT NULL,
  `sensor_id` varchar(50) DEFAULT NULL,
  `repox_ip` int(10) unsigned DEFAULT NULL,
  `flags` varchar(50) DEFAULT NULL,
  `shellcode` varchar(1000) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `date` (`date`),
  KEY `sensor_id` (`sensor_id`),
  KEY `src_ip` (`src_ip`)
) ENGINE=MyISAM AUTO_INCREMENT=883278 DEFAULT CHARSET=latin1

my SQL as below:

SELECT COUNT( DISTINCT binary_hash ) AS cnt
FROM eventnew
WHERE DATE >=  '2010-10-16'
AND DATE <  '2010-10-17'
AND binary_hash NOT 
IN (

SELECT DISTINCT binary_hash
FROM eventnew
WHERE DATE <  '2010-10-16'
AND binary_hash IS NOT NULL
)

below are result running EXPLAIN:

+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type        | table    | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
|  1 | PRIMARY            | eventnew | range | date          | date | 9       | NULL |  14296 | Using where |
|  2 | DEPENDENT SUBQUERY | eventnew | range | date          | date | 9       | NULL | 384974 | Using where |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

独留℉清风醉 2024-12-02 23:40:06

使用子查询肯定会影响您的性能。例如,假设表 T1 有 'n' 条记录,T2 有 'm' 条记录。当您对 T1 和 T2 进行联接时,它将获取 n*m 条记录,然后根据您的条件对它们进行排序。同样的情况也适用于 in 关键字。如果子查询中有另一个约束,则会进一步降低效率。然而,在实践中使用子查询是不可避免的。

Using subqueries certainly does affect your performance. For instance, lets say a Table T1 has 'n' records and T2 has 'm' records. when you do a join on T1 and T2, it will take n*m records and then will sort them based on your condition. The same case goes with in keyword as well. and if you have another constraint in subquery, it would further decrease the efficiency. However, using subqueries couldn't be avoided in practice as they are meant to be.

找回味觉 2024-12-02 23:40:06

我建议您使用 NOT EXISTS 而不是 NOT IN

I'd suggest you use NOT EXISTS instead of NOT IN.

星星的軌跡 2024-12-02 23:40:06

试试这个

SELECT COUNT( DISTINCT a.binary_hash ) AS cnt
FROM eventnew a left join eventnew b on (a.binary_hash=b.binary_hash AND b.binary_hash IS NOT NULL AND b.DATE <  '2010-10-16')
WHERE a.DATE >=  '2010-10-16'
AND a.DATE <  '2010-10-17'
and  b.date is null

Try this

SELECT COUNT( DISTINCT a.binary_hash ) AS cnt
FROM eventnew a left join eventnew b on (a.binary_hash=b.binary_hash AND b.binary_hash IS NOT NULL AND b.DATE <  '2010-10-16')
WHERE a.DATE >=  '2010-10-16'
AND a.DATE <  '2010-10-17'
and  b.date is null
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文