MySQL 查询、子查询优化、SELECT、JOIN

发布于 2024-09-14 03:27:16 字数 1281 浏览 4 评论 0原文

我有一个包含一些数据的表，我想为每种类型选择最新的数据...

表：

+----+------+------+---------------------+
| ID | data | type | date                |
+----+------+------+---------------------+
|  1 | just |    2 | 2010-08-07 14:24:48 |
|  2 | some |    2 | 2010-08-07 18:07:32 |
|  3 | data |    9 | 2010-08-06 02:52:17 |
|  4 | abcd |    1 | 2010-08-08 17:23:22 |
|  5 | efg1 |    5 | 2010-07-10 21:36:55 |
|  6 | c123 |    5 | 2010-07-10 20:44:36 |
|  7 | bbey |   12 | 2010-08-09 09:01:26 |
+----+------+------+---------------------+

目前我正在使用简单的子查询，看起来一切正常

SELECT `data`,`type`,`date`
FROM `table1`
WHERE `date` = (
                  SELECT MAX( `date` )
                  FROM `table1` AS tbl2
                  WHERE tbl2.`type` = `table1`.`type`
                )
GROUP BY `type`
ORDER BY `type`,`date`

结果：

+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+

我的问题是有没有更好的方法来做到这一点，一些优化，改进或者也许可以做出加入？

原文

I have one table with some data and I want select newest data for each type...

Table:

+----+------+------+---------------------+
| ID | data | type | date                |
+----+------+------+---------------------+
|  1 | just |    2 | 2010-08-07 14:24:48 |
|  2 | some |    2 | 2010-08-07 18:07:32 |
|  3 | data |    9 | 2010-08-06 02:52:17 |
|  4 | abcd |    1 | 2010-08-08 17:23:22 |
|  5 | efg1 |    5 | 2010-07-10 21:36:55 |
|  6 | c123 |    5 | 2010-07-10 20:44:36 |
|  7 | bbey |   12 | 2010-08-09 09:01:26 |
+----+------+------+---------------------+

Currently I'm using simple subquery and looks like everything works

SELECT `data`,`type`,`date`
FROM `table1`
WHERE `date` = (
                  SELECT MAX( `date` )
                  FROM `table1` AS tbl2
                  WHERE tbl2.`type` = `table1`.`type`
                )
GROUP BY `type`
ORDER BY `type`,`date`

Result:

+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+

My question is is there a better way to do this, some optimization, improvement or maybe it's possible to make
join?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

风苍溪 2024-09-21 03:27:17

您正在使用相关子查询。子查询依赖于外部查询，因此必须为外部查询的每一行执行一次。

一般来说，可以通过使用子查询作为派生表来改进这一点。由于作为派生表的子查询与外部查询不相关，因此该解决方案被认为更具可扩展性：

SELECT    t1.`data`, t1.`type`, t1.`date`
FROM      `table1` t1
JOIN      (
              SELECT   MAX( `date`) `max_date`, `type`
              FROM     `table1`
              GROUP BY `type`
          ) der_t ON (der_t.`max_date` = t1.`date` AND der_t.`type` = t1.`type`)
GROUP BY  t1.`type`
ORDER BY  t1.`type`, t1.`date`;

测试用例：

CREATE TABLE table1 (id int, data varchar(10), type int, date datetime); 

INSERT INTO table1 VALUES (1, 'just', 2, '2010-08-07 14:24:48');
INSERT INTO table1 VALUES (2, 'some', 2, '2010-08-07 18:07:32');
INSERT INTO table1 VALUES (3, 'data', 9, '2010-08-06 02:52:17');
INSERT INTO table1 VALUES (4, 'abcd', 1, '2010-08-08 17:23:22');
INSERT INTO table1 VALUES (5, 'efg1', 5, '2010-07-10 21:36:55');
INSERT INTO table1 VALUES (6, 'c123', 5, '2010-07-10 20:44:36');
INSERT INTO table1 VALUES (7, 'bbey', 12, '2010-08-09 09:01:26');

结果：

+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+
5 rows in set (0.00 sec)

通过使用如下解决方案，您似乎也可以完全避免子查询：

SELECT     t1.`data`, t1.`type`, t1.`date`
FROM       `table1` t1
LEFT JOIN  `table1` t2 ON (t1.`date` < t2.`date` AND t1.`type` = t2.`type`)
WHERE      t2.`date` IS NULL
GROUP BY   t1.`type`
ORDER BY   t1.`type`, t1.`date`;

一般来说，此解决方案可以完全避免子查询。其扩展性甚至比使用派生表的解决方案更好，但如果性能至关重要，您可能需要衡量这两种解决方案。 @Naktibalda 提供的文章还提供了一些其他解决方案您可能想测试一下。

You are using a correlated subquery. The subquery is dependent on the outer query, and therefore it has to be executed once for each row of the outer query.

In general, this can be improved by using your subquery as a derived table instead. Since a subquery as a derived table is not correlated to the outer query, this solution is considered more scalable:

SELECT    t1.`data`, t1.`type`, t1.`date`
FROM      `table1` t1
JOIN      (
              SELECT   MAX( `date`) `max_date`, `type`
              FROM     `table1`
              GROUP BY `type`
          ) der_t ON (der_t.`max_date` = t1.`date` AND der_t.`type` = t1.`type`)
GROUP BY  t1.`type`
ORDER BY  t1.`type`, t1.`date`;

Test case:

CREATE TABLE table1 (id int, data varchar(10), type int, date datetime); 

INSERT INTO table1 VALUES (1, 'just', 2, '2010-08-07 14:24:48');
INSERT INTO table1 VALUES (2, 'some', 2, '2010-08-07 18:07:32');
INSERT INTO table1 VALUES (3, 'data', 9, '2010-08-06 02:52:17');
INSERT INTO table1 VALUES (4, 'abcd', 1, '2010-08-08 17:23:22');
INSERT INTO table1 VALUES (5, 'efg1', 5, '2010-07-10 21:36:55');
INSERT INTO table1 VALUES (6, 'c123', 5, '2010-07-10 20:44:36');
INSERT INTO table1 VALUES (7, 'bbey', 12, '2010-08-09 09:01:26');

Result:

+------+------+---------------------+
| data | type | date                |
+------+------+---------------------+
| abcd |    1 | 2010-08-08 17:23:22 |
| some |    2 | 2010-08-07 18:07:32 |
| efg1 |    5 | 2010-07-10 21:36:55 |
| data |    9 | 2010-08-06 02:52:17 |
| bbey |   12 | 2010-08-09 09:01:26 |
+------+------+---------------------+
5 rows in set (0.00 sec)

It also looks like you can avoid subqueries altogether, by using a solution such as the follows:

SELECT     t1.`data`, t1.`type`, t1.`date`
FROM       `table1` t1
LEFT JOIN  `table1` t2 ON (t1.`date` < t2.`date` AND t1.`type` = t2.`type`)
WHERE      t2.`date` IS NULL
GROUP BY   t1.`type`
ORDER BY   t1.`type`, t1.`date`;

In general, this scales even better than the solution with the derived table, but if performance is paramount, you may want to measure both solutions. The article that @Naktibalda provided also provides a few other solutions that you may want to test.

回复收藏 0 原文

~没有更多了~