mysql 子查询结果的简单聚合速度很慢

发布于 2024-09-16 04:47:03 字数 2271 浏览 12 评论 0原文

我正在尝试根据购买的商品数量对所有客户进行“分类”，并显示每个分类的数量。我试图查看有多少人（account_id）购买了一件商品，有多少人购买了两件商品，一直到九件商品，然后是十件或更多。

这是我正在使用的查询 - 就其价值而言，我希望查询对 sales 进行全表扫描以生成结果，但整个过程需要很长时间！

我有 Oracle 背景，我像在 Oracle 中一样编写查询。

 select  thecnt
      ,  count(*) 
   from  (select  count(*)
               ,  case when count(*) >= 10 then 'tenormore' else cast(count(*) as char) end thecnt
            from  sales
           where  created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
        group by  account_id) sub
group by  thecnt
order by  thecnt;

mysql在处理子查询时有什么陷阱吗？

解释计划

+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
| id | select_type | table             | type  | possible_keys | key     | key_len | ref  |     rows    | filtered | Extra                                                     |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2>        | ALL   | NULL          | NULL    | NULL    | NULL | 2143248 |   100.00 | Using temporary; Using filesort                           |
|  2 | DERIVED     | sales             | range | created       | created | 8       | NULL | 2012492 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
2 rows in set, 1 warning (1 hour 4 min 6.14 sec)


mysql> describe sales;
+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| account_id      | char(36)            | NO   | PRI | NULL    |       |
| created         | datetime            | NO   | MUL | NULL    |       |
| histogram_value | bigint(20) unsigned | NO   | PRI | NULL    |       |
+-----------------+---------------------+------+-----+---------+-------+

原文

I'm trying to 'bin' all of my customers based on the number of items purchased, and display the counts of each bin. I'm trying to see how many people(account_id) purchased one item, how many purchased two items, all the way through nine items, and then ten or more.

Here's the query I'm using - for what its worth, I'd expect the query to do a full-table-scan on sales in order to generate the results, but the whole process takes forever!

I'm coming from an Oracle background and I wrote the query as I would in Oracle.

 select  thecnt
      ,  count(*) 
   from  (select  count(*)
               ,  case when count(*) >= 10 then 'tenormore' else cast(count(*) as char) end thecnt
            from  sales
           where  created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
        group by  account_id) sub
group by  thecnt
order by  thecnt;

are there any gotchas in mysql when dealing with subqueries?

explain plan

+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
| id | select_type | table             | type  | possible_keys | key     | key_len | ref  |     rows    | filtered | Extra                                                     |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2>        | ALL   | NULL          | NULL    | NULL    | NULL | 2143248 |   100.00 | Using temporary; Using filesort                           |
|  2 | DERIVED     | sales             | range | created       | created | 8       | NULL | 2012492 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
2 rows in set, 1 warning (1 hour 4 min 6.14 sec)


mysql> describe sales;
+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| account_id      | char(36)            | NO   | PRI | NULL    |       |
| created         | datetime            | NO   | MUL | NULL    |       |
| histogram_value | bigint(20) unsigned | NO   | PRI | NULL    |       |
+-----------------+---------------------+------+-----+---------+-------+

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

青朷 2024-09-23 04:47:03

我没有发现你的查询有什么特别错误的地方。查询之所以慢是因为需要使用临时表和文件排序。真正加快此查询速度的唯一方法是修改 MySQL 设置以分配更多内存，从而避免在这些进程中使用磁盘。这里有一篇文章介绍了相关设置。

编辑：一旦你这样做，您还可以通过指定要计数的精确列而不是 COUNT(*) 来节省内存，以及其他一些细微的调整，正如其他一些人提到的那样。您希望获得尽可能小的数据集，以充分利用您的内存。但我认为除非您分配更多内存，否则总体问题不会消失。

回复收藏 0 原文

拥有 2024-09-23 04:47:03

您可能缺少正确的索引。

编辑：

您的查询速度很慢，因为子查询结果不适合内存并且正在使用磁盘上的临时表。

因此，您将受益于 (account_id, created) 上的索引，该索引可以防止它使用磁盘上的 tmp 表进行子查询（如果使用）

ALTER TABLE sales ADD INDEX ix_acc_cre (account_id, created)

You probably are missing proper indexes.

EDIT:

Your query is slow because the subquerys resut dont fit into memory and temporary table on disk is being used.

So you would benefit from index on (account_id, created) which prevents it using tmp table on disk for subquery, if used

ALTER TABLE sales ADD INDEX ix_acc_cre (account_id, created)

回复收藏 0 原文

意犹 2024-09-23 04:47:03

如果 MySQL 可以从索引中提取数据而不是查看实际行，那么索引在全表扫描中会很有用。您不应该在这里需要子查询：

SELECT COUNT(account_id) AS thecnt, 
     IF(COUNT(account_id) < 10, COUNT(account_id), 'tenormore')
   FROM sales
     WHERE created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
   GROUP BY account_id 
   ORDER BY thecnt DESC

希望这有帮助。

An index can be useful in a full table scan if MySQL can extract the data out of the index instead of looking at the actual rows. You shouldn't need the subquery here:

SELECT COUNT(account_id) AS thecnt, 
     IF(COUNT(account_id) < 10, COUNT(account_id), 'tenormore')
   FROM sales
     WHERE created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
   GROUP BY account_id 
   ORDER BY thecnt DESC

Hope this helps.

回复收藏 0 原文

~没有更多了~