mysql 子查询结果的简单聚合速度很慢

发布于 2024-09-16 04:47:03 字数 2271 浏览 5 评论 0原文

我正在尝试根据购买的商品数量对所有客户进行“分类”,并显示每个分类的数量。我试图查看有多少人(account_id)购买了一件商品,有多少人购买了两件商品,一直到九件商品,然后是十件或更多。

这是我正在使用的查询 - 就其价值而言,我希望查询对 sales 进行全表扫描以生成结果,但整个过程需要很长时间!

我有 Oracle 背景,我像在 Oracle 中一样编写查询。

 select  thecnt
      ,  count(*) 
   from  (select  count(*)
               ,  case when count(*) >= 10 then 'tenormore' else cast(count(*) as char) end thecnt
            from  sales
           where  created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
        group by  account_id) sub
group by  thecnt
order by  thecnt;

mysql在处理子查询时有什么陷阱吗?

解释计划

+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
| id | select_type | table             | type  | possible_keys | key     | key_len | ref  |     rows    | filtered | Extra                                                     |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2>        | ALL   | NULL          | NULL    | NULL    | NULL | 2143248 |   100.00 | Using temporary; Using filesort                           |
|  2 | DERIVED     | sales             | range | created       | created | 8       | NULL | 2012492 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
2 rows in set, 1 warning (1 hour 4 min 6.14 sec)


mysql> describe sales;
+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| account_id      | char(36)            | NO   | PRI | NULL    |       |
| created         | datetime            | NO   | MUL | NULL    |       |
| histogram_value | bigint(20) unsigned | NO   | PRI | NULL    |       |
+-----------------+---------------------+------+-----+---------+-------+

I'm trying to 'bin' all of my customers based on the number of items purchased, and display the counts of each bin. I'm trying to see how many people(account_id) purchased one item, how many purchased two items, all the way through nine items, and then ten or more.

Here's the query I'm using - for what its worth, I'd expect the query to do a full-table-scan on sales in order to generate the results, but the whole process takes forever!

I'm coming from an Oracle background and I wrote the query as I would in Oracle.

 select  thecnt
      ,  count(*) 
   from  (select  count(*)
               ,  case when count(*) >= 10 then 'tenormore' else cast(count(*) as char) end thecnt
            from  sales
           where  created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
        group by  account_id) sub
group by  thecnt
order by  thecnt;

are there any gotchas in mysql when dealing with subqueries?

explain plan

+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
| id | select_type | table             | type  | possible_keys | key     | key_len | ref  |     rows    | filtered | Extra                                                     |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2>        | ALL   | NULL          | NULL    | NULL    | NULL | 2143248 |   100.00 | Using temporary; Using filesort                           |
|  2 | DERIVED     | sales             | range | created       | created | 8       | NULL | 2012492 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+-------+---------------+---------+---------+------+---------+----------+-----------------------------------------------------------+
2 rows in set, 1 warning (1 hour 4 min 6.14 sec)


mysql> describe sales;
+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| account_id      | char(36)            | NO   | PRI | NULL    |       |
| created         | datetime            | NO   | MUL | NULL    |       |
| histogram_value | bigint(20) unsigned | NO   | PRI | NULL    |       |
+-----------------+---------------------+------+-----+---------+-------+

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

青朷 2024-09-23 04:47:03

我没有发现你的查询有什么特别错误的地方。查询之所以慢是因为需要使用临时表和文件排序。真正加快此查询速度的唯一方法是修改 MySQL 设置以分配更多内存,从而避免在这些进程中使用磁盘。 这里有一篇文章介绍了相关设置。


编辑:一旦你这样做,您还可以通过指定要计数的精确列而不是 COUNT(*) 来节省内存,以及其他一些细微的调整,正如其他一些人提到的那样。您希望获得尽可能小的数据集,以充分利用您的内存。但我认为除非您分配更多内存,否则总体问题不会消失。

I don't see anything particularly wrong with your query. The reason why the query is slow is because it needs to use temporary tables and filesort. The only way to seriously speed up this query will be modify your MySQL settings to allocate more memory, so as to avoid using the disk for these processes. Here's a spot on article covering the pertinent settings.


Edit: Once you do this, you can also save memory by specifying an exact column to count instead of COUNT(*), and a few other minor tweaks, as some of the others have mentioned. You want to get as small a data set as necessary to make the most of your memory. But I think the overall issue won't go away unless you allocate more memory.

拥有 2024-09-23 04:47:03

您可能缺少正确的索引。

编辑:

您的查询速度很慢,因为子查询结果不适合内存并且正在使用磁盘上的临时表。

因此,您将受益于 (account_id, created) 上的索引,该索引可以防止它使用磁盘上的 tmp 表进行子查询(如果使用)

ALTER TABLE sales ADD INDEX ix_acc_cre (account_id, created)

You probably are missing proper indexes.

EDIT:

Your query is slow because the subquerys resut dont fit into memory and temporary table on disk is being used.

So you would benefit from index on (account_id, created) which prevents it using tmp table on disk for subquery, if used

ALTER TABLE sales ADD INDEX ix_acc_cre (account_id, created)
意犹 2024-09-23 04:47:03

如果 MySQL 可以从索引中提取数据而不是查看实际行,那么索引在全表扫描中会很有用。您不应该在这里需要子查询:

SELECT COUNT(account_id) AS thecnt, 
     IF(COUNT(account_id) < 10, COUNT(account_id), 'tenormore')
   FROM sales
     WHERE created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
   GROUP BY account_id 
   ORDER BY thecnt DESC

希望这有帮助。

An index can be useful in a full table scan if MySQL can extract the data out of the index instead of looking at the actual rows. You shouldn't need the subquery here:

SELECT COUNT(account_id) AS thecnt, 
     IF(COUNT(account_id) < 10, COUNT(account_id), 'tenormore')
   FROM sales
     WHERE created >= SUBDATE( CURRENT_DATE(), INTERVAL 60 DAY )
   GROUP BY account_id 
   ORDER BY thecnt DESC

Hope this helps.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文