优化计数查询还是重构设计?
我正在尝试按范围日期(从/到日期)为用户查看/点击的产品(大约 30k 个将在数据库中重复记录的产品)创建报告系统。
每次用户单击产品时,我都会在数据库中的单行中记录产品 ID 和日期。当我必须选择并显示报告时,我的问题就来了,因为表格在 2 个月内迅速增长到 400 万个,而且我必须保留长达 6 个月的记录。
我的问题是有没有更好的方法来优化查询或记录它们的方式?
数据库表
CREATE TABLE `product_view` (
`id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
`date_create` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `product_view`
ADD PRIMARY KEY (`id`),
ADD KEY `product_id` (`product_id`) USING BTREE;
我的选择查询没有范围日期需要大约 50 秒才能提取结果
SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
LEFT JOIN product p ON p.product_id = pv.product_id
LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
GROUP BY pv.product_id
ORDER BY total
DESC LIMIT 0,20
查询示例
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pv range product_id product_id 4 NULL 1647717 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pv.product_id 1
1 SIMPLE pd ref PRIMARY,product_id PRIMARY 4 test.pv.product_id 1
查询有范围日期
SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
LEFT JOIN product p ON p.product_id = pv.product_id
LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
AND DATE(pv.date_create) >= '2021-07-25'
AND DATE(pv.date_create) <= '2022-03-10'
GROUP BY pv.product_id
ORDER BY total DESC LIMIT 0,20
Im trying to do report system by range date (from/to date) for products (around 30k products that will repetitive record in DB) viewed/clicked by users.
Each time user click on product i record product_id and date on single row in Database. My problem is coming when i have to select and display report because table quickly grow up to 4 millions in 2 months and i have to keep record up to 6 months.
My question is there any better way to optimize the query or the way i record them?
DB Table
CREATE TABLE `product_view` (
`id` int(11) NOT NULL,
`product_id` int(11) NOT NULL,
`date_create` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `product_view`
ADD PRIMARY KEY (`id`),
ADD KEY `product_id` (`product_id`) USING BTREE;
My select query without range date take around 50 seconds to pull results
SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
LEFT JOIN product p ON p.product_id = pv.product_id
LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
GROUP BY pv.product_id
ORDER BY total
DESC LIMIT 0,20
Query EXAMPlE
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pv range product_id product_id 4 NULL 1647717 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pv.product_id 1
1 SIMPLE pd ref PRIMARY,product_id PRIMARY 4 test.pv.product_id 1
Query with range date
SELECT SQL_NO_CACHE pd.name, pv.product_id, p.model, COUNT(pv.id) as total
FROM product_view pv
LEFT JOIN product p ON p.product_id = pv.product_id
LEFT JOIN product_description pd ON pd.product_id = pv.product_id
WHERE pv.product_id > 0
AND DATE(pv.date_create) >= '2021-07-25'
AND DATE(pv.date_create) <= '2022-03-10'
GROUP BY pv.product_id
ORDER BY total DESC LIMIT 0,20
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
尝试将查询重写为相关查询:
这不涉及分组,因此它应该比原始查询更快。如果不需要日期过滤器,则从计数子查询中删除
where isn't
部分以及and pv.date_create ...
。其次,我在解释中没有看到任何有用的索引。您应该尝试以下索引:
Try rewriting the query as a correlated query:
This does not involve grouping so it should be faster than your original query. If date filter is not required then remove the
where exists
part andand pv.date_create ...
from the count sub-query.Secondly, I don't see any useful indexes in the explain. You should try the following indexes:
然后 PARTITION BY RANGE(TO_DAYS(date_create)) 使删除更加高效。我建议每周进行 30 次左右的分区。请参阅分区。
COUNT(pv.id)
——通常的模式就是COUNT(*)
。对于 id,它会检查该 id 是否为NOT NULL
,这是不必要的。pv.product_id> 0
-- ids <=0 有什么特别之处吗?让我们重新排列查询,以“开始”计数:
注意:
DATE()
,它不是“可控制的”并且会阻止使用任何索引。AND pv.date_create < CURDATE()
LEFT
暗示“右”行可能丢失;我怀疑情况并非如此。pd
,因为它没有被使用(如果不使用它,它的成本会很高)。SQL_NO_CACHE
将在 8.0 中消失;您现在也可以关闭查询缓存。Then
PARTITION BY RANGE(TO_DAYS(date_create))
to make the deletion much more efficient. I recommend about 30 weekly partitions. See Partition .COUNT(pv.id)
-- The usual pattern is simplyCOUNT(*)
. With an id, it checks that id for beingNOT NULL
, which is unnecessary.pv.product_id > 0
-- is there something special about ids <=0?Let's rearrange the query to "start" with the count:
Note:
DATE()
, it is not "sargable" and prevents the use of any index.AND pv.date_create < CURDATE()
LEFT
implies that the 'right' row may be missing; I suspect that is not the case.pd
since it is not used (and costs a lot to have if it won't be used).SQL_NO_CACHE
is going away in 8.0; you may as well turn off the Query cache now.