Best way to fill gaps in a time series in a MySQL query
I need to fill in the gaps in the time series of a MySQL query result set. I'm testing the option of doing an outer join with a helper table that contains all of the data points of the time series (as suggested in this thread: How to fill date gaps in MySQL?).
The issue I'm running into is that adding this join significantly increases the query response time (it goes from sub 1 sec to 90 seconds).
Here's the original query:
select date_format(fact_data7.date_collected,'%Y-%m') as date_col
, date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
, fact_data7.batch_id,fact_data7.value as fdvalue,entities.ticker as ticker
, date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
, date_format(fact_data7.date_collected,'%Y') as year
from fact_data7
JOIN entities on fact_data7.entity_id=entities.id
where (1=1)
AND ((entities.id= 963
AND fact_data7.metric_id=1
))
AND date_format(fact_data7.date_collected,'%Y-%m') > '2008-01-01'
order by date_col asc
And here is the query with the outer join to the helper table (month_fill) added:
select date_format(month_fill.date,'%Y-%m') as date_col
, date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
, fact_data7.batch_id,fact_data7.value as fdvalue
, entities.ticker as ticker
, date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
, date_format(fact_data7.date_collected,'%Y') as year
from fact_data7
JOIN entities
on fact_data7.entity_id=entities.id
RIGHT OUTER JOIN month_fill
on date_format(fact_data7.date_collected,'%Y-%m') = date_format(month_fill.date,'%Y-%m')
where (1=1)
AND (
(entities.id= 963 AND fact_data7.metric_id=1)
OR (entities.id is null and fact_data7.metric_id is null)
)
AND date_format(month_fill.date,'%Y-%m') > '2008-01-01'
order by date_col asc
Can I restructure the query to improve performance, or is there an alternative solution that achieves what I'm looking for?
Update 11/15:
Here's the EXPLAIN output for the 1st query:
id  select_type  table       type   possible_keys  key      key_len  ref    rows    Extra
1   SIMPLE       entities    const  PRIMARY        PRIMARY  4        const  1       Using filesort
1   SIMPLE       fact_data7  ALL    NULL           NULL     NULL     NULL   230636  Using where
Here's the EXPLAIN output for the 2nd query:
id  select_type  table       type    possible_keys  key      key_len  ref                           rows    Extra
1   SIMPLE       month_fill  index   NULL           date     8        NULL                          204     Using where; Using index; Using temporary; Using filesort
1   SIMPLE       fact_data7  ALL     NULL           NULL     NULL     NULL                          230636  Using where
1   SIMPLE       entities    eq_ref  PRIMARY        PRIMARY  4        findata.fact_data7.entity_id  1       Using where
Comments (2)
I think it's worth trying to rewrite the where clause so that it does not use date_format(date_collected). You say you have an index on this field, but it is never used: the field is an argument of a function, and MySQL (at least before 8.0.13) doesn't support function-based indexes.
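As a rough sketch of that rewrite, the original query can compare the bare date_collected column against a date literal instead of wrapping it in date_format. As far as I can tell, the string '2008-01' sorts before '2008-01-01', so the original filter only lets through rows from February 2008 onward; the literal below assumes that reading. If January 2008 should be included, use '2008-01-01' instead. The table aliases f and e are introduced here for brevity.

```sql
-- Sketch: same columns as the original query, but the date filter is
-- written on the bare column so an index on date_collected can be considered.
SELECT DATE_FORMAT(f.date_collected, '%Y-%m')       AS date_col,
       DATE_FORMAT(f.date_collected, '%d-%H:%i:%s') AS time_col,
       f.batch_id,
       f.value                                      AS fdvalue,
       e.ticker                                     AS ticker,
       DATE_FORMAT(f.date_collected, '%Y-%m-%d')    AS date_col2,
       DATE_FORMAT(f.date_collected, '%Y')          AS year
FROM fact_data7 AS f
JOIN entities   AS e ON f.entity_id = e.id
WHERE e.id = 963
  AND f.metric_id = 1
  -- Assumption: equivalent to date_format(..., '%Y-%m') > '2008-01-01'
  AND f.date_collected >= '2008-02-01'
-- Sorting by the raw column keeps the month order of date_col
-- and additionally orders rows within each month.
ORDER BY f.date_collected ASC;
```

The date_format calls in the select list are harmless for index use, since they only format rows that have already been selected.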
Before even looking at refactoring the query, I would start by adding indexes on the date columns fact_data7.date_collected and month_fill.date. The range comparison (">") you are doing slows the query down, and adding an index should, in theory, improve performance. You do need enough records for it to pay off, though; otherwise the overhead of maintaining the index will only slow things down.
See the MySQL documentation: http://dev.mysql.com/doc/refman/5.0/en/optimization-indexes.html
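A possible shape for such an index is sketched below. The index name is hypothetical, and leading with entity_id and metric_id is my assumption, based on the equality filters in the question's queries. The EXPLAIN output for the second query already lists a key named date on month_fill, so that table may not need a new index. An index on date_collected can only be used when the column is compared directly, not when it is wrapped in date_format.

```sql
-- Hypothetical index name; column names are taken from the queries in the question.
-- Equality columns first, then the range column.
CREATE INDEX idx_fact_entity_metric_date
    ON fact_data7 (entity_id, metric_id, date_collected);

-- Check whether the optimizer picks the index up for a filter
-- written on the bare column.
EXPLAIN
SELECT f.date_collected, f.value
FROM fact_data7 AS f
WHERE f.entity_id = 963
  AND f.metric_id = 1
  AND f.date_collected >= '2008-02-01';
```

If the index is usable, the fact_data7 row of the EXPLAIN output should no longer show type ALL with 230636 rows.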
I am not sure exactly what you are trying to achieve, but you could try using MySQL's ifnull(value1,value2) function. Your query could look something like the following:
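A minimal sketch of how such a query might look (an illustration, not the answerer's original query). It assumes that month_fill holds one row per month with date set to the first day of that month, and that showing gaps as 0 is acceptable. The query starts from month_fill so that every month appears, and the entity and metric filters sit in the join conditions rather than the where clause so that months without data are kept.

```sql
SELECT DATE_FORMAT(m.date, '%Y-%m')              AS date_col,
       DATE_FORMAT(f.date_collected, '%Y-%m-%d') AS date_col2,
       f.batch_id,
       IFNULL(f.value, 0)                        AS fdvalue,  -- assumption: gaps shown as 0
       e.ticker                                  AS ticker
FROM month_fill AS m
JOIN entities   AS e
  ON e.id = 963                                   -- one entity row per month, so ticker is never NULL
LEFT JOIN fact_data7 AS f
  ON f.entity_id = e.id
 AND f.metric_id = 1
 -- Range join on the bare column instead of comparing date_format() strings.
 -- Assumption: m.date is the first day of the month.
 AND f.date_collected >= m.date
 AND f.date_collected <  m.date + INTERVAL 1 MONTH
WHERE m.date >= '2008-02-01'
ORDER BY m.date ASC, f.date_collected ASC;
```

For months with no matching fact rows, date_col2 and batch_id come back as NULL while fdvalue is 0. Replace the 0 with NULL handling of your choice if zero is a meaningful value in your data.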