在 Mysql 查询中填补时间序列空白的最佳方法

发布于 2024-12-15 05:24:32 字数 2532 浏览 1 评论 0原文

我需要填补mysql查询结果集的时间序列的空白。我正在测试使用包含时间序列的所有数据点的辅助表进行外连接的选项（如本线程所示：如何填充 MySQL 中的日期间隙？）。

我遇到的问题是，添加此联接会显着增加查询响应时间（从不到 1 秒增加到 90 秒）。

这是原始查询：

select date_format(fact_data7.date_collected,'%Y-%m') as date_col
   , date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
   , fact_data7.batch_id,fact_data7.value as fdvalue,entities.ticker as ticker
   , date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
   , date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7  
JOIN entities on fact_data7.entity_id=entities.id  
where (1=1)
  AND ((entities.id= 963
      AND fact_data7.metric_id=1
      ))
  AND date_format(fact_data7.date_collected,'%Y-%m') > '2008-01-01'
order by date_col asc

这是添加了辅助表 (month_fill) 的外连接的查询：

select date_format(month_fill.date,'%Y-%m') as date_col
    , date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
    , fact_data7.batch_id,fact_data7.value as fdvalue
    , entities.ticker as ticker
    , date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
    , date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7
JOIN entities
  on fact_data7.entity_id=entities.id  
RIGHT OUTER JOIN month_fill
   on date_format(fact_data7.date_collected,'%Y-%m') =  date_format(month_fill.date,'%Y-%m')  
where (1=1)
  AND (
      (entities.id= 963 AND fact_data7.metric_id=1)
      OR (entities.id is null and fact_data7.metric_id is null)
      )
  AND date_format(month_fill.date,'%Y-%m') > '2008-01-01'
order by date_col asc

我可以重构查询以提高性能吗？是否有替代解决方案来实现我正在寻找的功能？

11/15 更新：

这是第一个查询的 EXPLAIN 输出：

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  entities    const   PRIMARY     PRIMARY     4   const   1   Using filesort
1   SIMPLE  fact_data7  ALL     NULL    NULL    NULL    NULL    230636  Using where

这是第二个查询的 EXPLAIN 输出：

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  month_fill  index   NULL    date    8   NULL    204     Using where; Using index; Using temporary; Using filesort
1   SIMPLE  fact_data7  ALL     NULL    NULL    NULL    NULL    230636  Using where
1   SIMPLE  entities    eq_ref  PRIMARY     PRIMARY     4   findata.fact_data7.entity_id    1   Using where

原文

I need to fill in the gaps of a time series of a mysql query result set. I'm in the process of testing the option of doing an outer join with a helper table that contains all of the data points of the time series (as indicated in this thread: How to fill date gaps in MySQL?).

The issue I'm running into is that adding this join significantly increases the query response time (it goes from sub 1 sec to 90 seconds).

Here's the original query:

select date_format(fact_data7.date_collected,'%Y-%m') as date_col
   , date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
   , fact_data7.batch_id,fact_data7.value as fdvalue,entities.ticker as ticker
   , date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
   , date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7  
JOIN entities on fact_data7.entity_id=entities.id  
where (1=1)
  AND ((entities.id= 963
      AND fact_data7.metric_id=1
      ))
  AND date_format(fact_data7.date_collected,'%Y-%m') > '2008-01-01'
order by date_col asc

and here is the query with the outer join to the helper table (month_fill) added:

select date_format(month_fill.date,'%Y-%m') as date_col
    , date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col
    , fact_data7.batch_id,fact_data7.value as fdvalue
    , entities.ticker as ticker
    , date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2
    , date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7
JOIN entities
  on fact_data7.entity_id=entities.id  
RIGHT OUTER JOIN month_fill
   on date_format(fact_data7.date_collected,'%Y-%m') =  date_format(month_fill.date,'%Y-%m')  
where (1=1)
  AND (
      (entities.id= 963 AND fact_data7.metric_id=1)
      OR (entities.id is null and fact_data7.metric_id is null)
      )
  AND date_format(month_fill.date,'%Y-%m') > '2008-01-01'
order by date_col asc

Can I restructure the query to improve the performance is there an alternate solution to achieve what I'm looking for?

Update 11/15:

Here's the EXPLAIN output for the 1st query:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  entities    const   PRIMARY     PRIMARY     4   const   1   Using filesort
1   SIMPLE  fact_data7  ALL     NULL    NULL    NULL    NULL    230636  Using where

Here's the EXPLAIN output for the 2nd query:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  month_fill  index   NULL    date    8   NULL    204     Using where; Using index; Using temporary; Using filesort
1   SIMPLE  fact_data7  ALL     NULL    NULL    NULL    NULL    230636  Using where
1   SIMPLE  entities    eq_ref  PRIMARY     PRIMARY     4   findata.fact_data7.entity_id    1   Using where

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

慵挽 2024-12-22 05:24:33

我认为值得尝试重写 where 以避免使用 date_format(date_collected) 。你说你在这个字段上有一个索引，但它从未被使用过（字段是函数的参数，
MySQL不支持基于函数的索引）

回复收藏 0 原文

遮云壑 2024-12-22 05:24:32

即使不考虑重构查询，我也会首先在日期列fact_data7.data_collected 和month_fill.date 上添加索引。范围查询“>”您所做的是减慢进程速度，从理论上讲，添加索引应该会提高性能，但您需要足够的记录，否则管理索引只会减慢速度，因为管理索引涉及处理。

请参阅此 mysql 文档 http://dev.mysql.com/doc /refman/5.0/en/optimization-indexes.html

我不确定您到底想要实现什么，但您可以尝试使用 ifnull(value1,value2) 函数来实现mysql的。您的查询可能类似于以下内容：

select ifnull(date_format(fact_data7.date_collected,'%Y-%m'),date_format(month_fill.date,'%Y-%m')) as date_col, 
date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col, 
fact_data7.batch_id,
fact_data7.value as fdvalue,
entities.ticker as ticker,
date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2 ,
date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7 , month_fill
JOIN entities on fact_data7.entity_id=entities.id  
where ((entities.id= 963 AND fact_data7.metric_id=1) OR (entities.id is null and fact_data7.metric_id is null))
and date_format(fact_data7.date_collected,'%Y-%m') =  date_format(month_fill.date,'%Y-%m') --you will need a condition similar to this depends on the data
AND date_format(fact_data7.date_collected,'%Y-%m')>'2008-01-01'
order by date_col asc

Without even looking at refactoring the query I would start by adding an index on the date columns fact_data7.data_collected and month_fill.date. The range query ">" you are doing is slowing down the process and adding an index should theoretically speaking increase the performance but you need enough record otherwise managing the index will only slow down because of the processing involved in managing the index.

See this mysql documentation http://dev.mysql.com/doc/refman/5.0/en/optimization-indexes.html

I am not sure exactly what you are trying to achieve but you could try to do it using the ifnull(value1,value2) function of mysql. Your query could be somewhat like the following:

select ifnull(date_format(fact_data7.date_collected,'%Y-%m'),date_format(month_fill.date,'%Y-%m')) as date_col, 
date_format(fact_data7.date_collected,'%d-%H:%i:%s') as time_col, 
fact_data7.batch_id,
fact_data7.value as fdvalue,
entities.ticker as ticker,
date_format(fact_data7.date_collected,'%Y-%m-%d') as date_col2 ,
date_format(fact_data7.date_collected,'%Y') as year 
from fact_data7 , month_fill
JOIN entities on fact_data7.entity_id=entities.id  
where ((entities.id= 963 AND fact_data7.metric_id=1) OR (entities.id is null and fact_data7.metric_id is null))
and date_format(fact_data7.date_collected,'%Y-%m') =  date_format(month_fill.date,'%Y-%m') --you will need a condition similar to this depends on the data
AND date_format(fact_data7.date_collected,'%Y-%m')>'2008-01-01'
order by date_col asc

回复收藏 0 原文

~没有更多了~