SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc)

Posted on 2024-09-06 03:31:40

I have a table (MySQL) that captures samples every n seconds. The table has many columns, but only two matter here: a time stamp (of type TIMESTAMP) and a count (of type INT).

What I would like to do is get sums and averages of the count column over a range of times. For instance, I have samples recorded every 2 seconds, but I would like the sum of the count column over each 10-second or 30-second window, across all samples.

Here's an example of the data:

+---------------------+-----------------+
| time_stamp          | count           |
+---------------------+-----------------+
| 2010-06-15 23:35:28 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |             942 |
| 2010-06-15 23:35:30 |             180 |
| 2010-06-15 23:35:30 |               4 |
| 2010-06-15 23:35:30 |              52 |
| 2010-06-15 23:35:30 |              12 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:33 |            1468 |
| 2010-06-15 23:35:33 |             247 |
| 2010-06-15 23:35:33 |               1 |
| 2010-06-15 23:35:33 |              81 |
| 2010-06-15 23:35:33 |              16 |
| 2010-06-15 23:35:35 |            1828 |
| 2010-06-15 23:35:35 |             214 |
| 2010-06-15 23:35:35 |              75 |
| 2010-06-15 23:35:35 |               8 |
| 2010-06-15 23:35:37 |            1799 |
| 2010-06-15 23:35:37 |              24 |
| 2010-06-15 23:35:37 |              11 |
| 2010-06-15 23:35:37 |               2 |
| 2010-06-15 23:35:40 |             575 |
| 2010-06-15 23:35:40 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:40 |              35 |
| 2010-06-17 10:39:40 |              19 |
| 2010-06-17 10:39:40 |              37 |
| 2010-06-17 10:39:42 |              64 |
| 2010-06-17 10:39:42 |               3 |
| 2010-06-17 10:39:42 |              31 |
| 2010-06-17 10:39:42 |               7 |
| 2010-06-17 10:39:42 |             246 |
+---------------------+-----------------+

The output I would like (based on the data above) should look like this:

+---------------------+-----------------+
| 2010-06-15 23:35:00 |               1 |  # This is the sum for the 00 - 30 seconds range
| 2010-06-15 23:35:30 |            7544 |  # This is the sum for the 30 - 60 seconds range
| 2010-06-17 10:39:30 |             450 |  # This is the sum for the 30 - 60 seconds range
+---------------------+-----------------+

I have used GROUP BY to gather these numbers by the second or by the minute, but I can't seem to figure out the syntax to make GROUP BY work correctly for sub-minute intervals or ranges of seconds.

I am mostly going to be using this query to siphon data from this table into another table.

Thanks!

Comments (4)

青巷忧颜 2024-09-13 03:31:40

GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 30

or, if for some reason you wanted to group them in 20-second intervals, it would be DIV 20, and so on. To change the boundaries between GROUP BY values you could use

GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30

where r is a literal nonnegative integer less than 30. Adding r moves every boundary r seconds earlier, so

GROUP BY (UNIX_TIMESTAMP(time_stamp) + 5) DIV 30

should give you sums between hh:mm:25 and hh:mm:55 and between hh:mm:55 and hh:mm+1:25. For boundaries at hh:mm:05 and hh:mm:35, use r = 25.
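As a rough check of this arithmetic, the sketch below replays the grouping in Python rather than MySQL, using the sample rows from the question. The names `to_epoch`, `bucket_sums` and the `offset` parameter (the r above) are made up for this illustration, and the timestamps are treated as UTC, which is an assumption; any session time zone whose offset is a multiple of 30 seconds should produce the same buckets.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Sample rows from the question, written as (timestamp, counts at that timestamp).
SAMPLES = [
    ("2010-06-15 23:35:28", [1]),
    ("2010-06-15 23:35:30", [1, 1, 942, 180, 4, 52, 12, 1, 1]),
    ("2010-06-15 23:35:33", [1468, 247, 1, 81, 16]),
    ("2010-06-15 23:35:35", [1828, 214, 75, 8]),
    ("2010-06-15 23:35:37", [1799, 24, 11, 2]),
    ("2010-06-15 23:35:40", [575, 1]),
    ("2010-06-17 10:39:35", [2, 2, 1, 2, 1]),
    ("2010-06-17 10:39:40", [35, 19, 37]),
    ("2010-06-17 10:39:42", [64, 3, 31, 7, 246]),
]

def to_epoch(ts):
    """Rough stand-in for UNIX_TIMESTAMP(), assuming the value is in UTC."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def bucket_sums(samples, width=30, offset=0):
    """Group by (epoch + offset) // width and return {bucket start: sum}."""
    sums = defaultdict(int)
    for ts, counts in samples:
        bucket = (to_epoch(ts) + offset) // width
        start = bucket * width - offset  # first second that falls in this bucket
        label = datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        sums[label] += sum(counts)
    return dict(sums)

sums_plain = bucket_sums(SAMPLES)              # boundaries at :00 and :30
sums_shifted = bucket_sums(SAMPLES, offset=5)  # boundaries at :25 and :55
for label, total in sorted(sums_plain.items()):
    print(label, total)
```

With no offset, the three totals are 1, 7544 and 450, which are the sums the question asks for. With an offset of 5, the 23:35:28 sample moves into the same group as the 23:35:30 to 23:35:40 samples, because the boundary sits at 23:35:25.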

ぺ禁宫浮华殁 2024-09-13 03:31:40

I tried Hammerite's solution in my project, but it didn't work well where there were missing samples from the series. Here's an example of the query that is supposed to select timestamp (ts), user name and average measure from metric_table and group the results by 27-minute time intervals:

select 
    min(ts), 
    user_name, 
    sum(measure) / 27
from metric_table 
where 
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' 

group by unix_timestamp(ts) div 1620, user_name 
order by min(ts), user_name
;

Note: 27 minutes (in select) = 1620 seconds (in group by), and 2160 minutes = 36 hours, i.e. 1.5 days (that's the time range).

When I ran this query against a time series where samples were recorded irregularly (in other words, for any given time stamp there was no guarantee of finding measure values for all user names), the results were not stamped according to the interval: the rows were not spaced every 27 minutes. I suspect that was because min(ts) returned, in some groups, a time stamp later than the expected start of the interval, floor(ts0 + i*interval). I modified the former query to this one:

select 
    from_unixtime(unix_timestamp(ts) - unix_timestamp(ts) mod 1620) as ts1, 
    user_name, 
    sum(measure) / 27
from metric_table
where 
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' 

group by ts1, user_name 
order by ts1, user_name
;

and it works fine even when samples are missing. I think that is because, once the time math is moved into the select list, every row in a group gets the same ts1, so ts1 is guaranteed to align with the time steps.
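The difference between the two labels can be replayed outside the database. The Python sketch below uses three hypothetical rows (the user names, measures and timestamps are invented for illustration, and timestamps are treated as UTC, which is an assumption) that all fall into the same 1620-second group, with one user missing the earliest sample.

```python
from datetime import datetime, timezone

STEP = 1620  # 27 minutes in seconds, as in the queries above

def to_epoch(ts):
    """Rough stand-in for unix_timestamp(), assuming the value is in UTC."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def fmt(epoch):
    """Rough stand-in for from_unixtime(), again in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Hypothetical rows: (ts, user_name, measure). "bob" has no sample at 00:00:00.
rows = [
    ("2015-03-16 00:00:00", "alice", 10),
    ("2015-03-16 00:10:00", "alice", 20),
    ("2015-03-16 00:10:00", "bob", 5),
]

min_ts = {}  # (group number, user) -> earliest epoch seen, like min(ts)
for ts, user, _measure in rows:
    epoch = to_epoch(ts)
    key = (epoch // STEP, user)
    min_ts[key] = min(epoch, min_ts.get(key, epoch))

# For each user: (label from min(ts), label from ts - ts mod STEP)
labels = {
    user: (fmt(earliest), fmt(group * STEP))
    for (group, user), earliest in min_ts.items()
}
for user, (by_min, by_floor) in sorted(labels.items()):
    print(user, "min(ts):", by_min, "floor:", by_floor)
```

Both users belong to the same group, yet the min(ts) labels differ (00:00:00 versus 00:10:00), while the floored label is identical for both. The floored label is 23:51:00 rather than midnight because 1620 seconds does not divide a day evenly.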

无声情话 2024-09-13 03:31:40

Another solution.

To average over any interval you like, you can convert your dt to a Unix timestamp, subtract its remainder modulo the interval (7 seconds in the example), and group by the result.

select FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7
) as dt, avg(1das4hrz) from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00'
group by FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7);

To show how it works, here is a query that displays the intermediate calculations.

select dt_record, minute(dt_record) as mm, SECOND(dt_record) as ss,
UNIX_TIMESTAMP(dt_record) as uxt, UNIX_TIMESTAMP(dt_record) mod 7 as ux7,
FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7) as dtsub,
`column` from `yourtable` where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00';

+---------------------+--------------------+
| dt                  | avg(column)        |
+---------------------+--------------------+
| 2016-11-13 04:59:43 |  25434.85714285714 |
| 2016-11-13 05:00:42 |  5700.728813559322 |
| 2016-11-13 05:01:41 |  950.1016949152543 |
| 2016-11-13 05:02:40 |  4671.220338983051 |
| 2016-11-13 05:03:39 | 25468.728813559323 |
| 2016-11-13 05:04:38 |  43883.52542372881 |
| 2016-11-13 05:05:37 | 24589.338983050846 |
+---------------------+--------------------+


+---------------------+-----+-----+------------+------+---------------------+----------+
| dt_record           | mm  | ss  | uxt        | ux7  | dtsub               | column   |
+---------------------+-----+-----+------------+------+---------------------+----------+
| 2016-11-13 05:00:00 |   0 |   0 | 1479002400 |    1 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:01 |   0 |   1 | 1479002401 |    2 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:02 |   0 |   2 | 1479002402 |    3 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:03 |   0 |   3 | 1479002403 |    4 | 2016-11-13 04:59:59 |    34911 |     
| 2016-11-13 05:00:04 |   0 |   4 | 1479002404 |    5 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:05 |   0 |   5 | 1479002405 |    6 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:06 |   0 |   6 | 1479002406 |    0 | 2016-11-13 05:00:06 |    33726 |
| 2016-11-13 05:00:07 |   0 |   7 | 1479002407 |    1 | 2016-11-13 05:00:06 |    32581 |
| 2016-11-13 05:00:08 |   0 |   8 | 1479002408 |    2 | 2016-11-13 05:00:06 |    32581 |
| 2016-11-13 05:00:09 |   0 |   9 | 1479002409 |    3 | 2016-11-13 05:00:06 |    31475 |
+---------------------+-----+-----+------------+------+---------------------+----------+
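The uxt, ux7 and dtsub columns of the table above can be reproduced with a few lines of Python. This is only a sketch of the same arithmetic, not the MySQL functions themselves; `floor_to_width` is a name invented here, and the UTC+3 offset is an assumption inferred from the table, since that is the offset at which 2016-11-13 05:00:00 corresponds to 1479002400.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server in the example runs at UTC+3, which is what makes
# 2016-11-13 05:00:00 correspond to the epoch value 1479002400 in the table.
LOCAL = timezone(timedelta(hours=3))
WIDTH = 7  # interval in seconds

def floor_to_width(epoch, width=WIDTH):
    """Same arithmetic as UNIX_TIMESTAMP(x) - UNIX_TIMESTAMP(x) mod width."""
    return epoch - epoch % width

def fmt(epoch):
    """Rough stand-in for FROM_UNIXTIME() in the assumed time zone."""
    return datetime.fromtimestamp(epoch, tz=LOCAL).strftime("%Y-%m-%d %H:%M:%S")

# (uxt, ux7, dtsub) for the ten seconds shown in the table
rows = [
    (uxt, uxt % WIDTH, fmt(floor_to_width(uxt)))
    for uxt in range(1479002400, 1479002410)
]
for uxt, ux7, dtsub in rows:
    print(fmt(uxt), uxt, ux7, dtsub)
```

The group boundaries land wherever the Unix timestamp is a multiple of 7, which is why the first group starts at 04:59:59 rather than at 05:00:00.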

Can anyone suggest something faster?

時窥 2024-09-13 03:31:40

Very strange but using the solution here:

Average of data for every 5 minutes in the given times

We can suggest something like:

select convert(
              (min(dt_record) div 50)*50 - 20*((convert(min(dt_record), 
               datetime) div 50) mod 2), datetime)  as dt, 
       avg(1das4hrz) 
from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
       and dt_record < '2016-11-14 00:00:00' 
group by convert(dt_record, datetime) div 50;


select (
convert(
min(dt_record), datetime) div 50)*50 - 20*(
(convert(min(dt_record), datetime) div 50) mod 2
) as dt,
avg(column) from `your_table`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-14 00:00:00'
group by convert(dt_record, datetime) div 50;

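50 is used because, in a numeric context, MySQL treats a DATETIME as an integer of the form YYYYMMDDHHMMSS, so half a minute is not 30 on that scale and the divisor is chosen on the decimal scale instead. Note that seconds only run from 00 to 59, so DIV 50 puts seconds 00-49 in one group and seconds 50-59 in the next; the expression in the select list then labels those groups hh:mm:00 and hh:mm:30.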
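One way to see what DIV 50 does to the integer form of a DATETIME is to replay it outside the database. The Python sketch below builds the YYYYMMDDHHMMSS integer for every second of one minute and groups by integer division by 50; `int_datetime` is a helper invented for this illustration, and Python's `//` is assumed to match MySQL's DIV for these positive values.

```python
def int_datetime(hh, mm, ss, ymd=20161113):
    """Integer form YYYYMMDDHHMMSS, as MySQL uses for a DATETIME in numeric context."""
    return ((ymd * 100 + hh) * 100 + mm) * 100 + ss

groups = {}  # group key (value DIV 50) -> seconds of the minute in that group
for ss in range(60):  # every second of the minute 05:00
    key = int_datetime(5, 0, ss) // 50
    groups.setdefault(key, []).append(ss)

# Number of seconds in each group, in time order
sizes = [len(seconds) for _key, seconds in sorted(groups.items())]

# Label computed the same way as in the select list: key*50 - 20*(key mod 2)
labels = [key * 50 - 20 * (key % 2) for key in sorted(groups)]

print(sizes)
print(labels)
```

Under these assumptions the minute splits into a 50-second group and a 10-second group, even though the labels read 05:00:00 and 05:00:30, so the averages are not taken over even 30-second halves.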
