SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc.)
I have a table (MySQL) that captures samples every n seconds. The table has many columns, but all that matters for this is two: a time stamp (of type TIMESTAMP) and a count (of type INT).
What I would like to do is get sums and averages of the count column over a range of times. For instance, I have samples recorded every 2 seconds, but I would like the sum of the count column for all the samples that fall in each 10-second or 30-second window.
Here's an example of the data:
+---------------------+-----------------+
| time_stamp          | count           |
+---------------------+-----------------+
| 2010-06-15 23:35:28 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |             942 |
| 2010-06-15 23:35:30 |             180 |
| 2010-06-15 23:35:30 |               4 |
| 2010-06-15 23:35:30 |              52 |
| 2010-06-15 23:35:30 |              12 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:33 |            1468 |
| 2010-06-15 23:35:33 |             247 |
| 2010-06-15 23:35:33 |               1 |
| 2010-06-15 23:35:33 |              81 |
| 2010-06-15 23:35:33 |              16 |
| 2010-06-15 23:35:35 |            1828 |
| 2010-06-15 23:35:35 |             214 |
| 2010-06-15 23:35:35 |              75 |
| 2010-06-15 23:35:35 |               8 |
| 2010-06-15 23:35:37 |            1799 |
| 2010-06-15 23:35:37 |              24 |
| 2010-06-15 23:35:37 |              11 |
| 2010-06-15 23:35:37 |               2 |
| 2010-06-15 23:35:40 |             575 |
| 2010-06-15 23:35:40 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:40 |              35 |
| 2010-06-17 10:39:40 |              19 |
| 2010-06-17 10:39:40 |              37 |
| 2010-06-17 10:39:42 |              64 |
| 2010-06-17 10:39:42 |               3 |
| 2010-06-17 10:39:42 |              31 |
| 2010-06-17 10:39:42 |               7 |
| 2010-06-17 10:39:42 |             246 |
+---------------------+-----------------+
The output I would like (based on the data above) should look like this:
+---------------------+-----------------+
| 2010-06-15 23:35:00 |               1 |  # This is the sum for the 00 - 30 seconds range
| 2010-06-15 23:35:30 |            7544 |  # This is the sum for the 30 - 60 seconds range
| 2010-06-17 10:39:35 |             450 |  # This is the sum for the 30 - 60 seconds range
+---------------------+-----------------+
I have used GROUP BY to gather these numbers by the second, or by the minute, but I can't seem to figure out the syntax to get the sub-minute or range of seconds GROUP BY commands to work correctly.
I am mostly going to be using this query to syphon data from this table to another table.
Thanks!
Comments (4)
GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 30

or, if for some reason you wanted to group them in 20-second intervals, it would be DIV 20, etc. To change the boundaries between GROUP BY values you could use

GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30

where r is a literal nonnegative integer less than 30. Adding r moves each boundary r seconds earlier, so

GROUP BY (UNIX_TIMESTAMP(time_stamp) + 5) DIV 30

should give you sums between hh:mm:25 and hh:mm:55 and between hh:mm:55 and hh:mm+1:25. To get boundaries at hh:mm:05 and hh:mm:35 instead, use r = 25.
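To see what this grouping does to the question's sample data, here is a minimal Python sketch that mimics `GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30` outside the database. It is not MySQL code, only the same integer arithmetic; the helper names `epoch` and `bucket_sums` are made up for illustration, and the timestamps are assumed to be UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def epoch(ts: str) -> int:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as UTC and return Unix seconds."""
    return int(datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp())

def bucket_sums(rows, width=30, r=0):
    """Sum counts per bucket, like GROUP BY (UNIX_TIMESTAMP(ts) + r) DIV width."""
    sums = defaultdict(int)
    for ts, count in rows:
        sums[(epoch(ts) + r) // width] += count
    # Label each bucket with the timestamp of its lower boundary.
    labelled = {}
    for key, total in sorted(sums.items()):
        start = datetime.fromtimestamp(key * width - r, tz=timezone.utc)
        labelled[start.strftime(FMT)] = total
    return labelled

# The question's sample data, written as timestamp -> list of counts.
samples = {
    "2010-06-15 23:35:28": [1],
    "2010-06-15 23:35:30": [1, 1, 942, 180, 4, 52, 12, 1, 1],
    "2010-06-15 23:35:33": [1468, 247, 1, 81, 16],
    "2010-06-15 23:35:35": [1828, 214, 75, 8],
    "2010-06-15 23:35:37": [1799, 24, 11, 2],
    "2010-06-15 23:35:40": [575, 1],
    "2010-06-17 10:39:35": [2, 2, 1, 2, 1],
    "2010-06-17 10:39:40": [35, 19, 37],
    "2010-06-17 10:39:42": [64, 3, 31, 7, 246],
}
rows = [(ts, c) for ts, counts in samples.items() for c in counts]

result = bucket_sums(rows)          # 30-second buckets starting at :00 and :30
shifted = bucket_sums(rows, r=5)    # boundaries moved to :25 and :55
for start, total in result.items():
    print(start, total)
```

With r = 0 the sketch produces the sums 1, 7544 and 450 from the question, labelled with the bucket starts 23:35:00, 23:35:30 and 10:39:30.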
I tried Hammerite's solution in my project, but it didn't work well when samples were missing from the series. My query was supposed to select the timestamp (ts), user name and average measure from metric_table and group the results by 27-minute time intervals, using min(ts) as the timestamp of each group.
Note: 27 minutes (in select) = 1620 seconds (in group by), 2160 minutes = 1.5 days (that's the time range)
When I ran this query against a time series where samples were irregularly recorded (in other words, for any given time stamp there was no guarantee of finding measure values for all user names), the results were not stamped according to the interval (they were not placed every 27 minutes). I suspect that was because min(ts) returned, in some groups, a time stamp greater than the expected floor (ts0 + i*interval).
I modified the query so that the interval-aligned time stamp (ts1) is computed in the SELECT list, and it works fine even when samples are missing. I think that is because once the time math is moved to the select, ts1 is guaranteed to align with the time steps.
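The difference between the two labelling choices can be sketched in Python. This is only an illustration of the reasoning above, not the original queries: the sample data, the user names and the helpers `group_avg`, `min_label` and `floor_label` are hypothetical, and timestamps are plain seconds.

```python
from collections import defaultdict

INTERVAL = 1620  # 27 minutes, in seconds

def group_avg(samples, label):
    """Average the measure per (bucket, user); `label` picks the group's timestamp."""
    groups = defaultdict(list)
    for ts, user, measure in samples:
        groups[(ts // INTERVAL, user)].append((ts, measure))
    out = []
    for (_, user), items in sorted(groups.items()):
        stamps = [ts for ts, _ in items]
        avg = sum(m for _, m in items) / len(items)
        out.append((label(stamps), user, avg))
    return out

def min_label(stamps):
    """Label a group with its earliest sample, like SELECT min(ts)."""
    return min(stamps)

def floor_label(stamps):
    """Label a group with the start of its interval (time math in the SELECT)."""
    return stamps[0] - stamps[0] % INTERVAL

# Hypothetical irregular series: (seconds, user, measure).
samples = [
    (0, "alice", 10.0),
    (900, "alice", 20.0),
    (100, "bob", 5.0),      # bob has no sample at the start of interval 0
    (2000, "alice", 30.0),  # interval 1 covers 1620..3239
    (3000, "bob", 7.0),
]

by_min = group_avg(samples, min_label)
by_floor = group_avg(samples, floor_label)
print([row[0] for row in by_min])    # labels drift with the data
print([row[0] for row in by_floor])  # labels are multiples of INTERVAL
```

Both variants form the same groups and averages; only the timestamp reported for each group differs, which is why the aligned label stays on the 27-minute grid when samples are missing.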
Another solution.
To average over any interval you like, you can convert your dt to a Unix timestamp and group by that timestamp rounded down with a modulo of your interval (7 seconds in the example).
To show how it works, I prepared a query that displays the intermediate calculations.
Can anyone suggest something faster?
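A rough Python sketch of the modulo calculation, showing the intermediate values for each row. The sample values and the names `STEP`, `bucket_start` and `calc` are assumptions made for illustration; they are not taken from the original query.

```python
from collections import defaultdict

STEP = 7  # interval length in seconds, as in the example

def bucket_start(ts: int, step: int = STEP) -> int:
    """Round a Unix timestamp down to the start of its interval."""
    return ts - ts % step

# Hypothetical (unix_seconds, value) samples.
samples = [(1000, 2.0), (1003, 4.0), (1006, 6.0), (1008, 10.0), (1013, 20.0)]

# Intermediate calculations per row: timestamp, remainder, interval start.
calc = [(ts, ts % STEP, bucket_start(ts)) for ts, _ in samples]
for row in calc:
    print(row)

# Group by interval start and average the values.
groups = defaultdict(list)
for ts, value in samples:
    groups[bucket_start(ts)].append(value)
averages = {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}
print(averages)
```

Subtracting the remainder leaves a value that is a multiple of the interval, so every row in the same 7-second window maps to the same group key.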
It is very strange, but using the solution here:
Average of data for every 5 minutes in the given times
we can suggest something like:
The divisor is 50 because half of a normal minute is 30 seconds, whereas the integer date format requires us to divide by 50.