汇总时间间隔和分组到固定时间网格

发布于 2025-02-10 15:51:32 字数 1248 浏览 1 评论 0原文

我有一些数据包括轮班,记录了在轮班期间被视为断裂的时间段。

start_ts                end_ts                  shift_id
2022-01-01T08:31:37Z    2022-01-01T08:58:37Z    1
2022-01-01T08:37:37Z    2022-01-01T09:03:37Z    2
2022-01-01T08:46:37Z    2022-01-01T08:48:37Z    3

我想将此数据映射到15分钟的网格中,计算在此间隔期间花费多少秒(不是每班)。一个解决方案看起来像这样:

start_time               end_time               total_break_seconds
2022-01-01T08:30:00Z    2022-01-01T08:45:00Z      1246
2022-01-01T08:45:00Z    2022-01-01T09:00:00Z      1837
2022-01-01T09:00:00Z    2022-01-01T09:15:00Z      217

我知道这是一个差距和岛风格的问题,但是我不确定如何将其与映射结合到时间网格元素。我已经考虑使用Unix_seconds/tim-the-poch来获取15分钟的间隔,但无法做到。我将使用非常大的桌子合作,因此理想情况下,我会在将每个时间间隔扩展到15分钟网格之前做尽可能多的工作,但欢迎所有解决方案。

我正在研究BigQuery

,这是一个可复制的例子:

SELECT
    TIMESTAMP("2022-01-01 08:31:37") AS start_ts,
    TIMESTAMP("2022-01-01 08:58:37") AS end_ts,
    1 as shift_id
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:37:37") AS start_ts,
      TIMESTAMP("2022-01-01 09:03:37") AS end_ts,
      2 as shift_id
  )
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:46:37") AS start_ts,
      TIMESTAMP("2022-01-01 08:48:37") AS end_ts,
      3 as shift_id
       )

I have some data consisting of shifts, logging the time periods taken as breaks during the shift.

start_ts                end_ts                  shift_id
2022-01-01T08:31:37Z    2022-01-01T08:58:37Z    1
2022-01-01T08:37:37Z    2022-01-01T09:03:37Z    2
2022-01-01T08:46:37Z    2022-01-01T08:48:37Z    3

I want to map this data to a 15-minute grid, counting how many seconds in total (not per shift) are spent on break during that interval. A solution would look like this:

start_time               end_time               total_break_seconds
2022-01-01T08:30:00Z    2022-01-01T08:45:00Z      1246
2022-01-01T08:45:00Z    2022-01-01T09:00:00Z      1837
2022-01-01T09:00:00Z    2022-01-01T09:15:00Z      217

I know this is a gaps-and-islands style problem, but I'm not sure how to combine this with the mapping to a time grid element. I've looked at using UNIX_SECONDS/time-to-epoch to get the 15-minute intervals, but can't make it out. I'll be working with pretty large tables so ideally I would do as much work as possible before expanding each time interval to the 15-minute grid, but all solutions welcome.

I'm working on BigQuery

Here's a reproducible example to start with:

SELECT
    TIMESTAMP("2022-01-01 08:31:37") AS start_ts,
    TIMESTAMP("2022-01-01 08:58:37") AS end_ts,
    1 as shift_id
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:37:37") AS start_ts,
      TIMESTAMP("2022-01-01 09:03:37") AS end_ts,
      2 as shift_id
  )
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:46:37") AS start_ts,
      TIMESTAMP("2022-01-01 08:48:37") AS end_ts,
      3 as shift_id
       )

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

古镇旧梦 2025-02-17 15:51:32

请考虑以下考虑

with grid as (
  select start_time, timestamp_sub(timestamp_add(start_time, interval 15 minute), interval 1 second) end_time
  from (
    select max(end_ts) max_end,
      timestamp_trunc(min(start_ts), hour) min_start
    from your_table
  ), unnest(generate_timestamp_array(min_start, max_end, interval 15 minute)) start_time
), seconds as (
  select ts from your_table, 
  unnest(generate_timestamp_array(start_ts, timestamp_sub(end_ts, interval 1 second), interval 1 second)) ts # this is the line with fix
)
select start_time, end_time, count(*) total_break_seconds
from grid
join seconds
on ts between start_time and end_time
group by  start_time, end_time    

如果在您的问题中应用于示例数据,

-USTUP是 ”在此处输入图像描述”

Consider below

with grid as (
  select start_time, timestamp_sub(timestamp_add(start_time, interval 15 minute), interval 1 second) end_time
  from (
    select max(end_ts) max_end,
      timestamp_trunc(min(start_ts), hour) min_start
    from your_table
  ), unnest(generate_timestamp_array(min_start, max_end, interval 15 minute)) start_time
), seconds as (
  select ts from your_table, 
  unnest(generate_timestamp_array(start_ts, timestamp_sub(end_ts, interval 1 second), interval 1 second)) ts # this is the line with fix
)
select start_time, end_time, count(*) total_break_seconds
from grid
join seconds
on ts between start_time and end_time
group by  start_time, end_time    

if applied to sample data in your question - output is

enter image description here

秋叶绚丽 2025-02-17 15:51:32

与以下查询:

WITH breaks AS (
  SELECT *,
         CASE
           -- for staring break (considering start_ts and end_ts are in same break)
           WHEN break <= start_ts AND end_ts < break + INTERVAL 15 MINUTE THEN TIMESTAMP_DIFF(end_ts, start_ts, SECOND)
           WHEN break <= start_ts THEN 900 - TIMESTAMP_DIFF(start_ts, break, SECOND)
           -- for remaining breaks (considering full break + partial break)
           ELSE IF(DIV(diff, 900) > 0 AND break + INTERVAL 15 MINUTE < end_ts, 900, MOD(diff, 900))
         END AS elapsed
    FROM sample,
         UNNEST(GENERATE_TIMESTAMP_ARRAY(
           TIMESTAMP_TRUNC(start_ts, HOUR), TIMESTAMP_TRUNC(end_ts, HOUR) + INTERVAL 1 HOUR, INTERVAL 15 MINUTE
         )) break,
         UNNEST([TIMESTAMP_DIFF(end_ts, break, SECOND)]) diff
   WHERE break + INTERVAL 15 MINUTE >= start_ts AND break < end_ts
)
SELECT break AS start_time, break + INTERVAL 15 MINUTE AS end_time, SUM(elapsed) total_break_seconds 
  FROM breaks
 GROUP BY 1 ORDER BY 1;

输出为:

”在此处输入图像说明”

With below query:

WITH breaks AS (
  SELECT *,
         CASE
           -- for staring break (considering start_ts and end_ts are in same break)
           WHEN break <= start_ts AND end_ts < break + INTERVAL 15 MINUTE THEN TIMESTAMP_DIFF(end_ts, start_ts, SECOND)
           WHEN break <= start_ts THEN 900 - TIMESTAMP_DIFF(start_ts, break, SECOND)
           -- for remaining breaks (considering full break + partial break)
           ELSE IF(DIV(diff, 900) > 0 AND break + INTERVAL 15 MINUTE < end_ts, 900, MOD(diff, 900))
         END AS elapsed
    FROM sample,
         UNNEST(GENERATE_TIMESTAMP_ARRAY(
           TIMESTAMP_TRUNC(start_ts, HOUR), TIMESTAMP_TRUNC(end_ts, HOUR) + INTERVAL 1 HOUR, INTERVAL 15 MINUTE
         )) break,
         UNNEST([TIMESTAMP_DIFF(end_ts, break, SECOND)]) diff
   WHERE break + INTERVAL 15 MINUTE >= start_ts AND break < end_ts
)
SELECT break AS start_time, break + INTERVAL 15 MINUTE AS end_time, SUM(elapsed) total_break_seconds 
  FROM breaks
 GROUP BY 1 ORDER BY 1;

Output will be:

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文