如何在可接受的时间窗口delta增量上构造一个窗口的聚合查询

发布于 2025-02-08 12:06:26 字数 2762 浏览 2 评论 0 原文

我将用户事件记录到一些 sqlite 表中。

目前,这些表正在计数1分钟的桶上的交互作用,并在插入时进行了更新。

表的架构就是这样:像

CREATE TABLE table1(
                window DATETIME,
                counter INTEGER NOT NULL DEFAULT 1,
                value INTEGER NOT NULL DEFAULT 0,
                file_id INTEGER NOT NULL,
                PRIMARY KEY(window,file_id)
            );

CREATE TABLE table2(
                window DATETIME,
                counter INTEGER NOT NULL DEFAULT 1,
                value INTEGER NOT NULL DEFAULT 0,
                file_id INTEGER NOT NULL,
                PRIMARY KEY(window,file_id)
            );

CREATE TABLE files(name TEXT PRIMARY KEY NOT NULL ON CONFLICT REPLACE DEFAULT 'NA', touched DATE DEFAULT (date('now')));

这样执行了表格上表的插入:

INSERT OR REPLACE INTO files(ROWID, name, touched)
    VALUES(
        (select ROWID FROM files WHERE name=IFNULL(?1,'NA') UNION SELECT max(ROWID) + 1 FROM files limit 1),
        IFNULL(?1,'NA'), 
        date()
    )

INSERT INTO table1(window,file_id,value)
        VALUES(
            datetime(strftime('%s', 'now') - (strftime('%s', 'now') % 60), 'unixepoch', 'localtime'),
            (SELECT rowid FROM files WHERE name=IFNULL(:fname,'NA')),
            :delta
        ) ON CONFLICT(window,file_id) DO
            UPDATE SET count = count + 1, value = value + :delta

一些示例数据看起来像这样:

select * from table1

window               file_id                 counter      value
-------------------  -------------------  -----------  ----------
2022-06-13 10:26:00  1                        29           3         
2022-06-13 10:27:00  2                        99           7
2022-06-13 10:30:00  3                        1            22
2022-06-13 10:31:00  4                        1            22
2022-06-13 10:34:00  1                        1            22
2022-06-13 10:39:00  1                        1            22
2022-06-13 10:40:00  1                        1            22
2022-06-13 10:53:00  1                        1            22
              

我想做的是基于活动会话汇总“计数器”和“值”列。只要先前的活动(行窗口)在当前活动的10分钟内(行窗口分隔少于10分钟),就可以将会话视为活动。

在此示例中,第1-7行属于同一会话,因为每个活动(行窗口)距离先前的活动不到10分钟。最后一行定义了一个新会话,因为是在上一项活动之后的13分钟(13分钟=(10:53-10:40)大于10分钟)。

此查询应该同时从表2汇总(它们是同时记录的,但反映了不同的数据),

我所需的输出应包括这样的数据(按照第1-7行和第8行),

session 1:
started at: 2022-06-13 10:26:00
ended at:  2022-06-13 10:40:00
counter_sum: 22 + 99 + 1 + 1 + 1 + ... +1 (the sum of count column)
value_sum: 3 + 7 + 22 + 22 .... + 22 (the sum of value column)

session 2:
start at: 2022-06-13 10:53:00
finished at: 2022-06-13 10:53:00 (since no data after this assume same time)
count_sum: 1
value_sum: 22

您将如何构造此类型的窗口查询?

I am recording user events into some SQLite tables.

Currently the tables are counting interactions over buckets of 1 minute which is updated on insertion.

The schema of the tables is like this:

CREATE TABLE table1(
                window DATETIME,
                counter INTEGER NOT NULL DEFAULT 1,
                value INTEGER NOT NULL DEFAULT 0,
                file_id INTEGER NOT NULL,
                PRIMARY KEY(window,file_id)
            );

CREATE TABLE table2(
                window DATETIME,
                counter INTEGER NOT NULL DEFAULT 1,
                value INTEGER NOT NULL DEFAULT 0,
                file_id INTEGER NOT NULL,
                PRIMARY KEY(window,file_id)
            );

CREATE TABLE files(name TEXT PRIMARY KEY NOT NULL ON CONFLICT REPLACE DEFAULT 'NA', touched DATE DEFAULT (date('now')));

Insertion to the tables is performed like this:

INSERT OR REPLACE INTO files(ROWID, name, touched)
    VALUES(
        (select ROWID FROM files WHERE name=IFNULL(?1,'NA') UNION SELECT max(ROWID) + 1 FROM files limit 1),
        IFNULL(?1,'NA'), 
        date()
    )

INSERT INTO table1(window,file_id,value)
        VALUES(
            datetime(strftime('%s', 'now') - (strftime('%s', 'now') % 60), 'unixepoch', 'localtime'),
            (SELECT rowid FROM files WHERE name=IFNULL(:fname,'NA')),
            :delta
        ) ON CONFLICT(window,file_id) DO
            UPDATE SET count = count + 1, value = value + :delta

Some example data would look like this:

select * from table1

window               file_id                 counter      value
-------------------  -------------------  -----------  ----------
2022-06-13 10:26:00  1                        29           3         
2022-06-13 10:27:00  2                        99           7
2022-06-13 10:30:00  3                        1            22
2022-06-13 10:31:00  4                        1            22
2022-06-13 10:34:00  1                        1            22
2022-06-13 10:39:00  1                        1            22
2022-06-13 10:40:00  1                        1            22
2022-06-13 10:53:00  1                        1            22
              

What I want to do is aggregate the "counter" and "value" columns based on active sessions. The session is considered active so long as the previous activity (row window) was within 10 minutes of the current activity (row windows separated by less than 10 minutes).

In this example, rows 1-7 belong to the same session since each activity (row window) is less than 10 minutes away from the previous activity. The last row defines a new session as it is 13 minutes after the previous activity ( 13 minutes = (10:53 - 10:40) is greater than 10 minutes).

This query should simultaneously aggregate from table2 (they are recorded at the same time but reflect different data)

My desired output should include data like this (per the rows 1-7 and row 8)

session 1:
started at: 2022-06-13 10:26:00
ended at:  2022-06-13 10:40:00
counter_sum: 22 + 99 + 1 + 1 + 1 + ... +1 (the sum of count column)
value_sum: 3 + 7 + 22 + 22 .... + 22 (the sum of value column)

session 2:
start at: 2022-06-13 10:53:00
finished at: 2022-06-13 10:53:00 (since no data after this assume same time)
count_sum: 1
value_sum: 22

How would you construct this type of windowing query?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

无言温柔 2025-02-15 12:06:26

您需要一列标记有效组的列,

SELECT *, SUM(flag) OVER (ORDER BY window) grp
FROM (
  SELECT *, strftime('%s', window) - COALESCE(strftime('%s', LAG(window) OVER (ORDER BY window)), 0) > 600 flag
  FROM table1
);

然后可以基于该列进行汇总,例如:

WITH cte AS (
  SELECT *, SUM(flag) OVER (ORDER BY window) grp
  FROM (
    SELECT *, strftime('%s', window) - COALESCE(strftime('%s', LAG(window) OVER (ORDER BY window)), 0) > 600 flag
    FROM table1
  )
)
SELECT grp,
       MIN(window) min_window,
       MAX(window) max_window,
       SUM(counter) sum_counter,
       SUM(value) sum_value
FROM cte
GROUP BY grp; 

   

请参阅 demo

You need a column that marks the active groups, like:

SELECT *, SUM(flag) OVER (ORDER BY window) grp
FROM (
  SELECT *, strftime('%s', window) - COALESCE(strftime('%s', LAG(window) OVER (ORDER BY window)), 0) > 600 flag
  FROM table1
);

and then you can aggregate based on that column like:

WITH cte AS (
  SELECT *, SUM(flag) OVER (ORDER BY window) grp
  FROM (
    SELECT *, strftime('%s', window) - COALESCE(strftime('%s', LAG(window) OVER (ORDER BY window)), 0) > 600 flag
    FROM table1
  )
)
SELECT grp,
       MIN(window) min_window,
       MAX(window) max_window,
       SUM(counter) sum_counter,
       SUM(value) sum_value
FROM cte
GROUP BY grp; 

   

See the demo.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文