分析查询

发布于 2024-07-11 23:42:34 字数 445 浏览 6 评论 0原文

我正在寻找一个可以将表中的以下信息转换

name:time    :state
a   :10:00 AM:login
b   :10:05 AM:login
a   :10:06 AM:chatting
a   :10:08 AM:Idle
b   :10:11 AM:chatting
a   :10:10 AM:Logout
b   :10:12 AM:Logout

为类似内容的查询（假设时间范围上午 10 点到上午 10:15 作为查询时间段）

name: State    :Duration
a   : chatting :2 Minutes 
a   : Idle     :2 Minutes
b   : chatting :1 Minute

这可以仅使用 SQL 来完成吗？我使用 Informix 版本 11.5

原文

Am looking for a single query that can convert the following information in the table

name:time    :state
a   :10:00 AM:login
b   :10:05 AM:login
a   :10:06 AM:chatting
a   :10:08 AM:Idle
b   :10:11 AM:chatting
a   :10:10 AM:Logout
b   :10:12 AM:Logout

to something like this (given the time range 10 AM to 10:15 AM as the querying period)

name: State    :Duration
a   : chatting :2 Minutes 
a   : Idle     :2 Minutes
b   : chatting :1 Minute

Can this be done ONLY using SQL? Am using Informix version 11.5

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

堇色安年 2024-07-18 23:42:34

它可以在单个 SQL 语句中完成。这是证据。

设置

CREATE TEMP TABLE eventtable
(
    name CHAR(3) NOT NULL,
    time DATETIME HOUR TO MINUTE NOT NULL,
    state CHAR(8) NOT NULL
);

INSERT INTO eventtable(name, time, state) VALUES('a', '10:00', 'login');
INSERT INTO eventtable(name, time, state) VALUES('b', '10:05', 'login');
INSERT INTO eventtable(name, time, state) VALUES('a', '10:06', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('a', '10:08', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('b', '10:11', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('a', '10:10', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('b', '10:12', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:01', 'login');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:02', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:03', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:04', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:05', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:06', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:07', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:08', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:09', 'login');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:11', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:12', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:13', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:14', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:15', 'Logout');

正确查询

请注意条件。结果表必须排除“登录”和第一个其他事件之间的时间段；此外，它必须排除“注销”和下一个事件（可能是“登录”）之间的时间段。 name 列上的表与 time 列上的非对称联接（使用“<”）确保事件按时间顺序排列。 NOT EXISTS 子选择确保仅考虑相邻事件。在子查询中使用 BETWEEN AND 是一个错误，因为它包含其端点，并且将 r1.time 和 r2.time 排除在范围之外至关重要；我花了几分钟才发现这个错误（查询运行但没有返回任何行，但是为什么？）！

SELECT r1.name, r1.state, r2.TIME - r1.TIME AS duration
    FROM eventtable r1, eventtable r2
    WHERE r1.name = r2.name
      AND r1.time < r2.time
      AND r1.state != 'login'
      AND r1.state != 'Logout'
      AND r1.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE
      AND r2.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE
      AND NOT EXISTS (SELECT 1 FROM eventtable r3
                            WHERE r3.time > r1.time AND r3.time < r2.time
                      AND r3.name = r1.name
                      AND r3.name = r2.name);

这会产生答案：

name state      duration
a    chatting   0:02
a    Idle       0:02
b    chatting   0:01

c    chatting   0:01
c    Idle       0:01
c    Idle       0:01
c    Idle       0:01
c    chatting   0:01
c    Idle       0:01
c    chatting   0:01
c    Idle       0:01

“持续时间”值是一个小时到分钟的间隔；如果您想要在短短几分钟内得到一个值，则必须通过强制转换对其进行转换（使用 4 作为精度，以允许长达 1440 分钟或 1 天的间隔；对于较长的时间范围，数据是不明确的）：

(r2.time - r1.time)::INTERVAL MINUTE(4) TO MINUTE

或者：

CAST (r2.time - r1.time AS INTERVAL MINUTE(4) TO MINUTE)

IBM Informix动态服务器 (IDS) 对于时间常数有非常详细的表示法。在标准 SQL 中，您可以使用 TIME 作为类型，使用 TIME '10:00:00' 作为值，但在严格的标准 SQL 中，秒是必需的。 IDS 确实提供了人们想要的确切类型 - 例如 DATETIME HOUR TO MINUTE。您还可以用标准 SQL 编写 INTERVAL MINUTE(4)； “TO MINUTE”应该是可选的。

不正确的查询

在我对 Ray Hidayat 的回答的评论中，我指出 EXISTS 子查询对于确保所考虑的事件是连续的 - 没有中间事件是必要的。这是相同的查询，其中添加了开始时间和结束时间到输出，并且缺少 EXISTS 子句（并且“duration”重命名为“lapse”）：

SELECT r1.name, r1.state, r2.TIME - r1.TIME AS lapse,
       r1.time AS start, r2.time AS end
    FROM eventtable r1, eventtable r2
    WHERE r1.name = r2.name
      AND r1.time < r2.time
      AND r1.state != 'login'
      AND r1.state != 'Logout'
      AND r1.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE
      AND r2.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE;

这会产生答案：

name state     lapse start end
a    chatting   0:04 10:06 10:10
a    chatting   0:02 10:06 10:08
a    Idle       0:02 10:08 10:10
b    chatting   0:01 10:11 10:12
c    chatting   0:13 10:02 10:15
c    chatting   0:12 10:02 10:14
c    chatting   0:11 10:02 10:13
c    chatting   0:10 10:02 10:12
c    chatting   0:09 10:02 10:11
c    chatting   0:07 10:02 10:09
c    chatting   0:06 10:02 10:08
c    chatting   0:05 10:02 10:07
c    chatting   0:04 10:02 10:06
c    chatting   0:03 10:02 10:05
c    chatting   0:02 10:02 10:04
c    chatting   0:01 10:02 10:03
c    Idle       0:12 10:03 10:15
c    Idle       0:11 10:03 10:14
c    Idle       0:10 10:03 10:13
c    Idle       0:09 10:03 10:12
c    Idle       0:08 10:03 10:11
c    Idle       0:06 10:03 10:09
c    Idle       0:05 10:03 10:08
c    Idle       0:04 10:03 10:07
c    Idle       0:03 10:03 10:06
c    Idle       0:02 10:03 10:05
c    Idle       0:01 10:03 10:04
c    Idle       0:10 10:05 10:15
c    Idle       0:09 10:05 10:14
c    Idle       0:08 10:05 10:13
c    Idle       0:07 10:05 10:12
c    Idle       0:06 10:05 10:11
c    Idle       0:04 10:05 10:09
c    Idle       0:03 10:05 10:08
c    Idle       0:02 10:05 10:07
c    Idle       0:01 10:05 10:06
c    Idle       0:08 10:07 10:15
c    Idle       0:07 10:07 10:14
c    Idle       0:06 10:07 10:13
c    Idle       0:05 10:07 10:12
c    Idle       0:04 10:07 10:11
c    Idle       0:02 10:07 10:09
c    Idle       0:01 10:07 10:08
c    chatting   0:04 10:11 10:15
c    chatting   0:03 10:11 10:14
c    chatting   0:02 10:11 10:13
c    chatting   0:01 10:11 10:12
c    Idle       0:03 10:12 10:15
c    Idle       0:02 10:12 10:14
c    Idle       0:01 10:12 10:13
c    chatting   0:02 10:13 10:15
c    chatting   0:01 10:13 10:14
c    Idle       0:01 10:14 10:15

这显示了用户“c”的每个符合条件的开始行是如何匹配的对于每个符合条件的结束行，都会给出许多虚假的数据行。 NOT EXISTS 子查询是处理基于时间的查询时的常见主题。您可以在 Snodgrass 的“用 SQL 开发面向时间的应用程序”（可通过 URL 在线获取 PDF），以及 Date、Darwen 和 Lorentzos 中的“Temporal数据和关系模型”。

It can be done in a single SQL statement. Here's the proof.

Setup

CREATE TEMP TABLE eventtable
(
    name CHAR(3) NOT NULL,
    time DATETIME HOUR TO MINUTE NOT NULL,
    state CHAR(8) NOT NULL
);

INSERT INTO eventtable(name, time, state) VALUES('a', '10:00', 'login');
INSERT INTO eventtable(name, time, state) VALUES('b', '10:05', 'login');
INSERT INTO eventtable(name, time, state) VALUES('a', '10:06', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('a', '10:08', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('b', '10:11', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('a', '10:10', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('b', '10:12', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:01', 'login');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:02', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:03', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:04', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:05', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:06', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:07', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:08', 'Logout');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:09', 'login');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:11', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:12', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:13', 'chatting');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:14', 'Idle');
INSERT INTO eventtable(name, time, state) VALUES('c', '10:15', 'Logout');

Correct query

Note the conditions. The result table must exclude the periods between 'login' and the first other event; further, it must exclude the period between 'Logout' and the next event (presumably a 'login'). The self-join between the table on the name column and then the asymmetric join on the time column (using '<') ensures that events are in time order. The NOT EXISTS sub-select ensures that only adjacent events are considered. Using BETWEEN AND in the sub-query is a mistake because it includes its end points and it is crucial that r1.time and r2.time are excluded from the range; it took me a few minutes to spot that bug (the query ran but returned no rows, but why?)!

SELECT r1.name, r1.state, r2.TIME - r1.TIME AS duration
    FROM eventtable r1, eventtable r2
    WHERE r1.name = r2.name
      AND r1.time < r2.time
      AND r1.state != 'login'
      AND r1.state != 'Logout'
      AND r1.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE
      AND r2.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE
      AND NOT EXISTS (SELECT 1 FROM eventtable r3
                            WHERE r3.time > r1.time AND r3.time < r2.time
                      AND r3.name = r1.name
                      AND r3.name = r2.name);

This produces the answer:

name state      duration
a    chatting   0:02
a    Idle       0:02
b    chatting   0:01

c    chatting   0:01
c    Idle       0:01
c    Idle       0:01
c    Idle       0:01
c    chatting   0:01
c    Idle       0:01
c    chatting   0:01
c    Idle       0:01

The 'duration' value is an INTERVAL HOUR TO MINUTE; if you want a value in just minutes, you have to convert it with a cast (using 4 for the precision to allow for intervals up to 1440 minutes, or 1 day; the data is ambiguous for longer time frames):

(r2.time - r1.time)::INTERVAL MINUTE(4) TO MINUTE

Or:

CAST (r2.time - r1.time AS INTERVAL MINUTE(4) TO MINUTE)

IBM Informix Dynamic Server (IDS) has very verbose notations for time constants. In Standard SQL, you could use TIME as the type and TIME '10:00:00' as a value, but the seconds would be necessary in strict standard SQL. IDS does provide exact types that people want - such as DATETIME HOUR TO MINUTE. You'd also write INTERVAL MINUTE(4) in standard SQL; the 'TO MINUTE' should be optional.

Incorrect query

In my comment to Ray Hidayat's answer, I pointed out that the EXISTS sub-query is necessary to ensure that the events under consideration are contiguous - there are no intervening events. Here's the same query with start and end times added to the output, and the EXISTS clause missing (and 'duration' renamed to 'lapse'):

SELECT r1.name, r1.state, r2.TIME - r1.TIME AS lapse,
       r1.time AS start, r2.time AS end
    FROM eventtable r1, eventtable r2
    WHERE r1.name = r2.name
      AND r1.time < r2.time
      AND r1.state != 'login'
      AND r1.state != 'Logout'
      AND r1.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE
      AND r2.time BETWEEN DATETIME(10:00) HOUR TO MINUTE
                      AND DATETIME(10:15) HOUR TO MINUTE;

This produces the answer:

name state     lapse start end
a    chatting   0:04 10:06 10:10
a    chatting   0:02 10:06 10:08
a    Idle       0:02 10:08 10:10
b    chatting   0:01 10:11 10:12
c    chatting   0:13 10:02 10:15
c    chatting   0:12 10:02 10:14
c    chatting   0:11 10:02 10:13
c    chatting   0:10 10:02 10:12
c    chatting   0:09 10:02 10:11
c    chatting   0:07 10:02 10:09
c    chatting   0:06 10:02 10:08
c    chatting   0:05 10:02 10:07
c    chatting   0:04 10:02 10:06
c    chatting   0:03 10:02 10:05
c    chatting   0:02 10:02 10:04
c    chatting   0:01 10:02 10:03
c    Idle       0:12 10:03 10:15
c    Idle       0:11 10:03 10:14
c    Idle       0:10 10:03 10:13
c    Idle       0:09 10:03 10:12
c    Idle       0:08 10:03 10:11
c    Idle       0:06 10:03 10:09
c    Idle       0:05 10:03 10:08
c    Idle       0:04 10:03 10:07
c    Idle       0:03 10:03 10:06
c    Idle       0:02 10:03 10:05
c    Idle       0:01 10:03 10:04
c    Idle       0:10 10:05 10:15
c    Idle       0:09 10:05 10:14
c    Idle       0:08 10:05 10:13
c    Idle       0:07 10:05 10:12
c    Idle       0:06 10:05 10:11
c    Idle       0:04 10:05 10:09
c    Idle       0:03 10:05 10:08
c    Idle       0:02 10:05 10:07
c    Idle       0:01 10:05 10:06
c    Idle       0:08 10:07 10:15
c    Idle       0:07 10:07 10:14
c    Idle       0:06 10:07 10:13
c    Idle       0:05 10:07 10:12
c    Idle       0:04 10:07 10:11
c    Idle       0:02 10:07 10:09
c    Idle       0:01 10:07 10:08
c    chatting   0:04 10:11 10:15
c    chatting   0:03 10:11 10:14
c    chatting   0:02 10:11 10:13
c    chatting   0:01 10:11 10:12
c    Idle       0:03 10:12 10:15
c    Idle       0:02 10:12 10:14
c    Idle       0:01 10:12 10:13
c    chatting   0:02 10:13 10:15
c    chatting   0:01 10:13 10:14
c    Idle       0:01 10:14 10:15

This shows how each eligible start row for user 'c' is matched with each eligible end row, giving many spurious rows of data. The NOT EXISTS sub-query is a common theme when dealing with time-based queries. You can find information about these operations in Snodgrass's "Developing Time-Oriented Applications in SQL" (PDF available online at URL), and in Date, Darwen and Lorentzos "Temporal Data and the Relational Model".

回复收藏 0 原文

只是一片海 2024-07-18 23:42:34

我很确定只使用 SQL 就可以完成，这将花费我相当多的时间来为您提出查询，完成后我将对其进行编辑。我认为的基本步骤首先是计算每个条目花费的时间（通过获取每个条目并将其连接到下一个条目并减去以找到时间差来完成），然后一个简单的带有总和的分组依据子句将很容易得到它变成你所描述的形式。

编辑：这是我想出的

SELECT l.userid, l.state, SUM(t.minutes) AS duration
FROM Log l 
INNER JOIN (
    SELECT l1.id, (l2.time - l1.time) AS minutes
    FROM Log l1, Log l2
    WHERE l2.time == ( 
        -- find the next entry --
        SELECT TOP 1 ls.time
        FROM Log ls
        WHERE ls.Time > l1.Time && ls.userid = l1.userid
        ORDER BY ls.Time
    )
) t ON l.id == t.id
GROUP BY l.userid, l.state
ORDER BY l.userid

这是半伪代码，我编写了所有表名称和内容，您将无法仅从另一个时间中减去一个时间，您可能会使用 DATEDIFF 函数。除此之外，我认为这就是要点。我认为 SQL 是最令人惊奇的语言之一，你可以用很少的代码做几乎任何事情。

I'm pretty sure it can be done using only SQL, it's going to take me quite a bit of time to come up with a query for you, I'll edit it in when I'm done. The basic steps I think would first be to calculate the amount of time each one takes (done by taking each entry and joining it to the next entry and subtracting to find the time difference) then a simple group by clause with a sum will easily get it into the form you've described.

Edit: Here what I came up with

SELECT l.userid, l.state, SUM(t.minutes) AS duration
FROM Log l 
INNER JOIN (
    SELECT l1.id, (l2.time - l1.time) AS minutes
    FROM Log l1, Log l2
    WHERE l2.time == ( 
        -- find the next entry --
        SELECT TOP 1 ls.time
        FROM Log ls
        WHERE ls.Time > l1.Time && ls.userid = l1.userid
        ORDER BY ls.Time
    )
) t ON l.id == t.id
GROUP BY l.userid, l.state
ORDER BY l.userid

This is semi-pseudocode, I made up all the table names and things, and you won't be able to just subtract one time from another, you'll probably be using the DATEDIFF function. Besides that though, I think that's the gist of it. I think SQL is one of the most amazing languages, you can do almost anything with little code.

回复收藏 0 原文

~没有更多了~