Analysis query
I am looking for a single query that can convert the following information in the table:
name : time     : state
a    : 10:00 AM : login
b    : 10:05 AM : login
a    : 10:06 AM : chatting
a    : 10:08 AM : Idle
b    : 10:11 AM : chatting
a    : 10:10 AM : Logout
b    : 10:12 AM : Logout
to something like this (given the time range 10 AM to 10:15 AM as the querying period)
name : State    : Duration
a    : chatting : 2 Minutes
a    : Idle     : 2 Minutes
b    : chatting : 1 Minute
Can this be done using ONLY SQL? I am using Informix version 11.5.
Comments (2)
It can be done in a single SQL statement. Here's the proof.
Setup
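A minimal sketch of such a setup, under stated assumptions: SQLite (driven from Python's sqlite3 module) stands in for Informix, the table name eventtable and the helper make_db are made up for illustration, and times are stored as zero-padded 24-hour 'HH:MM' text instead of Informix's DATETIME HOUR TO MINUTE. The rows are the seven from the question.

```python
import sqlite3

# The seven events from the question. Zero-padded 24-hour 'HH:MM' text
# sorts chronologically, which is the property the time comparisons need.
EVENTS = [
    ("a", "10:00", "login"),
    ("b", "10:05", "login"),
    ("a", "10:06", "chatting"),
    ("a", "10:08", "Idle"),
    ("b", "10:11", "chatting"),
    ("a", "10:10", "Logout"),
    ("b", "10:12", "Logout"),
]


def make_db():
    """Create an in-memory database holding the question's events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eventtable ("
        " name TEXT NOT NULL,"
        " time TEXT NOT NULL,"
        " state TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO eventtable (name, time, state) VALUES (?, ?, ?)", EVENTS
    )
    return conn


# Show the events per user in time order.
conn = make_db()
for row in conn.execute(
    "SELECT name, time, state FROM eventtable ORDER BY name, time"
):
    print(row)
```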
Correct query
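A minimal sketch of a query of this shape, not the original Informix statement. It assumes SQLite via Python's sqlite3, a made-up table eventtable with 'HH:MM' text times, and hypothetical helper names (make_db, durations); the minute arithmetic uses SQLite's strftime('%s', ...) where Informix would subtract DATETIME values directly.

```python
import sqlite3

EVENTS = [
    ("a", "10:00", "login"), ("b", "10:05", "login"),
    ("a", "10:06", "chatting"), ("a", "10:08", "Idle"),
    ("b", "10:11", "chatting"), ("a", "10:10", "Logout"),
    ("b", "10:12", "Logout"),
]

# Self-join on name, asymmetric join on time, and a NOT EXISTS sub-select
# with strict inequalities so that r1 and r2 themselves are not counted
# as intervening events.
QUERY = """
SELECT r1.name, r1.state,
       (CAST(strftime('%s', r2.time) AS INTEGER)
        - CAST(strftime('%s', r1.time) AS INTEGER)) / 60 AS minutes
  FROM eventtable AS r1
  JOIN eventtable AS r2
    ON r1.name = r2.name AND r1.time < r2.time
 WHERE r1.state NOT IN ('login', 'Logout')
   AND r1.time >= :period_start
   AND r2.time <= :period_end
   AND NOT EXISTS (SELECT 1
                     FROM eventtable AS r3
                    WHERE r3.name = r1.name
                      AND r3.time > r1.time
                      AND r3.time < r2.time)
 ORDER BY r1.name, r1.time
"""


def make_db():
    """Create an in-memory database holding the question's events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eventtable (name TEXT, time TEXT, state TEXT)")
    conn.executemany("INSERT INTO eventtable VALUES (?, ?, ?)", EVENTS)
    return conn


def durations(conn, period_start, period_end):
    """Return (name, state, minutes) for each pair of adjacent events."""
    params = {"period_start": period_start, "period_end": period_end}
    return conn.execute(QUERY, params).fetchall()


for row in durations(make_db(), "10:00", "10:15"):
    print(row)
```

With the question's data this yields one row per non-login, non-Logout state: two minutes each for a's chatting and Idle, and one minute for b's chatting.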
Note the conditions. The result table must exclude the period between 'login' and the first other event; further, it must exclude the period between 'Logout' and the next event (presumably a 'login'). The self-join of the table on the name column, together with the asymmetric join on the time column (using '<'), ensures that events are in time order. The NOT EXISTS sub-select ensures that only adjacent events are considered. Using BETWEEN AND in the sub-query is a mistake because it includes its end points, and it is crucial that r1.time and r2.time are excluded from the range (with BETWEEN, the r1 and r2 rows themselves match the sub-query, so NOT EXISTS is never true); it took me a few minutes to spot that bug (the query ran but returned no rows, but why?)!
This produces the answer:
The 'duration' value is an INTERVAL HOUR TO MINUTE; if you want a value in just minutes, you have to convert it with a cast (using 4 for the precision to allow for intervals up to 1440 minutes, or 1 day; the data is ambiguous for longer time frames):
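To illustrate what such a conversion yields, here is a small Python sketch rather than Informix SQL. The function name minutes_between is made up for illustration, and both times are assumed to fall on the same day, which is why four digits of precision are more than enough (a day has 1440 minutes).

```python
from datetime import datetime


def minutes_between(start, end):
    """Return whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# An hour-to-minute difference of 0:02 becomes the plain number 2.
print(minutes_between("10:06", "10:08"))
# The largest same-day difference at minute resolution fits in four digits.
print(minutes_between("00:00", "23:59"))
```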
Or:
IBM Informix Dynamic Server (IDS) has very verbose notations for time constants. In standard SQL, you could use TIME as the type and TIME '10:00:00' as a value, though strict standard SQL requires the seconds. IDS does provide the exact types that people want, such as DATETIME HOUR TO MINUTE. In standard SQL you would also write INTERVAL MINUTE(4); the 'TO MINUTE' should be optional.
Incorrect query
In my comment to Ray Hidayat's answer, I pointed out that the EXISTS sub-query is necessary to ensure that the events under consideration are contiguous - there are no intervening events. Here's the same query with start and end times added to the output, and the EXISTS clause missing (and 'duration' renamed to 'lapse'):
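A minimal sketch of what goes wrong, again assuming SQLite via Python's sqlite3, a made-up eventtable with 'HH:MM' text times, and hypothetical helper names. It uses only the seven rows from the question, and it is not the original Informix query.

```python
import sqlite3

EVENTS = [
    ("a", "10:00", "login"), ("b", "10:05", "login"),
    ("a", "10:06", "chatting"), ("a", "10:08", "Idle"),
    ("b", "10:11", "chatting"), ("a", "10:10", "Logout"),
    ("b", "10:12", "Logout"),
]

# Same joins as the adjacent-event query, but with no NOT EXISTS sub-select,
# so a start row is paired with every later row for the same name.
LOOSE_QUERY = """
SELECT r1.name, r1.state, r1.time AS start_time, r2.time AS end_time,
       (CAST(strftime('%s', r2.time) AS INTEGER)
        - CAST(strftime('%s', r1.time) AS INTEGER)) / 60 AS lapse
  FROM eventtable AS r1
  JOIN eventtable AS r2
    ON r1.name = r2.name AND r1.time < r2.time
 WHERE r1.state NOT IN ('login', 'Logout')
 ORDER BY r1.name, r1.time, r2.time
"""


def make_db():
    """Create an in-memory database holding the question's events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eventtable (name TEXT, time TEXT, state TEXT)")
    conn.executemany("INSERT INTO eventtable VALUES (?, ?, ?)", EVENTS)
    return conn


def loose_pairs(conn):
    """Return every (start, end) pairing, adjacent or not."""
    return conn.execute(LOOSE_QUERY).fetchall()


for row in loose_pairs(make_db()):
    print(row)
```

Even on this small data set, user a's chatting row is paired with both the Idle row and the Logout row, so one spurious four-minute span appears alongside the three genuine ones.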
This produces the answer:
This shows how each eligible start row for user 'c' is matched with each eligible end row, giving many spurious rows of data. The NOT EXISTS sub-query is a common theme when dealing with time-based queries. You can find information about these operations in Snodgrass's "Developing Time-Oriented Applications in SQL" (PDF available online at URL), and in Date, Darwen and Lorentzos "Temporal Data and the Relational Model".
I'm pretty sure it can be done using only SQL. It's going to take me quite a bit of time to come up with a query for you; I'll edit it in when I'm done. The basic steps, I think, would be first to calculate the amount of time each entry takes (by joining each entry to the next entry and subtracting to find the time difference); then a simple GROUP BY clause with a SUM will get it into the form you've described.
Edit: Here's what I came up with:
This is semi-pseudocode: I made up all the table names and so on, and you won't be able to just subtract one time from another; you'll probably need the DATEDIFF function. Besides that, though, I think that's the gist of it. I think SQL is one of the most amazing languages; you can do almost anything with little code.
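The steps outlined in this answer (pair each entry with the next one for the same name, subtract, then GROUP BY with SUM) can be sketched as follows. This is an illustration under assumptions, not the answer's own pseudocode: it uses SQLite via Python's sqlite3, a made-up eventtable with 'HH:MM' text times, and strftime('%s', ...) for the subtraction in place of DATEDIFF.

```python
import sqlite3

EVENTS = [
    ("a", "10:00", "login"), ("b", "10:05", "login"),
    ("a", "10:06", "chatting"), ("a", "10:08", "Idle"),
    ("b", "10:11", "chatting"), ("a", "10:10", "Logout"),
    ("b", "10:12", "Logout"),
]

# Inner query: for each event, find the next event time for the same name
# and compute the gap in minutes. Outer query: total the gaps per state.
TOTALS_QUERY = """
SELECT name, state, SUM(minutes) AS total_minutes
  FROM (SELECT e.name, e.state,
               (CAST(strftime('%s', (SELECT MIN(n.time)
                                       FROM eventtable AS n
                                      WHERE n.name = e.name
                                        AND n.time > e.time)) AS INTEGER)
                - CAST(strftime('%s', e.time) AS INTEGER)) / 60 AS minutes
          FROM eventtable AS e
         WHERE e.state NOT IN ('login', 'Logout')) AS spans
 WHERE minutes IS NOT NULL
 GROUP BY name, state
 ORDER BY name, state
"""


def make_db():
    """Create an in-memory database holding the question's events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eventtable (name TEXT, time TEXT, state TEXT)")
    conn.executemany("INSERT INTO eventtable VALUES (?, ?, ?)", EVENTS)
    return conn


def totals(conn):
    """Return (name, state, total_minutes), summed over repeated states."""
    return conn.execute(TOTALS_QUERY).fetchall()


for row in totals(make_db()):
    print(row)
```

The SUM only matters when a user enters the same state more than once; in the question's data each state occurs once per user, so the totals equal the individual spans.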