SQL查询挑战
所以这是另一个“向 X 写入查询”挑战。
我正在监控许多联网的自动售货机。每台机器都有许多部件,例如钞票接收器、投币系统、打印机等。
机器部件的问题记录在表中,我们称之为“故障”,看起来像这样(省略不相关的字段):
machineid partid start_time end_time
--------- ------ ---------------- ----------------
1 2 2009-10-05 09:00 NULL
1 3 2009-10-05 08:00 2009-10-05 10:00
2 2 2009-09-30 12:00 2009-09-30 14:00
3 4 2009-09-28 13:00 2009-09-28 15:00
3 2 2009-09-28 12:00 2009-09-28 14:00
如果问题当前正在进行,则 end_date 为 NULL。
我需要一个查询来显示整个机器停机的时间段,并且可以考虑重叠范围,将它们折叠成单个记录。因此,对于上面的示例数据,它将产生:
machineid start_time end_time
--------- ---------------- ----------------
1 2009-10-05 08:00 NULL
2 2009-09-30 12:00 2009-09-30 14:00
3 2009-09-28 12:00 2009-09-28 15:00
编写过程代码来逐行执行此操作并不困难,但一个漂亮的声明性 SQL 查询会更有用、更优雅。看起来这应该是可能的,但我只是无法完全到达那里。
SQL 方言是 Oracle。如果有帮助的话,可以使用分析函数。
谢谢!
So here's yet another 'write a query to X' challenge.
I'm monitoring a number of networked vending machines. Each machine has a number of parts, e.g. bank note acceptor, coin system, printer and so on.
Problems with machine parts are logged in table, let's call it 'faults', which looks something like this (irrelevant fields omitted):
machineid partid start_time end_time
--------- ------ ---------------- ----------------
1 2 2009-10-05 09:00 NULL
1 3 2009-10-05 08:00 2009-10-05 10:00
2 2 2009-09-30 12:00 2009-09-30 14:00
3 4 2009-09-28 13:00 2009-09-28 15:00
3 2 2009-09-28 12:00 2009-09-28 14:00
end_date is NULL if the problem is currently ongoing.
I need a query which show time periods for which the machine as a whole is down, and which can account for overlapping ranges, collapsing them down into a single record. So for the sample data above, it would produce:
machineid start_time end_time
--------- ---------------- ----------------
1 2009-10-05 08:00 NULL
2 2009-09-30 12:00 2009-09-30 14:00
3 2009-09-28 12:00 2009-09-28 15:00
It's not tough to write procedural code to do this line by line, but a nice declarative SQL query would be more useful, more elegant. It seems like it ought to be possible, I just can't quite get there though.
SQL dialect is Oracle. Analytic functions are availabe if that would help.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
使用分析,您可以构建一个对数据进行单次传递的查询(对于大型数据集,这将是最有效的):
该查询首先确定一行是否与之前开始的任何行相邻。然后我们对连续的行进行分组。
using analytics, you can build a query that will make a single pass on the data (with a large data set this will be the most efficient):
This query starts by determining if a row is contiguous to any row that started before. We then group the rows that are contiguous.
基本上,你不能在纯集理论中做到这一点(找到森林的覆盖分区集)(例如,作为没有循环的有界查询)。
要以最类似于集合的方式执行此操作,
创建一个用于林分区的临时表(10 或 11 列,4 来自故障 #1,4 来自故障 #2,1 表示分区 ID,1 表示哪个节点的轮次)被插入,1用于各种优化,我在38C的发烧下无法想到。
运行一个循环(BFS 或 DFS,无论您发现什么更容易实现森林分区算法),与图相比,棘手的部分是您可以将许多子树连接起来 顶部到当前子树
您可以使用sheepsimulator的查询作为循环的基本构建块(例如查找2个连接的节点)
当分区循环完成,只需执行
我只是为没有拼写森林分区的确切代码而道歉(它可能会在树分区下归档) - 我累坏了,我确信一些谷歌搜索会产生一个现在你知道的tdata 结构和问题名称(或者您可以将其作为更精确表述的 Q 发布在 StackOverflow 上 - 例如“如何实现将森林完全分区为 SQL 中的循环的算法”。
Basically, you can not do it (find a covering partition set of a forest) in pure set theory (e.g. as a bounded # of queries without a loop).
To do it in the most set-like way,
Create a temp table for forest partitioning (10 or 11 columns, 4 from failure #1, 4 from failure #2, 1 for partition ID, 1 for round in which node was inserted, and 1 for assorted optimizations I can't think of with a fever of 38C.
Run a loop (BFS or DFS, whatever you find to easier implement the forest partitioning algorithm in). The tricky part, compared to graphs, is that you can have many sub-trees joined from the top to current sub-tree
You can use sheepsimulator's query as basic building block for the loop (e.g. finding 2 connected node)
When the partitioning loop is done, simply do
I apologize for not spelling the exact code for forest partitioning (it may be filed under tree partitioning) - I'm dead tired and I'm certain some Googling will yield one now that you know the tdata structure and problem name (or you can post this as a more precisely formulated Q on StackOverflow - e.g. "How to implement an algorithm for complete partitioning of a forest of trees as a loop in SQL".
我用以下数据测试了这个查询:
我得到了这个:
I tested this query on the following data:
I got this:
应该完成这项工作(如果需要,用等效的 Oracle 函数替换 ifnull)。
should do the job (replacing ifnull by the equivalent Oracle function if needed).
我希望我有时间给出完整的答案,但这里有一个查找重叠停机时间的提示:
I wish I had time to give a full answer, but here is a hint to find overlapping downtimes:
我相信你需要一个存储过程来做到这一点,或者类似递归“通用表表达式(CTE)(如 SQL srever 中存在)”,否则(在单个 SQL 语句中)你将无法得到正确的答案当 3 行或更多行一起形成覆盖日期的连续范围时。
就像:
在没有实际进行练习的情况下,我可能建议在存储过程中构建一个所有“候选日期”的表,然后构建一个包含现有行中日期范围未涵盖的所有日期的表,然后通过“否定”该集来构建输出结果集。
I believe you would need a stored proc to do this, or something like recursive 'Common Table Expressions (CTEs) (as exists in SQL srever), or otherwise (in a single SQL Statement) you would not be able to get the right answer when 3 or more rows togeher form a contiguous range of covered dates.
like:
Without actually going through the exercise, I might suggest that in the stored proc, build a table of all "candidate dates" and then construct a table that contains all the dates that are NOT covered by a daterange in an existing row, then build your output resultset by "negating" this set.
请参阅此讨论 - 底部附近有一个解决方案: http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.sqlserver.programming&tid =2bae93da-c70e-4de4-a58b-d8cc0bf8ffd5
See this discussion - with a solution near the bottom: http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.sqlserver.programming&tid=2bae93da-c70e-4de4-a58b-d8cc0bf8ffd5
呵呵。
在支持区间类型的 SIRA_PRISE 中,解决这个问题就像
SELECT machineID, period FROM Faults 一样简单。
其中'period'是时间间隔类型的属性,其起点和终点是SQL表的start_time和end_time。
但由于您可能被迫在 SQL 中解决这个问题,并且系统不支持间隔类型,所以我只能祝您有很大的勇气。
两个提示:
两个间隔的并集可以在 SQL 中使用复杂的 CASE 构造来处理(如果interval_values_overlap,然后是lowest_start_time、highest_end_time,诸如此类的东西)。
由于您无法事先知道有多少行将合并为一行,因此您可能会发现自己被迫编写递归 SQL。
Hehe.
In SIRA_PRISE, which supports interval types, solving this problem would be as easy as
SELECT machineID, period FROM Faults.
IN which 'period' is an attribute of the time interval type whose start and end points are start_time and end_time of your SQL table.
But since you are presumably forced to solve this in SQL, and with a system that does not support interval types, I can only wish you a lot of courage.
Two hints :
The union of two intervals can be handled in SQL using complex CASE constructs (if interval_values_overlap then lowest_start_time highest_end_time, all that sort of stuff).
Since you cannot tell beforehand how many rows will merge into one, you will presumably find yourself forced to write recursive SQL.