比使用“A UNION (B in A)”更高效的 SQL?
编辑 1(澄清):感谢您迄今为止的回答!反响令人欣喜。
我想稍微澄清一下这个问题,因为根据答案,我认为我没有正确描述问题的一个方面(我确信这是我的错,因为我什至很难为自己定义它)。
问题在于:结果集应仅包含 tstamp 介于“2010-01-03”和“2010-01-09”之间的记录,以及每个 order_num 的 tstamp 为 NULL 的一条记录在第一组中(每个 order_num总是有一个 tstamp 为空的集合)。
到目前为止给出的答案似乎包括特定 order_num 的所有 记录(如果在“2010-01-03”和“2010-01-09”之间存在 tstamp 的任何记录) 。例如,如果存在另一条 order_num = 2 且 tstamp = 2010-01-12 00:00:00 的记录,则不应将其包含在结果中。
原始问题:
考虑一个包含 id(唯一)、order_num、tstamp(时间戳)和 item_id(订单中包含的单个商品)的订单表。 tstamp 为 null,除非订单已被修改,在这种情况下,会有另一条具有相同 order_num 的记录,并且 tstamp 包含更改发生时的时间戳。
示例...
id order_num tstamp item_id __ _________ ___________________ _______ 0 1 100 1 2 101 2 2 2010-01-05 12:34:56 102 3 3 113 4 4 124 5 5 135 6 5 2010-01-07 01:23:45 136 7 5 2010-01-07 02:46:00 137 8 6 100 9 6 2010-01-13 08:33:55 105
检索在某个日期范围内已被修改一次或多次的所有订单(基于 order_num)的最有效的 SQL 语句是什么?换句话说,对于每个订单,我们需要具有相同 order_num 的所有记录(包括具有 NULL tstamp 的记录),对于每个 order_num,其中至少一个 order_num 的 tstamp NOT NULL AND tstamp BETWEEN '2010-01-03'和“2010-01-09”。我遇到困难的是“其中至少一个 order_num 的 tstamp NOT NULL”。
结果集应该是这样的:
id order_num tstamp item_id __ _________ ___________________ _______ 1 2 101 2 2 2010-01-05 12:34:56 102 5 5 135 6 5 2010-01-07 01:23:45 136 7 5 2010-01-07 02:46:00 137
我想出的SQL是这样的,本质上是“A UNION (B in A)”,但是它执行得很慢,我希望有一个更有效的解决方案:
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id FROM (SELECT orders.order_id, orders.tstamp, orders.item_id FROM orders WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09') AS history_orders UNION SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id FROM (SELECT orders.order_id, orders.tstamp, orders.item_id FROM orders WHERE orders.tstamp IS NULL) AS current_orders WHERE current_orders.order_id IN (SELECT orders.order_id FROM orders WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
Edit 1 (clarification): Thank you for the answers so far! The response is gratifying.
I want to clarify the question a little because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault as I was having a difficult time defining it even for myself).
Here's the rub: The result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the one record where the tstamp IS NULL for each order_num in the first set (there will always be one with null tstamp for each order_num).
The answers given so far appear to include all records for a certain order_num if there are any with tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00 it should not be included in the result.
Original question:
Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is null, unless the order has been modified, in which case there is another record with identical order_num and tstamp then contains the timestamp of when the change occurred.
Example...
id order_num tstamp item_id __ _________ ___________________ _______ 0 1 100 1 2 101 2 2 2010-01-05 12:34:56 102 3 3 113 4 4 124 5 5 135 6 5 2010-01-07 01:23:45 136 7 5 2010-01-07 02:46:00 137 8 6 100 9 6 2010-01-13 08:33:55 105
What is the most efficient SQL statement to retrieve all of the orders (based on order_num) which have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with NULL tstamp), for each order_num WHERE at least one of the order_num's has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "WHERE at least one of the order_num's has tstamp NOT NULL" that I'm having difficulty with.
The result set should look like this:
id order_num tstamp item_id __ _________ ___________________ _______ 1 2 101 2 2 2010-01-05 12:34:56 102 5 5 135 6 5 2010-01-07 01:23:45 136 7 5 2010-01-07 02:46:00 137
The SQL that I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I hope there is a more efficient solution:
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id FROM (SELECT orders.order_id, orders.tstamp, orders.item_id FROM orders WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09') AS history_orders UNION SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id FROM (SELECT orders.order_id, orders.tstamp, orders.item_id FROM orders WHERE orders.tstamp IS NULL) AS current_orders WHERE current_orders.order_id IN (SELECT orders.order_id FROM orders WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
也许是一个子查询:
perhaps a subquery:
除非我误解了,否则类似这样的事情应该可以解决问题:
使用 EXISTS 的好处是,一旦它罚款第一场比赛,它就会停止。
Unless I've misunderstood, something like this should do the trick:
Benefit of using EXISTS is that it stops as soon as it fines the first match.
我知道回复已经很晚了,但我刚刚看到这篇文章,我想也许我应该尝试一次,这个查询怎么样,与上述所有解决方案相比,它确实非常小并解决了目的。
I know it's very late for repliying but I just saw this posting and I thought maybe I should try this once, how about this query, it's really very small as compared to all of the above solutions and solves the purpose.
希望我答对了你的问题。这应该返回在提供的时间戳内订单已更改的所有订单。
Hope I got your question right. This should return all orders which have on order that has been changed within the provided timestamp.
您可以自行加入表。简化后,这看起来像:
You can self-join the table. Simplified, this would look like:
再次感谢您的所有建议。我找到了三个有效的解决方案,包括我原来的解决方案。最后我添加了一些性能结果,但没有我希望的那么好。如果有人可以改进这一点,我会很高兴!
1)到目前为止找到的最好的解决方案似乎是:
2)我还尝试使用 EXISTS 代替 IN,这需要在最后一个 SELECT 中添加一个附加的 WHERE 子句:
3)最后是我原来的解决方案,使用 UNION。
评论:
为了评论表大小,我实际的“现实世界”问题涉及 4 个表(通过内部联接连接),分别包含 98、2189、43897、785656 条记录。
性能 - 我将每个解决方案运行了三次,这是我的真实结果:
1: 52, 51, 51 秒
2: 54, 54, 53 秒
3: 56, 56, 56 秒
Thank you again for all the suggestions. I found three solutions that work, including my original. At the end I've added some performance results, which are not as great as I had hoped. If anyone can improve on this I would be thrilled!
1) The best solution found so far seems to be:
2) I also tried using EXISTS in place of IN, which requires an additional WHERE clause in the last SELECT:
3) And finally there is my original solution, using UNION.
Comments:
To comment on the table size, my actual "real world" problem involves 4 tables (connected with inner joins) containing 98, 2189, 43897, 785656 records respectively.
Performance - I ran each solution three times and here are my real world results:
1: 52, 51, 51 seconds
2: 54, 54, 53 s
3: 56, 56, 56 s