比使用“A UNION (B in A)”更高效的 SQL？

发布于 2024-08-19 01:58:35 字数 2457 浏览 4 评论 0原文

编辑 1（澄清）：感谢您迄今为止的回答！反响令人欣喜。
我想稍微澄清一下这个问题，因为根据答案，我认为我没有正确描述问题的一个方面（我确信这是我的错，因为我什至很难为自己定义它）。
问题在于：结果集应仅包含 tstamp 介于“2010-01-03”和“2010-01-09”之间的记录，以及每个 order_num 的 tstamp 为 NULL 的一条记录在第一组中（每个 order_num总是有一个 tstamp 为空的集合）。
到目前为止给出的答案似乎包括特定 order_num 的所有记录（如果在“2010-01-03”和“2010-01-09”之间存在 tstamp 的任何记录）。例如，如果存在另一条 order_num = 2 且 tstamp = 2010-01-12 00:00:00 的记录，则不应将其包含在结果中。

原始问题：
考虑一个包含 id（唯一）、order_num、tstamp（时间戳）和 item_id（订单中包含的单个商品）的订单表。 tstamp 为 null，除非订单已被修改，在这种情况下，会有另一条具有相同 order_num 的记录，并且 tstamp 包含更改发生时的时间戳。

示例...

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 0          1                           100
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 3          3                           113
 4          4                           124
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137
 8          6                           100
 9          6  2010-01-13 08:33:55      105

检索在某个日期范围内已被修改一次或多次的所有订单（基于 order_num）的最有效的 SQL 语句是什么？换句话说，对于每个订单，我们需要具有相同 order_num 的所有记录（包括具有 NULL tstamp 的记录），对于每个 order_num，其中至少一个 order_num 的 tstamp NOT NULL AND tstamp BETWEEN '2010-01-03'和“2010-01-09”。我遇到困难的是“其中至少一个 order_num 的 tstamp NOT NULL”。

结果集应该是这样的：

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137

我想出的SQL是这样的，本质上是“A UNION (B in A)”，但是它执行得很慢，我希望有一个更有效的解决方案：

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
    AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp IS NULL)
    AS current_orders
WHERE current_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

原文

Edit 1 (clarification): Thank you for the answers so far! The response is gratifying.
I want to clarify the question a little because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault as I was having a difficult time defining it even for myself).
Here's the rub: The result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the one record where the tstamp IS NULL for each order_num in the first set (there will always be one with null tstamp for each order_num).
The answers given so far appear to include all records for a certain order_num if there are any with tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00 it should not be included in the result.

Original question:
Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is null, unless the order has been modified, in which case there is another record with identical order_num and tstamp then contains the timestamp of when the change occurred.

Example...

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 0          1                           100
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 3          3                           113
 4          4                           124
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137
 8          6                           100
 9          6  2010-01-13 08:33:55      105

What is the most efficient SQL statement to retrieve all of the orders (based on order_num) which have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with NULL tstamp), for each order_num WHERE at least one of the order_num's has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "WHERE at least one of the order_num's has tstamp NOT NULL" that I'm having difficulty with.

The result set should look like this:

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137

The SQL that I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I hope there is a more efficient solution:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
    AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp IS NULL)
    AS current_orders
WHERE current_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

吃不饱 2024-08-26 01:58:35

也许是一个子查询：

select * from order o where o.order_num in (select distinct
  order_num from order where tstamp between '2010-01-03' and '2010-01-09')

perhaps a subquery:

select * from order o where o.order_num in (select distinct
  order_num from order where tstamp between '2010-01-03' and '2010-01-09')

回复收藏 0 原文

溇涏 2024-08-26 01:58:35

除非我误解了，否则类似这样的事情应该可以解决问题：

SELECT o1.id, o1.order_num, o.tstamp, o.item_id
FROM  orders o1
WHERE EXISTS(
    SELECT * FROM orders o2 
    WHERE o1.order_num = o2.order_num 
        AND o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09')

使用 EXISTS 的好处是，一旦它罚款第一场比赛，它就会停止。

Unless I've misunderstood, something like this should do the trick:

SELECT o1.id, o1.order_num, o.tstamp, o.item_id
FROM  orders o1
WHERE EXISTS(
    SELECT * FROM orders o2 
    WHERE o1.order_num = o2.order_num 
        AND o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09')

Benefit of using EXISTS is that it stops as soon as it fines the first match.

回复收藏 0 原文

衣神在巴黎 2024-08-26 01:58:35

我知道回复已经很晚了，但我刚刚看到这篇文章，我想也许我应该尝试一次，这个查询怎么样，与上述所有解决方案相比，它确实非常小并解决了目的。

select * from orders_gc where order_num in 
    (select order_num
     from orders_gc 
     group by order_num 
     having count(id) > 1 and 
     MAX(tstamp) between '03-jan-2010' and '09-jan-2010')

I know it's very late for repliying but I just saw this posting and I thought maybe I should try this once, how about this query, it's really very small as compared to all of the above solutions and solves the purpose.

select * from orders_gc where order_num in 
    (select order_num
     from orders_gc 
     group by order_num 
     having count(id) > 1 and 
     MAX(tstamp) between '03-jan-2010' and '09-jan-2010')

回复收藏 0 原文

汹涌人海 2024-08-26 01:58:35

希望我答对了你的问题。这应该返回在提供的时间戳内订单已更改的所有订单。

SELECT o.order_id, o.tstamp, o.item_id
FROM orders o
JOIN ( SELECT DISTINCT o2.order_num
       FROM orders o2
       WHERE o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09' ) o3
ON ( o3.order_num = o.order_num )

Hope I got your question right. This should return all orders which have on order that has been changed within the provided timestamp.

SELECT o.order_id, o.tstamp, o.item_id
FROM orders o
JOIN ( SELECT DISTINCT o2.order_num
       FROM orders o2
       WHERE o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09' ) o3
ON ( o3.order_num = o.order_num )

回复收藏 0 原文

桜花祭 2024-08-26 01:58:35

您可以自行加入表。简化后，这看起来像：

select order_id
from orders all_orders
inner join orders not_null_orders
    on all_orders.order_id = not_null_orders.order_id
where
    not_null_orders.tstamp is not null
    and all_orders.tstamp between '2010-01-03' AND '2010-01-09'

You can self-join the table. Simplified, this would look like:

select order_id
from orders all_orders
inner join orders not_null_orders
    on all_orders.order_id = not_null_orders.order_id
where
    not_null_orders.tstamp is not null
    and all_orders.tstamp between '2010-01-03' AND '2010-01-09'

回复收藏 0 原文

辞取 2024-08-26 01:58:35

再次感谢您的所有建议。我找到了三个有效的解决方案，包括我原来的解决方案。最后我添加了一些性能结果，但没有我希望的那么好。如果有人可以改进这一点，我会很高兴！

1）到目前为止找到的最好的解决方案似乎是：

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE history_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

2）我还尝试使用 EXISTS 代替 IN，这需要在最后一个 SELECT 中添加一个附加的 WHERE 子句：

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE EXISTS
   (SELECT orders.order_id
    FROM orders
    WHERE history_orders.order_id = orders.order_id
    AND orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

3）最后是我原来的解决方案，使用 UNION。

评论：
为了评论表大小，我实际的“现实世界”问题涉及 4 个表（通过内部联接连接），分别包含 98、2189、43897、785656 条记录。

性能 - 我将每个解决方案运行了三次，这是我的真实结果：
1: 52, 51, 51 秒
2: 54, 54, 53 秒
3: 56, 56, 56 秒

Thank you again for all the suggestions. I found three solutions that work, including my original. At the end I've added some performance results, which are not as great as I had hoped. If anyone can improve on this I would be thrilled!

1) The best solution found so far seems to be:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE history_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

2) I also tried using EXISTS in place of IN, which requires an additional WHERE clause in the last SELECT:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE EXISTS
   (SELECT orders.order_id
    FROM orders
    WHERE history_orders.order_id = orders.order_id
    AND orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

3) And finally there is my original solution, using UNION.

Comments:
To comment on the table size, my actual "real world" problem involves 4 tables (connected with inner joins) containing 98, 2189, 43897, 785656 records respectively.

Performance - I ran each solution three times and here are my real world results:
1: 52, 51, 51 seconds
2: 54, 54, 53 s
3: 56, 56, 56 s

回复收藏 0 原文

~没有更多了~