比使用“A UNION (B in A)”更高效的 SQL?

发布于 2024-08-19 01:58:35 字数 2457 浏览 4 评论 0原文

编辑 1(澄清):感谢您迄今为止的回答!反响令人欣喜。
我想稍微澄清一下这个问题,因为根据答案,我认为我没有正确描述问题的一个方面(我确信这是我的错,因为我什至很难为自己定义它)。
问题在于:结果集应仅包含 tstamp 介于“2010-01-03”和“2010-01-09”之间的记录,以及每个 order_num 的 tstamp 为 NULL 的一条记录在第一组中(每个 order_num总是有一个 tstamp 为空的集合)。
到目前为止给出的答案似乎包括特定 order_num 的所有 记录(如果在“2010-01-03”和“2010-01-09”之间存在 tstamp 的任何记录) 。例如,如果存在另一条 order_num = 2 且 tstamp = 2010-01-12 00:00:00 的记录,则不应将其包含在结果中。

原始问题:
考虑一个包含 id(唯一)、order_num、tstamp(时间戳)和 item_id(订单中包含的单个商品)的订单表。 tstamp 为 null,除非订单已被修改,在这种情况下,会有另一条具有相同 order_num 的记录,并且 tstamp 包含更改发生时的时间戳。

示例...

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 0          1                           100
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 3          3                           113
 4          4                           124
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137
 8          6                           100
 9          6  2010-01-13 08:33:55      105

检索在某个日期范围内已被修改一次或多次的所有订单(基于 order_num)的最有效的 SQL 语句是什么?换句话说,对于每个订单,我们需要具有相同 order_num 的所有记录(包括具有 NULL tstamp 的记录),对于每个 order_num,其中至少一个 order_num 的 tstamp NOT NULL AND tstamp BETWEEN '2010-01-03'和“2010-01-09”。我遇到困难的是“其中至少一个 order_num 的 tstamp NOT NULL”。

结果集应该是这样的:

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137

我想出的SQL是这样的,本质上是“A UNION (B in A)”,但是它执行得很慢,我希望有一个更有效的解决方案:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
    AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp IS NULL)
    AS current_orders
WHERE current_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

Edit 1 (clarification): Thank you for the answers so far! The response is gratifying.
I want to clarify the question a little because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault as I was having a difficult time defining it even for myself).
Here's the rub: The result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the one record where the tstamp IS NULL for each order_num in the first set (there will always be one with null tstamp for each order_num).
The answers given so far appear to include all records for a certain order_num if there are any with tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00 it should not be included in the result.

Original question:
Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is null, unless the order has been modified, in which case there is another record with identical order_num and tstamp then contains the timestamp of when the change occurred.

Example...

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 0          1                           100
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 3          3                           113
 4          4                           124
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137
 8          6                           100
 9          6  2010-01-13 08:33:55      105

What is the most efficient SQL statement to retrieve all of the orders (based on order_num) which have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with NULL tstamp), for each order_num WHERE at least one of the order_num's has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "WHERE at least one of the order_num's has tstamp NOT NULL" that I'm having difficulty with.

The result set should look like this:

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137

The SQL that I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I hope there is a more efficient solution:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
    AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp IS NULL)
    AS current_orders
WHERE current_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

吃不饱 2024-08-26 01:58:35

也许是一个子查询:

select * from order o where o.order_num in (select distinct
  order_num from order where tstamp between '2010-01-03' and '2010-01-09')

perhaps a subquery:

select * from order o where o.order_num in (select distinct
  order_num from order where tstamp between '2010-01-03' and '2010-01-09')
溇涏 2024-08-26 01:58:35

除非我误解了,否则类似这样的事情应该可以解决问题:

SELECT o1.id, o1.order_num, o.tstamp, o.item_id
FROM  orders o1
WHERE EXISTS(
    SELECT * FROM orders o2 
    WHERE o1.order_num = o2.order_num 
        AND o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09')

使用 EXISTS 的好处是,一旦它罚款第一场比赛,它就会停止。

Unless I've misunderstood, something like this should do the trick:

SELECT o1.id, o1.order_num, o.tstamp, o.item_id
FROM  orders o1
WHERE EXISTS(
    SELECT * FROM orders o2 
    WHERE o1.order_num = o2.order_num 
        AND o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09')

Benefit of using EXISTS is that it stops as soon as it fines the first match.

衣神在巴黎 2024-08-26 01:58:35

我知道回复已经很晚了,但我刚刚看到这篇文章,我想也许我应该尝试一次,这个查询怎么样,与上述所有解决方案相比,它确实非常小并解决了目的。

select * from orders_gc where order_num in 
    (select order_num
     from orders_gc 
     group by order_num 
     having count(id) > 1 and 
     MAX(tstamp) between '03-jan-2010' and '09-jan-2010')

I know it's very late for repliying but I just saw this posting and I thought maybe I should try this once, how about this query, it's really very small as compared to all of the above solutions and solves the purpose.

select * from orders_gc where order_num in 
    (select order_num
     from orders_gc 
     group by order_num 
     having count(id) > 1 and 
     MAX(tstamp) between '03-jan-2010' and '09-jan-2010')
汹涌人海 2024-08-26 01:58:35

希望我答对了你的问题。这应该返回在提供的时间戳内订单已更改的所有订单。

SELECT o.order_id, o.tstamp, o.item_id
FROM orders o
JOIN ( SELECT DISTINCT o2.order_num
       FROM orders o2
       WHERE o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09' ) o3
ON ( o3.order_num = o.order_num )

Hope I got your question right. This should return all orders which have on order that has been changed within the provided timestamp.

SELECT o.order_id, o.tstamp, o.item_id
FROM orders o
JOIN ( SELECT DISTINCT o2.order_num
       FROM orders o2
       WHERE o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09' ) o3
ON ( o3.order_num = o.order_num )
桜花祭 2024-08-26 01:58:35

您可以自行加入表。简化后,这看起来像:

select order_id
from orders all_orders
inner join orders not_null_orders
    on all_orders.order_id = not_null_orders.order_id
where
    not_null_orders.tstamp is not null
    and all_orders.tstamp between '2010-01-03' AND '2010-01-09'

You can self-join the table. Simplified, this would look like:

select order_id
from orders all_orders
inner join orders not_null_orders
    on all_orders.order_id = not_null_orders.order_id
where
    not_null_orders.tstamp is not null
    and all_orders.tstamp between '2010-01-03' AND '2010-01-09'
辞取 2024-08-26 01:58:35

再次感谢您的所有建议。我找到了三个有效的解决方案,包括我原来的解决方案。最后我添加了一些性能结果,但没有我希望的那么好。如果有人可以改进这一点,我会很高兴!

1)到目前为止找到的最好的解决方案似乎是:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE history_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

2)我还尝试使用 EXISTS 代替 IN,这需要在最后一个 SELECT 中添加一个附加的 WHERE 子句:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE EXISTS
   (SELECT orders.order_id
    FROM orders
    WHERE history_orders.order_id = orders.order_id
    AND orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

3)最后是我原来的解决方案,使用 UNION。

评论:
为了评论表大小,我实际的“现实世界”问题涉及 4 个表(通过内部联接连接),分别包含 98、2189、43897、785656 条记录。

性能 - 我将每个解决方案运行了三次,这是我的真实结果:
1: 52, 51, 51 秒
2: 54, 54, 53 秒
3: 56, 56, 56 秒

Thank you again for all the suggestions. I found three solutions that work, including my original. At the end I've added some performance results, which are not as great as I had hoped. If anyone can improve on this I would be thrilled!

1) The best solution found so far seems to be:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE history_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

2) I also tried using EXISTS in place of IN, which requires an additional WHERE clause in the last SELECT:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE EXISTS
   (SELECT orders.order_id
    FROM orders
    WHERE history_orders.order_id = orders.order_id
    AND orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

3) And finally there is my original solution, using UNION.

Comments:
To comment on the table size, my actual "real world" problem involves 4 tables (connected with inner joins) containing 98, 2189, 43897, 785656 records respectively.

Performance - I ran each solution three times and here are my real world results:
1: 52, 51, 51 seconds
2: 54, 54, 53 s
3: 56, 56, 56 s

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文