PostgreSQL UNION takes 10 times as long as running the individual queries

Posted on 2024-11-15 07:04:42

I am trying to get the diff between two nearly identical tables in PostgreSQL. The queries I am currently running are:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB;

and

SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

Each of the above queries takes about 2 minutes to run (it's a large table).

I wanted to combine the two queries in the hope of saving time, so I tried:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

And while it works, it takes 20 minutes to run! I would have guessed that it would take at most 4 minutes, the combined time of running each query individually.

Is there some extra work UNION is doing that is making it take so long? Or is there any way I can speed this up (with or without the UNION)?

UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running the two queries separately. Am I correct in saying that UNION (ALL) is not going to speed this up at all?

又怨 2024-11-22 07:04:42

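Regarding your "extra work" question: yes. UNION not only combines the two queries, it also goes through the combined result and removes duplicates. This has the same effect as applying DISTINCT.

For this reason, especially combined with your EXCEPT statements, UNION ALL would likely be faster.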
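The deduplication step can be seen in isolation with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example is self-contained; the tables t1 and t2 and their rows are made up for illustration, and nothing here says anything about timing on a large table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, val TEXT);
    CREATE TABLE t2 (id INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (2, 'b'), (3, 'c');
""")

# UNION removes the duplicate row (2, 'b'); UNION ALL keeps both copies.
union_rows = conn.execute(
    "SELECT * FROM t1 UNION SELECT * FROM t2").fetchall()
union_all_rows = conn.execute(
    "SELECT * FROM t1 UNION ALL SELECT * FROM t2").fetchall()

# Applying DISTINCT to the UNION ALL result gives the same rows as UNION.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2)"
).fetchall()

print(len(union_rows))      # 3
print(len(union_all_rows))  # 4
print(sorted(distinct_rows) == sorted(union_rows))  # True
```

The row shared by both tables appears once under UNION and twice under UNION ALL, which is the extra comparison work the answer refers to.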

我很坚强 2024-11-22 07:04:42

In addition to combining the results of the first and second query, UNION by default also removes duplicate records. (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records between the two queries is probably responsible for the extra time. In this situation there should not be any duplicate records so the extra work looking for duplicates can be avoided by specifying UNION ALL.

SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION ALL
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

沦落红尘 2024-11-22 07:04:42

I don't think your code returns the result set you intend it to. I rather think you want to do this:

SELECT * 
  FROM (
        SELECT * FROM tableA 
        EXCEPT 
        SELECT * FROM tableB
       ) AS T1
UNION 
SELECT * 
  FROM (
        SELECT * FROM tableB 
        EXCEPT 
        SELECT * FROM tableA
       ) AS T2;

In other words, you want the set of mutually exclusive members. If so, you need to read up on relational operator precedence in SQL ;) And when you have, you may realise the above can be rationalised to:

SELECT * FROM tableA 
UNION 
SELECT * FROM tableB
EXCEPT 
SELECT * FROM tableA 
INTERSECT
SELECT * FROM tableB;
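The rewrite relies on INTERSECT binding more tightly than UNION and EXCEPT in PostgreSQL, so the query reads as (tableA UNION tableB) EXCEPT (tableA INTERSECT tableB). The identity behind it can be checked with plain Python sets standing in for the two tables. This is only an illustration of the set algebra: the rows are invented, and it does not model how PostgreSQL executes the query.

```python
# Rows are modelled as tuples; the contents are made up for illustration.
table_a = {(1, "only in A"), (2, "shared")}
table_b = {(2, "shared"), (3, "only in B")}

# (A EXCEPT B) UNION (B EXCEPT A): rows present in exactly one table.
two_sided = (table_a - table_b) | (table_b - table_a)

# (A UNION B) EXCEPT (A INTERSECT B): the rationalised form.
rationalised = (table_a | table_b) - (table_a & table_b)

print(two_sided == rationalised)  # True
print(sorted(rationalised))       # [(1, 'only in A'), (3, 'only in B')]
```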

FWIW, using subqueries (derived tables T1 and T2) to explicitly show (what would otherwise be implicit) relational operator precedence, your original query is this:

SELECT * 
  FROM (
        SELECT * 
          FROM (
                SELECT * 
                  FROM tableA 
                EXCEPT 
                SELECT * 
                  FROM tableB
               ) AS T2
        UNION
        SELECT * 
          FROM tableB
       ) AS T1
EXCEPT 
SELECT * 
  FROM tableA;

The above can be rationalised to:

SELECT * 
  FROM tableB 
EXCEPT 
SELECT * 
  FROM tableA;

...which I think is not what is intended.
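The difference between the two forms can be reproduced on a toy data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption made to keep the example self-contained: for a chain that contains only EXCEPT and UNION, both engines group the operators left to right, so the result should match. The column names and rows are invented.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, val TEXT);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'only in A'), (2, 'shared');
    INSERT INTO tableB VALUES (2, 'shared'), (3, 'only in B');
""")

# The query from the question: grouped as ((A EXCEPT B) UNION B) EXCEPT A.
original = conn.execute("""
    SELECT * FROM tableA EXCEPT SELECT * FROM tableB
    UNION
    SELECT * FROM tableB EXCEPT SELECT * FROM tableA
""").fetchall()

# Derived tables make the intended grouping explicit.
explicit = conn.execute("""
    SELECT * FROM (SELECT * FROM tableA EXCEPT SELECT * FROM tableB) AS T1
    UNION
    SELECT * FROM (SELECT * FROM tableB EXCEPT SELECT * FROM tableA) AS T2
""").fetchall()

print(sorted(original))  # [(3, 'only in B')]
print(sorted(explicit))  # [(1, 'only in A'), (3, 'only in B')]
```

The unparenthesised query drops the row that exists only in tableA, while the version with derived tables returns the rows from both sides.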

卖梦商人 2024-11-22 07:04:42

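You could use tableA FULL OUTER JOIN tableB, which gives you what you want (with a proper join condition) with only one table scan. It would probably be faster than the two queries above.

Please post more information.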

About the author: 爱，才寂寞
