PostgreSQL UNION takes 10 times as long as running the individual queries

Posted on 2024-11-15 07:04:42

I am trying to get the diff between two nearly identical tables in PostgreSQL. The query I am currently running is:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB;

and

SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

Each of the above queries takes about 2 minutes to run (it's a large table).

I wanted to combine the two queries in the hope of saving time, so I tried:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

While it works, it takes 20 minutes to run! I would have guessed that it would take at most 4 minutes, the time needed to run both queries individually.

Is UNION doing some extra work that makes it take so long? Or is there any way I can speed this up (with or without the UNION)?

UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running the two queries separately. Am I correct in saying that UNION (ALL) is not going to speed this up at all?

Comments (4)

又怨 2024-11-22 07:04:42

Regarding your "extra work" question: yes. UNION not only combines the results of the two queries but also goes through them and removes duplicate rows, much like applying DISTINCT to the combined result.

For this reason, UNION ALL would likely be faster, especially in combination with your EXCEPT queries, whose two results cannot overlap.

Read more here:
http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html
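
To see the de-duplication concretely, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables t1 and t2 and their contents are made up for the demo, and it says nothing about timing; it only shows which rows each operator returns.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (n INTEGER);
    CREATE TABLE t2 (n INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2);
    INSERT INTO t2 VALUES (2), (3);
""")

# UNION combines both inputs and then removes duplicate rows.
union_rows = sorted(
    conn.execute("SELECT n FROM t1 UNION SELECT n FROM t2").fetchall()
)

# UNION ALL simply appends the second result to the first.
union_all_rows = sorted(
    conn.execute("SELECT n FROM t1 UNION ALL SELECT n FROM t2").fetchall()
)

print(union_rows)      # [(1,), (2,), (3,)]
print(union_all_rows)  # [(1,), (2,), (2,), (2,), (3,)]
```

The duplicate check is the extra step UNION performs; UNION ALL skips it, which is only safe when duplicates are impossible or acceptable.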

我很坚强 2024-11-22 07:04:42

In addition to combining the results of the first and second queries, UNION by default also removes duplicate records (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records across the two results is probably responsible for the extra time. In this situation there should not be any duplicate records, so that work can be avoided by specifying UNION ALL. The parentheses below make each EXCEPT run before the UNION ALL; without them, PostgreSQL groups UNION and EXCEPT left to right.

(SELECT * FROM tableA EXCEPT SELECT * FROM tableB)
UNION ALL
(SELECT * FROM tableB EXCEPT SELECT * FROM tableA);

沦落红尘 2024-11-22 07:04:42

I don't think your code returns the result set you intend it to. I rather think you want to do this:

SELECT * 
  FROM (
        SELECT * FROM tableA 
        EXCEPT 
        SELECT * FROM tableB
       ) AS T1
UNION 
SELECT * 
  FROM (
        SELECT * FROM tableB 
        EXCEPT 
        SELECT * FROM tableA
       ) AS T2;

In other words, you want the set of mutually exclusive members, i.e. the rows that appear in exactly one of the two tables. If so, you need to read up on relational operator precedence in SQL ;) And when you have, you may realise the above can be rationalised to:

SELECT * FROM tableA 
UNION 
SELECT * FROM tableB
EXCEPT 
SELECT * FROM tableA 
INTERSECT
SELECT * FROM tableB;
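
As a quick sanity check of that rewrite, Python sets can stand in for the two tables (the sample rows are made up). Python's own operator precedence differs from SQL's, so the grouping that SQL applies implicitly is spelled out here with parentheses:

```python
# Hypothetical sample rows; each tuple plays the role of one table row.
table_a = {(1, "same"), (2, "only in A")}
table_b = {(1, "same"), (3, "only in B")}

# (A EXCEPT B) UNION (B EXCEPT A)
two_excepts = (table_a - table_b) | (table_b - table_a)

# (A UNION B) EXCEPT (A INTERSECT B)
union_minus_intersection = (table_a | table_b) - (table_a & table_b)

print(two_excepts == union_minus_intersection)  # True
print(sorted(two_excepts))  # [(2, 'only in A'), (3, 'only in B')]
```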

FWIW, using subqueries (derived tables T1 and T2) to make the otherwise implicit relational operator precedence explicit, your original query is this:

SELECT * 
  FROM (
        SELECT * 
          FROM (
                SELECT * 
                  FROM tableA 
                EXCEPT 
                SELECT * 
                  FROM tableB
               ) AS T2
        UNION
        SELECT * 
          FROM tableB
       ) AS T1
EXCEPT 
SELECT * 
  FROM tableA;

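Since (A EXCEPT B) UNION B is simply A UNION B, and removing A from that leaves only the rows unique to B, the above can be rationalised to: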

SELECT * 
  FROM tableB 
EXCEPT 
SELECT * 
  FROM tableA;

...which I think is not what is intended.
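
A small reproduction of this, sketched with Python's built-in sqlite3 module as a stand-in for PostgreSQL. The two-column tables and their rows are invented for illustration. SQLite, like PostgreSQL, groups UNION and EXCEPT left to right, so the unparenthesized query shows the same effect here. SQLite does not give INTERSECT the higher precedence that PostgreSQL does, so INTERSECT is deliberately left out.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, val TEXT);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'same'), (2, 'only in A');
    INSERT INTO tableB VALUES (1, 'same'), (3, 'only in B');
""")

# The query from the question: evaluated as ((A EXCEPT B) UNION B) EXCEPT A.
unparenthesized = """
    SELECT * FROM tableA EXCEPT SELECT * FROM tableB
    UNION
    SELECT * FROM tableB EXCEPT SELECT * FROM tableA
"""

# Derived tables force each EXCEPT to run before the UNION.
explicit = """
    SELECT * FROM (SELECT * FROM tableA EXCEPT SELECT * FROM tableB) AS T1
    UNION
    SELECT * FROM (SELECT * FROM tableB EXCEPT SELECT * FROM tableA) AS T2
"""

flat_rows = sorted(conn.execute(unparenthesized).fetchall())
diff_rows = sorted(conn.execute(explicit).fetchall())

print(flat_rows)  # [(3, 'only in B')]
print(diff_rows)  # [(2, 'only in A'), (3, 'only in B')]
```

The row that exists only in tableA silently disappears from the unparenthesized result, which is why the query can look like it works while returning just half of the diff.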

卖梦商人 2024-11-22 07:04:42

You could use tableA FULL OUTER JOIN tableB, which would give you what you want (with a proper join condition) in a single pass over each table. It would probably be faster than the 2 queries above.

Post more info please.
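
A rough sketch of what that could look like, again using Python's built-in sqlite3 module in place of PostgreSQL. The question does not say what the tables contain, so the key column id and the payload column val are hypothetical, as are the sample rows. SQLite's null-safe "IS NOT" comparison would be written "IS DISTINCT FROM" in PostgreSQL, and the fallback query exists only because SQLite before 3.39 has no FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO tableA VALUES (1, 'same'), (2, 'only in A'), (4, 'old');
    INSERT INTO tableB VALUES (1, 'same'), (3, 'only in B'), (4, 'new');
""")

# Keep rows that are missing on one side or whose values differ.
# In PostgreSQL, "a.val IS NOT b.val" would be "a.val IS DISTINCT FROM b.val".
FULL_JOIN_DIFF = """
    SELECT a.id, a.val, b.id, b.val
      FROM tableA AS a
      FULL OUTER JOIN tableB AS b ON a.id = b.id
     WHERE a.id IS NULL OR b.id IS NULL OR a.val IS NOT b.val
"""

# Fallback for SQLite older than 3.39: one LEFT JOIN from each side,
# glued together with UNION ALL.
EMULATED_DIFF = """
    SELECT a.id, a.val, b.id, b.val
      FROM tableA AS a
      LEFT JOIN tableB AS b ON a.id = b.id
     WHERE b.id IS NULL OR a.val IS NOT b.val
    UNION ALL
    SELECT a.id, a.val, b.id, b.val
      FROM tableB AS b
      LEFT JOIN tableA AS a ON a.id = b.id
     WHERE a.id IS NULL
"""

has_full_join = sqlite3.sqlite_version_info >= (3, 39, 0)
query = FULL_JOIN_DIFF if has_full_join else EMULATED_DIFF
diff = set(conn.execute(query).fetchall())

# Sort by whichever id is present so the output order is stable.
for row in sorted(diff, key=lambda r: r[0] if r[0] is not None else r[2]):
    print(row)
```

Unlike the EXCEPT approach, this reports a changed row once, with the old and new values side by side, but it depends on the tables sharing a reliable key.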
