使用 UNION 子查询的查询需要很长时间

发布于 2024-11-28 15:03:29 字数 734 浏览 0 评论 0原文

我在某些依赖于子查询的查询上遇到了一个奇怪的问题。它们运行得快如闪电，直到我在子查询中使用 UNION 语句。然后他们就无休无止地跑，我10分钟后就给了。我现在描述的场景不是我一开始的场景，但我认为它消除了很多可能的问题，但却产生了同样的问题。因此，即使这是一个毫无意义的问题，请耐心等待！

我有一个表：

tblUser - 100,000 rows
tblFavourites - 200,000 rows

如果我执行：

SELECT COUNT(*) 
FROM tblFavourites 
WHERE userID NOT IN (SELECT uid FROM tblUser);

...那么它会在一秒钟内运行。但是，如果我修改它以使子查询具有 UNION，它将运行至少 10 分钟（在我放弃之前！）这是

SELECT COUNT(*) 
FROM tblFavourites 
WHERE userID NOT IN (SELECT uid FROM tblUser UNION SELECT uid FROM tblUser);

一个毫无意义的更改，但它应该产生相同的结果，我不明白为什么它应该需要更长的时间吗？

将子查询放入视图中并调用它具有相同的效果。

有什么想法吗？我正在使用 SQL Azure。

问题解决了。请参阅下面我的回答。

原文

I've been having an odd problem with some queries that depend on a sub query. They run lightning fast, until I use a UNION statement in the sub query. Then they run endlessly, I've given after 10 minutes. The scenario I'm describing now isn't the original one I started with, but I think it cuts out a lot of possible problems yet yields the same problem. So even though it's a pointless query, bear with me!

I have a table:

tblUser - 100,000 rows
tblFavourites - 200,000 rows

If I execute:

SELECT COUNT(*) 
FROM tblFavourites 
WHERE userID NOT IN (SELECT uid FROM tblUser);

… then it runs in under a second. However, if I modify it so that the sub query has a UNION, it will run for at least 10 minutes (before I give up!)

SELECT COUNT(*) 
FROM tblFavourites 
WHERE userID NOT IN (SELECT uid FROM tblUser UNION SELECT uid FROM tblUser);

A pointless change, but it should yield the same result and I don't see why it should take any longer?

Putting the sub-query into a view and calling that instead has the same effect.

Any ideas why this would be? I'm using SQL Azure.

Problem solved. See my answer below.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

甜妞爱困 2024-12-05 15:03:30

UNION 实际上是对组合数据集中的所有字段执行 DISTINCT。它会过滤掉最终结果中的重复内容。

Uid 是否已编入索引？如果不是，则可能需要很长时间，因为查询引擎：

生成第一个结果集
生成第二个结果集
过滤掉哈希表中的所有重复项（即一半记录）

如果不关心重复项（并使用 < code>IN 意味着它们不会），然后使用 UNION ALL 来删除昂贵的排序/过滤步骤。

回复收藏 0 原文

匿名。 2024-12-05 15:03:30

UNION 生成唯一值，因此 DBMS 引擎进行排序。
在这种情况下，您可以安全地使用 UNION ALL。

回复收藏 0 原文

初熏 2024-12-05 15:03:30

UNION 通常通过内存中的临时表来实现。您实际上是将 tblUser 复制到内存中两次，没有索引。那么 tblFavourites 中的每一行都会引发超过 200,000 行的完整表扫描 - 即 200Kx200K=400 亿次双行扫描（因为查询引擎必须从两个表行获取 uid）

如果您的 tblUser 在 uid 上有索引（这绝对是正确的）因为 SQL Azure 中的所有表都必须有聚集索引），那么 tblFavourites 中的每一行都会产生非常快速的索引查找，导致仅 200Kxlog(100K) =200Kx17 = 200K 行扫描，每个扫描有 17 个 b 树索引比较（这比从数据页上的行读取 uid 快得多），因此它应该等于大约 200Kx(3- 4) 或100万次双行扫描。我相信较新版本的 SQL Server 也可能会构建一个仅包含 uid 的临时哈希表，因此本质上它会减少到 200K 行扫描（假设哈希表查找很简单）。

您还应该生成查询计划进行检查。

本质上，如果 tblUser 有索引（应该位于 SQL Azure 上），非 UNION 查询的运行速度会快大约 500,000 倍。

回复收藏 0 原文

神爱温柔 2024-12-05 15:03:30

事实证明，问题是由于索引之一... tblFavourites 包含 tblUser 中主键（uid）的两个外键：

userId
otherUserId

两列具有相同的定义和相同的索引，但我发现将 userId 交换为 otherUserId原始查询解决了问题。

我跑了：

ALTER INDEX ALL ON tblFavourites REBUILD

……然后问题就消失了。现在查询几乎立即执行。

我不太了解 Sql Server/Azure 幕后发生的事情......但我只能想象它是损坏的索引或其他什么？我经常更新统计数据，但这没有效果。

谢谢！

---- 更新

以上并不完全正确。它确实解决了大约 20 分钟的问题，然后又恢复了。我已经与 Microsoft 支持人员联系了几天，问题似乎与 tempDB 有关。他们最终正在研究解决方案。

It turns out the problem was due to one of the indexes ... tblFavourites contained two foreign keys to the primary key (uid) in tblUser:

userId
otherUserId

both columns had the same definition and same indexes, but I discovered that swapping userId for otherUserId in the original query solved the problem.

I ran:

ALTER INDEX ALL ON tblFavourites REBUILD

... and the problem went away. The query now executes almost instantly.

I don't know too much about what goes on behind the scenes in Sql Server/Azure ... but I can only imagine that it was a damaged index or something? I update statistics frequently, but that had no effect.

Thanks!

---- UPDATE

The above was not fully correct. It did fix the problem for around 20 minutes, then it returned. I have been in touch with Microsoft support for several days and it seems the problem is to do with the tempDB. They are working on a solution at their end.

回复收藏 0 原文

暖伴 2024-12-05 15:03:30

我刚刚遇到这个问题。我有大约 100 万行需要检查，然后我意识到我的一些 ID 位于另一个表中，因此我联合起来在一个“不存在”中获取相同的信息。我从花费大约 7 秒的查询变成在一分钟左右后仅处理 5000 行。这似乎有帮助。我绝对讨厌这个解决方案，但我尝试了很多方法，但最终都以同样极其缓慢的执行计划告终。这个在大约 18 秒内就满足了我的需要。

    DECLARE @PIDS TABLE ([PID] [INT] PRIMARY KEY)
    INSERT INTO @PIDS SELECT DISTINCT [ID] FROM [STAGE_TABLE] WITH(NOLOCK)
    INSERT INTO @PIDS SELECT DISTINCT [OTHERID] FROM [PRODUCTION_TABLE] WITH(NOLOCK)
        WHERE NOT EXISTS(SELECT [PID] FROM @PIDS WHERE [PID] = [OTHERID]

    SELECT (columns needed)
    FROM [ORDER_HEADER] [OH] WITH(NOLOCK)
    INNER JOIN @PIDS ON [OH].[SOME_ID] = [PID]

（是的，我尝试了“WHERE EXISTS IN...”作为最终选择...内连接速度更快）
请让我再说一遍，我个人觉得这真的很难看，但我实际上在我的过程中使用了这个 join 两次，所以从长远来看这会节省我的时间。希望这有帮助。

I just ran into this problem. I have about 1million rows to go through and then I realized that some of my IDs were in another table, so I unioned to get the same information in one "NOT EXISTS." I went from the query taking about 7 sec to processing only 5000 rows after a minute or so. This seemed to help. I absolutely hate the solution, but I've tried a multitude of things that all end up w/the same extremely slow execution plan. This one got me what I needed in about 18 sec.

    DECLARE @PIDS TABLE ([PID] [INT] PRIMARY KEY)
    INSERT INTO @PIDS SELECT DISTINCT [ID] FROM [STAGE_TABLE] WITH(NOLOCK)
    INSERT INTO @PIDS SELECT DISTINCT [OTHERID] FROM [PRODUCTION_TABLE] WITH(NOLOCK)
        WHERE NOT EXISTS(SELECT [PID] FROM @PIDS WHERE [PID] = [OTHERID]

    SELECT (columns needed)
    FROM [ORDER_HEADER] [OH] WITH(NOLOCK)
    INNER JOIN @PIDS ON [OH].[SOME_ID] = [PID]

(And yes I tried "WHERE EXISTS IN..." for the final select... inner join was faster)
Please let me say again, I personaly feel this is really ugly, but I actually use this join twice in my proc, so it's going to save me time in the long run. Hope this helps.

回复收藏 0 原文

心凉 2024-12-05 15:03:30

是否更有意义？

将问题从“不在该表和/或该表中出现的所有 Id 的组合列表中的用户 ID”

改写为

“用户 ID” 不在这张桌子上并且也不在那张桌子上

SELECT COUNT(*) 
FROM tblFavourites 
WHERE userID NOT IN (SELECT uid FROM tblUser) 
AND   userID NOT IN (SELECT uid FROM tblUser);

Doesn't it make more sense to rephrase the questions from

"UserIds that aren't on the combined list of all the Ids that apper in this table and/or that table"

"UserIds that aren't on this table AND aren't on that table either

SELECT COUNT(*) 
FROM tblFavourites 
WHERE userID NOT IN (SELECT uid FROM tblUser) 
AND   userID NOT IN (SELECT uid FROM tblUser);

回复收藏 0 原文

~没有更多了~

关于作者

烟沫凡尘

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

使用 UNION 子查询的查询需要很长时间

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（6）

关于作者

相关话题

热门标签

推荐作者

书间行客

我ぃ本無心為│何有愛

神妖

undefined

38169838

彡翼

友情链接

使用 UNION 子查询的查询需要很长时间

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（6）

关于作者

相关话题

热门标签

推荐作者

书间行客

我ぃ本無心為│何有愛

神妖

undefined

38169838

彡翼

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。