为什么 INTERSECT 和嵌套 JOIN 一样慢?

发布于 2024-10-07 00:07:03 字数 1183 浏览 0 评论 0原文

我正在使用 MS SQL。

我有一个带有索引的巨大表,可以使此查询快速:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010

它在不到 1 秒的时间内返回。该表有数十亿行。只有大约 10000 个结果。

我希望这个查询也能在大约一秒内完成:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'

但它需要20秒。所有单独的查询都需要< 1 秒返回大约 10k 个结果。

我希望 SQL 在内部将每个子查询的结果放入哈希表中并进行哈希交集 - 应该是 O(n)。结果集足够大,可以容纳在内存中,所以我怀疑这是一个 IO 问题。

我编写了一个备用查询,它只是一系列嵌套的 JOIN,这也需要大约 20 秒,这是有道理的。

为什么 INTERSECT 这么慢?它是否会在查询处理的早期阶段简化为 JOIN?

I'm using MS SQL.

I have a huge table with indices to make this query fast:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010

It returns in less than 1 second. The table has billions of rows. There are only around 10000 results.

I would expect this query to also complete in about a second:

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'

intersect

select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'

But it takes 20 seconds. All the individual queries take < 1 second and return around 10k results.

I would expect SQL internally to throw the results from each of these subqueries into a hashtable and do a hash-intersection - should be O(n). The result sets are big enough to fit in memory, so I doubt it's an IO issue.

I wrote an alternate query that is just a series of nested JOINs and this also takes around 20 seconds, which makes sense.

Why is INTERSECT so slow? Does it reduce to a JOIN at an early stage of the query processing?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

请你别敷衍 2024-10-14 00:07:03

试试这个吧。显然未经测试,但我认为它会给你你想要的结果。

select userid 
    from IncrementalStatistics 
    where IncrementalStatisticsTypeID = 5 
        and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)  
        and IncrementalStatistics.Created > '12/2/2010'
    group by userid
    having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3

Give this a try instead. Untested obviously, but I think it will get you the results you want.

select userid 
    from IncrementalStatistics 
    where IncrementalStatisticsTypeID = 5 
        and IncrementalStatistics.AssociatedPlaceID in (47828,40652,14403)  
        and IncrementalStatistics.Created > '12/2/2010'
    group by userid
    having count(distinct IncrementalStatistics.AssociatedPlaceID) = 3
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文