为什么 INTERSECT 和嵌套 JOIN 一样慢?
我正在使用 MS SQL。
我有一个带有索引的巨大表,可以使此查询快速:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010
它在不到 1 秒的时间内返回。该表有数十亿行。只有大约 10000 个结果。
我希望这个查询也能在大约一秒内完成:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'
但它需要20秒。所有单独的查询都需要< 1 秒返回大约 10k 个结果。
我希望 SQL 在内部将每个子查询的结果放入哈希表中并进行哈希交集 - 应该是 O(n)。结果集足够大,可以容纳在内存中,所以我怀疑这是一个 IO 问题。
我编写了一个备用查询,它只是一系列嵌套的 JOIN,这也需要大约 20 秒,这是有道理的。
为什么 INTERSECT 这么慢?它是否会在查询处理的早期阶段简化为 JOIN?
I'm using MS SQL.
I have a huge table with indices to make this query fast:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010
It returns in less than 1 second. The table has billions of rows. There are only around 10000 results.
I would expect this query to also complete in about a second:
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 47828 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 40652 and
IncrementalStatistics.Created > '12/2/2010'
intersect
select userid from IncrementalStatistics where
IncrementalStatisticsTypeID = 5 and
IncrementalStatistics.AssociatedPlaceID = 14403 and
IncrementalStatistics.Created > '12/2/2010'
But it takes 20 seconds. All the individual queries take < 1 second and return around 10k results.
I would expect SQL internally to throw the results from each of these subqueries into a hashtable and do a hash-intersection - should be O(n). The result sets are big enough to fit in memory, so I doubt it's an IO issue.
I wrote an alternate query that is just a series of nested JOINs and this also takes around 20 seconds, which makes sense.
Why is INTERSECT so slow? Does it reduce to a JOIN at an early stage of the query processing?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
试试这个吧。显然未经测试,但我认为它会给你你想要的结果。
Give this a try instead. Untested obviously, but I think it will get you the results you want.