Comparing GROUP BY vs. OVER (PARTITION BY)

Posted 2025-01-06 04:30:19


Assume a table CAR with two columns, CAR_ID (int) and VERSION (int).

I want to retrieve the maximum version of each car.

There are (at least) two solutions:

select car_id, max(version) as max_version 
  from car  
 group by car_id;

Or:

select car_id, max_version 
  from  ( select car_id, version
               , max(version) over (partition by car_id) as max_version
            from car
                ) max_ver  
 where max_ver.version = max_ver.max_version;

Are these two queries similarly performant?
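Before comparing speed, it helps to check that both queries return the same thing. Here is a minimal sketch using Python's built-in sqlite3 module and an in-memory SQLite database. The question does not name a database, so SQLite is an assumption here, and window functions require SQLite 3.25 or newer. The sample rows are made up for illustration. The two queries agree when each car's maximum version appears only once. When that maximum row is duplicated, the windowed query returns an extra row, because it keeps every row whose VERSION equals the maximum.

```python
import sqlite3

GROUP_BY_SQL = """
select car_id, max(version) as max_version
  from car
 group by car_id
"""

WINDOW_SQL = """
select car_id, max_version
  from (select car_id, version,
               max(version) over (partition by car_id) as max_version
          from car) max_ver
 where max_ver.version = max_ver.max_version
"""


def run(rows):
    """Load hypothetical sample rows into CAR and return (group_by, windowed) results, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table car (car_id int, version int)")
    conn.executemany("insert into car values (?, ?)", rows)
    grouped = sorted(conn.execute(GROUP_BY_SQL).fetchall())
    windowed = sorted(conn.execute(WINDOW_SQL).fetchall())
    conn.close()
    return grouped, windowed


# Each car's maximum version is stored once: both queries return one row per car.
unique_rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 5)]
print(run(unique_rows))  # ([(1, 3), (2, 5)], [(1, 3), (2, 5)])

# Car 1's maximum version (3) is stored twice: only the windowed query repeats it.
dup_rows = unique_rows + [(1, 3)]
print(run(dup_rows))  # ([(1, 3), (2, 5)], [(1, 3), (1, 3), (2, 5)])
```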


寒尘 2025-01-13 04:30:19


I know this is extremely old, but I thought it should be pointed out.

select car_id, max_version 
  from (select car_id
             , version
             , max(version) over (partition by car_id) as max_version
          from car ) max_ver  
 where max_ver.version = max_ver.max_version

Not sure why you wrote option two like that... In this case the subselect should, in theory, be slower. The window function attaches max_version to every row of CAR, and those rows then have to be filtered back down. The GROUP BY query collapses them straight to one row per car.

Just remove version from your inline view and add DISTINCT, since a window function returns one row per input row. Then they are the same thing:

select distinct car_id, max(version) over (partition by car_id) as max_version
  from car;
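Here is a minimal sketch to check this. It again assumes SQLite through Python's sqlite3 module and uses made-up sample rows. Without DISTINCT, the windowed query returns one row per row of CAR, each repeating its car's maximum. Adding DISTINCT collapses that to the same result as the GROUP BY query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table car (car_id int, version int)")
# Hypothetical sample data: car 1 has three versions, car 2 has two.
conn.executemany("insert into car values (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 1), (2, 5)])

# Window function alone: one output row per input row.
windowed = sorted(conn.execute(
    "select car_id, max(version) over (partition by car_id) as max_version from car"
).fetchall())

# Same window function, with duplicate rows removed.
windowed_distinct = sorted(conn.execute(
    "select distinct car_id, max(version) over (partition by car_id) as max_version from car"
).fetchall())

grouped = sorted(conn.execute(
    "select car_id, max(version) as max_version from car group by car_id"
).fetchall())
conn.close()

print(len(windowed))                 # 5
print(windowed_distinct == grouped)  # True
```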

The performance really depends on the optimizer in this situation. But yes, as the original answer suggests, inline views can help because they narrow the results. This is not a good example of that, though, since both queries read the same table with no filters.

Partitioning (window functions with OVER (PARTITION BY ...)) is also helpful when you select many columns but need aggregates computed per group alongside them. With GROUP BY, you are forced to group by every other selected column.
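Here is a sketch of that point, again assuming SQLite via Python's sqlite3 and hypothetical sample rows. The window-function query keeps every individual row of CAR and attaches several per-car aggregates to it: the maximum version and the version count. A GROUP BY equivalent has to aggregate in a derived table and then join back to CAR to recover the row-level VERSION column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table car (car_id int, version int)")
conn.executemany("insert into car values (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (2, 1), (2, 5)])

# Window functions: row-level columns plus per-car aggregates, no GROUP BY.
windowed = sorted(conn.execute("""
    select car_id, version,
           max(version) over (partition by car_id) as max_version,
           count(*)     over (partition by car_id) as version_count
      from car
""").fetchall())

# GROUP BY equivalent: aggregate per car, then join back to get each row's version.
grouped = sorted(conn.execute("""
    select c.car_id, c.version, g.max_version, g.version_count
      from car c
      join (select car_id, max(version) as max_version, count(*) as version_count
              from car
             group by car_id) g
        on g.car_id = c.car_id
""").fetchall())
conn.close()

print(windowed == grouped)  # True
print(windowed[0])          # (1, 1, 3, 3)
```

The window-function version expresses the same result without the extra derived table and join. Whether it is also faster depends on the database's optimizer.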
