Comparing GROUP BY vs. OVER (PARTITION BY)
Assume a table CAR with two columns, CAR_ID (int) and VERSION (int).
I want to retrieve the maximum version of each car.
So there are (at least) two solutions:
select car_id, max(version) as max_version
from car
group by car_id;
Or:
select car_id, max_version
from ( select car_id, version
, max(version) over (partition by car_id) as max_version
from car
) max_ver
where max_ver.version = max_ver.max_version;
Are these two queries similarly performant?
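A minimal sketch for checking that the two queries return the same rows, assuming Python's built-in `sqlite3` module as a stand-in for Oracle (window functions need SQLite 3.25 or later). The sample rows are made up, and `order by car_id` was added to both queries only to make the output deterministic. This checks results, not Oracle's plans or timings.

```python
import sqlite3

# Hypothetical sample rows (car_id, version); SQLite stands in for Oracle.
ROWS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 5), (3, 4)]

# Query 1 from the question, plus ORDER BY so the output is deterministic.
GROUP_BY_SQL = """
select car_id, max(version) as max_version
from car
group by car_id
order by car_id
"""

# Query 2 from the question (window function in an inline view), plus ORDER BY.
WINDOW_SQL = """
select car_id, max_version
from ( select car_id, version
            , max(version) over (partition by car_id) as max_version
       from car
     ) max_ver
where max_ver.version = max_ver.max_version
order by car_id
"""


def run_both(rows):
    """Load rows into an in-memory CAR table and return both queries' results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table car (car_id integer, version integer)")
        conn.executemany("insert into car (car_id, version) values (?, ?)", rows)
        return (conn.execute(GROUP_BY_SQL).fetchall(),
                conn.execute(WINDOW_SQL).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    grouped, windowed = run_both(ROWS)
    print(grouped)   # [(1, 3), (2, 5), (3, 4)]
    print(windowed)  # [(1, 3), (2, 5), (3, 4)]
```

With no ties on a car's highest version, both queries produce one row per car.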
Comments (3)
I know this is extremely old, but I thought it should be pointed out.
Not sure why you wrote option two like that. In this case the subselect should theoretically be slower, because you're selecting from the same table twice and then joining the results back to itself.
Just remove version from your inline view (dropping the WHERE filter and selecting DISTINCT rows instead) and the two queries return the same thing.
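A sketch of the simplification this answer describes, again using SQLite through Python's `sqlite3` as a stand-in and made-up rows. Dropping VERSION from the inline view also means dropping the WHERE filter, and DISTINCT is then needed because the window function still produces one row per CAR row.

```python
import sqlite3

# Hypothetical rows; car 1 has two rows tied on its maximum version.
ROWS = [(1, 1), (1, 3), (1, 3), (2, 5)]

GROUP_BY_SQL = """
select car_id, max(version) as max_version
from car
group by car_id
order by car_id
"""

# Query 2 with VERSION removed from the inline view and the WHERE filter dropped.
# DISTINCT collapses the one-row-per-CAR-row output of the window function.
SIMPLIFIED_SQL = """
select distinct car_id, max_version
from ( select car_id
            , max(version) over (partition by car_id) as max_version
       from car
     ) max_ver
order by car_id
"""


def compare(rows):
    """Return (GROUP BY result, simplified window result) for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table car (car_id integer, version integer)")
        conn.executemany("insert into car (car_id, version) values (?, ?)", rows)
        return (conn.execute(GROUP_BY_SQL).fetchall(),
                conn.execute(SIMPLIFIED_SQL).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(compare(ROWS))  # ([(1, 3), (2, 5)], [(1, 3), (2, 5)])
```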
The performance really depends on the optimizer in this situation, but yes, as the original answer suggests, inline views are useful because they narrow the results. This just isn't a good example of that, since it's the same table with no filters in the selections given.
Partitioning is also helpful when you are selecting a lot of columns but need different aggregations alongside them in the result set. Otherwise you are forced to group by every other column.
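To illustrate the point about many columns, here is a sketch that assumes a hypothetical extra COLOR column (not part of the question's table), again with SQLite via Python's `sqlite3` and made-up rows. Each row keeps all of its columns while two aggregates are computed over different partitions, which a single GROUP BY cannot express without grouping by every selected column.

```python
import sqlite3

# Hypothetical CAR table with an extra COLOR column (not in the question).
ROWS = [(1, 1, "red"), (1, 2, "blue"), (2, 1, "red")]

# Each row keeps all of its columns; each aggregate gets its own partition,
# so nothing has to be added to a GROUP BY list.
SQL = """
select car_id, version, color
     , max(version) over (partition by car_id) as max_version_for_car
     , count(*)     over (partition by color)  as rows_with_same_color
from car
order by car_id, version
"""


def report(rows):
    """Run the windowed report against an in-memory CAR table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table car (car_id integer, version integer, color text)")
        conn.executemany("insert into car values (?, ?, ?)", rows)
        return conn.execute(SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in report(ROWS):
        print(row)
```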
Yes, it may affect performance.
The second query is an example of an inline view.
It's a very useful technique for building reports with various kinds of counts, or with any other aggregate function.
Oracle executes the subquery and then uses the resulting rows as a view in the FROM clause.
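A sketch of an inline view used for a small counts report, with SQLite via Python's `sqlite3` standing in for Oracle and made-up rows. Logically, the inner query runs first and its rows act as the view `latest` in the FROM clause; a real optimizer may merge the view into the outer query rather than materializing it.

```python
import sqlite3

# Hypothetical rows (car_id, version).
ROWS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 2)]

# The inner query (the inline view "latest") yields one row per car;
# the outer query then counts how many cars sit at each latest version.
SQL = """
select max_version, count(*) as cars
from ( select car_id, max(version) as max_version
       from car
       group by car_id
     ) latest
group by max_version
order by max_version
"""


def cars_per_latest_version(rows):
    """Return (max_version, number_of_cars) pairs for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table car (car_id integer, version integer)")
        conn.executemany("insert into car (car_id, version) values (?, ?)", rows)
        return conn.execute(SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(cars_per_latest_version(ROWS))  # [(2, 1), (3, 2)]
```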
When it comes to performance, I always recommend an inline view over other subquery types.
One more thing: the second query returns every row that holds the maximum version for its car (so ties produce duplicate rows), while the first returns exactly one row per car.
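A sketch of the difference in row counts when a car has two rows tied on its highest version, using SQLite via Python's `sqlite3` and made-up rows. The GROUP BY query returns one row per car; the window-function query returns every row whose version equals its car's maximum.

```python
import sqlite3

# Hypothetical rows; car 1 has two rows with its maximum version (3).
ROWS = [(1, 2), (1, 3), (1, 3), (2, 1)]

GROUP_BY_SQL = """
select car_id, max(version) as max_version
from car group by car_id order by car_id
"""

WINDOW_SQL = """
select car_id, max_version
from ( select car_id, version
            , max(version) over (partition by car_id) as max_version
       from car ) max_ver
where max_ver.version = max_ver.max_version
order by car_id
"""


def run_both(rows):
    """Return (GROUP BY rows, window-function rows) for the given data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table car (car_id integer, version integer)")
        conn.executemany("insert into car (car_id, version) values (?, ?)", rows)
        return (conn.execute(GROUP_BY_SQL).fetchall(),
                conn.execute(WINDOW_SQL).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    grouped, windowed = run_both(ROWS)
    print(grouped)   # [(1, 3), (2, 1)]
    print(windowed)  # [(1, 3), (1, 3), (2, 1)]
```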
see here
It will depend on your indexing scheme and the amount of data in the table. The optimizer will likely make different decisions based on the data that's actually inside the table.
I have found, at least in SQL Server (I know you asked about Oracle), that the optimizer is more likely to perform a full scan with the PARTITION BY query than with the GROUP BY query. But that's only in cases where you have an index that contains CAR_ID and VERSION (DESC).
The moral of the story is that I would test thoroughly to choose the right one. For small tables, it doesn't matter. For really, really big data sets, neither may be fast...
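A rough test harness along the lines this answer suggests, assuming SQLite via Python's `sqlite3` as the test database and synthetic data. It prints SQLite's `EXPLAIN QUERY PLAN` output and a best-of-five timing for each query, with and without an index on (car_id, version desc); the index name is hypothetical. SQLite's plans and timings will not match Oracle's or SQL Server's, so treat it as a template for running the same comparison on your own database.

```python
import sqlite3
import time

GROUP_BY_SQL = """
select car_id, max(version) as max_version
from car group by car_id order by car_id
"""

WINDOW_SQL = """
select car_id, max_version
from ( select car_id, version
            , max(version) over (partition by car_id) as max_version
       from car ) max_ver
where max_ver.version = max_ver.max_version
order by car_id
"""


def build(n_cars=1000, versions_per_car=10, with_index=True):
    """Create an in-memory CAR table filled with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table car (car_id integer, version integer)")
    conn.executemany(
        "insert into car (car_id, version) values (?, ?)",
        ((c, v) for c in range(n_cars) for v in range(1, versions_per_car + 1)),
    )
    if with_index:
        # Hypothetical index name, mirroring the (CAR_ID, VERSION DESC) index in the answer.
        conn.execute("create index car_id_version_ix on car (car_id, version desc)")
    return conn


def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("explain query plan " + sql)]


def best_time(conn, sql, repeat=5):
    """Run sql `repeat` times; return its rows and the fastest wall-clock time."""
    best, rows = float("inf"), None
    for _ in range(repeat):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


if __name__ == "__main__":
    for with_index in (False, True):
        conn = build(with_index=with_index)
        for name, sql in (("GROUP BY", GROUP_BY_SQL), ("PARTITION BY", WINDOW_SQL)):
            _, seconds = best_time(conn, sql)
            print(f"index={with_index} {name}: {seconds:.4f}s")
            for step in plan(conn, sql):
                print("   ", step)
        conn.close()
```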