What is the fastest way to truncate timestamps to 5 minutes in Postgres?
Postgres can round (truncate) timestamps using the date_trunc function, like this:
date_trunc('hour', val)
date_trunc('minute', val)
I'm looking for a way to truncate a timestamp to the nearest 5-minute boundary so, for example, 14:26:57 becomes 14:25:00. The straightforward way to do it is like this:
date_trunc('hour', val) + date_part('minute', val)::int / 5 * interval '5 min'
Since this is a performance-critical part of the query, I'm wondering whether this is the fastest solution, or whether there's some shortcut (compatible with Postgres 8.1+) that I've overlooked.
I was wondering the same thing. I found two alternative ways of doing this, but the one you suggested was faster.
I informally benchmarked against one of our larger tables. I limited the query to the first 4 million rows. I alternated between the two queries to avoid giving one an unfair advantage due to database caching.
Going through epoch/unix time
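The benchmarked query itself is not preserved here. As a minimal sketch of what going through epoch time could look like, the statement below divides the epoch seconds by the bucket width, floors the result, and converts back. The literal timestamp and the alias `ts` are placeholders standing in for a real column, and the constant 300 is simply the number of seconds in 5 minutes.

```sql
-- Sketch only: the VALUES row stands in for a real table column.
-- 300 = seconds in one 5-minute bucket.
SELECT to_timestamp(floor(extract(epoch FROM ts) / 300) * 300) AS bucket
FROM (VALUES (timestamptz '2010-09-17 14:26:57+00')) AS t(ts);
-- bucket is the instant 2010-09-17 14:25:00 UTC,
-- displayed in whatever TimeZone the session uses
```

The `FROM (VALUES ...)` form is used only to keep the sketch self-contained; in a real query the expression would be applied directly to the timestamp column.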
(Note that this produces timestamptz even if you used a time-zone-unaware data type.)
Results
Using date_trunc and date_part
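The query used for this variant is also missing from the page. Presumably it applied the expression from the question; a self-contained sketch under that assumption, with a placeholder literal instead of a table column, might be:

```sql
-- Sketch only: same expression as in the question, applied to a literal.
-- date_part('minute', ...) yields 26; 26 / 5 is integer division (5),
-- so 5 * interval '5 min' adds 25 minutes to the truncated hour.
SELECT date_trunc('hour', ts)
       + date_part('minute', ts)::int / 5 * interval '5 min' AS bucket
FROM (VALUES (timestamp '2010-09-17 14:26:57')) AS t(ts);
-- bucket: 2010-09-17 14:25:00
```

Unlike the epoch variant, this keeps the input type, so a plain timestamp stays a plain timestamp.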
Results
System
Conclusion
Your version seems to be faster, but it is still not fast enough for my specific use case. The advantage of not having to specify the hour makes the epoch version more versatile and produces simpler parameterization in client-side code. It handles 2 hour intervals just as well as 5 minute intervals without having to bump the date_trunc time unit argument up. On a final note, I wish this time unit argument were changed to a time interval argument instead.
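To illustrate the parameterization point, here is a hedged sketch in which the bucket width is passed as an interval rather than hard-coded. The alias `width` and the two sample widths are illustrative choices, not taken from the original answer.

```sql
-- Sketch only: one expression, two bucket widths supplied as intervals.
SELECT width,
       to_timestamp(
           floor(extract(epoch FROM ts) / extract(epoch FROM width))
           * extract(epoch FROM width)
       ) AS bucket
FROM (VALUES (timestamptz '2010-09-17 14:26:57+00')) AS t(ts)
CROSS JOIN (VALUES (interval '5 min'), (interval '2 hours')) AS w(width);
-- 5 min   -> the instant 2010-09-17 14:25:00 UTC
-- 2 hours -> the instant 2010-09-17 14:00:00 UTC
```

Buckets computed this way are aligned to the Unix epoch in UTC, so multi-hour widths may not line up with local midnight in every time zone.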
I don't think there is any quicker method.
And I don't think you should be worried about the performance of the expression.
Everything else that is involved in executing your (SELECT, UPDATE, ...) statement is most probably a lot more expensive (e.g. the I/O to retrieve rows) than that date/time calculation.
Since Postgres 14, date_bin() is the simplest and fastest option.
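The answer's own example did not survive on this page. A minimal sketch of a date_bin() call, assuming a 5-minute stride and an arbitrary origin of 2001-01-01 typed as timestamp to match the input, could look like this (requires Postgres 14 or later):

```sql
-- Sketch only: date_bin(stride, source, origin), Postgres 14+.
-- The origin is given the same type as the source (timestamp).
SELECT date_bin('5 minutes', ts, timestamp '2001-01-01') AS bucket
FROM (VALUES (timestamp '2010-09-17 14:26:57')) AS t(ts);
-- bucket: 2010-09-17 14:25:00
```

Because the stride is an interval argument, other widths such as '2 hours' need only a different first argument.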
Provide an "origin" of matching data type to avoid unexpected results from a cast ignoring time zones or assuming the wrong one.
My example looks like a date literal but serves as a valid timestamp literal, too. If the time component is missing, '00:00' is assumed.
Full query for those wondering (based on @DNS's question):
Assuming you have orders and you want to count them in 5-minute slices per shop_id:
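The query from this answer is missing here, so the following is only a sketch of how such a count might be written. The column name `created_at` and the sample rows are assumptions made for illustration; the CTE stands in for a real orders table.

```sql
-- Sketch only: hypothetical orders(shop_id, created_at) with sample rows.
WITH orders (shop_id, created_at) AS (
    VALUES (1, timestamp '2010-09-17 14:26:57'),
           (1, timestamp '2010-09-17 14:28:10'),
           (2, timestamp '2010-09-17 14:31:00')
)
SELECT date_trunc('hour', created_at)
         + date_part('minute', created_at)::int / 5 * interval '5 min' AS slice,
       shop_id,
       count(*) AS order_count
FROM orders
GROUP BY slice, shop_id
ORDER BY slice, shop_id;
-- 2010-09-17 14:25:00 | 1 | 2
-- 2010-09-17 14:30:00 | 2 | 1
```

Against a real table, drop the CTE and keep the SELECT; on Postgres 14 or later the slice expression could be swapped for date_bin().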