What is the fastest way to truncate timestamps to 5 minutes in Postgres?

Posted 2024-12-02 21:21:44

Postgres can round (truncate) timestamps using the date_trunc function, like this:

date_trunc('hour', val)
date_trunc('minute', val)

I'm looking for a way to truncate a timestamp to the nearest 5-minute boundary so, for example, 14:26:57 becomes 14:25:00. The straightforward way to do it is like this:

date_trunc('hour', val) + date_part('minute', val)::int / 5 * interval '5 min'

Since this is a performance-critical part of the query, I'm wondering whether this is the fastest solution, or whether there's some shortcut (compatible with Postgres 8.1+) that I've overlooked.
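
The arithmetic in that expression can be checked outside the database. The following Python sketch only models the logic (truncate to the hour, then add back whole 5-minute steps obtained by integer division); `truncate_5min` is a hypothetical helper name, not a Postgres function, and the sketch says nothing about database performance.

```python
from datetime import datetime, timedelta

def truncate_5min(val: datetime) -> datetime:
    """Model of date_trunc('hour', val) + date_part('minute', val)::int / 5 * interval '5 min'."""
    # date_trunc('hour', val): zero out minutes, seconds and microseconds.
    hour_start = val.replace(minute=0, second=0, microsecond=0)
    # Integer division drops the remainder, so minute 26 yields 5 whole steps.
    steps = val.minute // 5
    return hour_start + steps * timedelta(minutes=5)

print(truncate_5min(datetime(2024, 1, 1, 14, 26, 57)))  # 2024-01-01 14:25:00
```

The integer cast matters: without it the division would keep the fractional part and the result would not land on a 5-minute boundary.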

缺⑴份安定 2024-12-09 21:21:44

I was wondering the same thing. I found two alternative ways of doing this, but the one you suggested was faster.

I informally benchmarked against one of our larger tables. I limited the query to the first 4 million rows. I alternated between the two queries in order to avoid giving one an unfair advantage due to db caching.

Going through epoch/unix time

SELECT to_timestamp(
    floor(EXTRACT(epoch FROM ht.time) / EXTRACT(epoch FROM interval '5 min'))
    * EXTRACT(epoch FROM interval '5 min')
) FROM huge_table AS ht LIMIT 4000000

(Note that this produces timestamptz even if you used a time-zone-unaware data type.)
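
As a rough illustration of the epoch approach, here is a Python sketch of the same floor-and-multiply arithmetic. The function name `truncate_via_epoch` and its `stride_seconds` parameter are assumptions made for this example; the sketch expects a time-zone-aware input and, like `to_timestamp`, returns a time-zone-aware value.

```python
import math
from datetime import datetime, timezone

def truncate_via_epoch(val: datetime, stride_seconds: int = 5 * 60) -> datetime:
    """Model of to_timestamp(floor(epoch / stride) * stride)."""
    # EXTRACT(epoch FROM val): seconds since 1970-01-01 00:00:00 UTC.
    epoch = val.timestamp()
    # Floor to a whole number of strides, then scale back to seconds.
    floored = math.floor(epoch / stride_seconds) * stride_seconds
    # The result carries a time zone (UTC here), mirroring timestamptz.
    return datetime.fromtimestamp(floored, tz=timezone.utc)

ts = datetime(2024, 1, 1, 14, 26, 57, tzinfo=timezone.utc)
print(truncate_via_epoch(ts))            # 5-minute bucket
print(truncate_via_epoch(ts, 2 * 3600))  # 2-hour bucket, same code path
```

Only the stride changes between the two calls, which is the property that makes this variant easy to parameterize.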

Results

Run 1: 39.368 seconds
Run 3: 39.526 seconds
Run 5: 39.883 seconds

Using date_trunc and date_part

SELECT 
    date_trunc('hour', ht.time) 
    + date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht LIMIT 4000000

Results

Run 2: 34.189 seconds
Run 4: 37.028 seconds
Run 6: 32.397 seconds

System

DB version: PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
Cores: Intel® Xeon®, E5-1650v2, Hexa-Core
RAM: 64 GB, DDR3 ECC RAM

Conclusion

Your version seems to be faster, but not fast enough for my specific use case. Not having to specify the hour makes the epoch version more versatile and produces simpler parameterization in client-side code. It handles 2-hour intervals just as well as 5-minute intervals without having to bump the date_trunc time unit argument up. On a final note, I wish this time unit argument were a time interval argument instead.

半城柳色半声笛 2024-12-09 21:21:44

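I don't think there is a faster method.

I also don't think you should worry about the performance of the expression.

Everything else involved in executing your (SELECT, UPDATE, ...) statement, such as the I/O needed to retrieve the rows, is most likely far more expensive than the date/time calculation.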

尸血腥色 2024-12-09 21:21:44

Since Postgres 14, date_bin() is simplest and fastest:

date_bin('5 min', val, '2000-1-1')

The manual:

The function date_bin “bins” the input timestamp into the specified interval (the stride) aligned with a specified origin.

date_bin(stride, source, origin)

source is a value expression of type timestamp or timestamp with time zone. (Values of type date are cast automatically to timestamp.) stride is a value expression of type interval. The return value is likewise of type timestamp or timestamp with time zone, and it marks the beginning of the bin into which the source is placed.

Provide an "origin" of matching data type to avoid unexpected results from the cast ignoring time zones or assuming the wrong one.

My example looks like a date literal but serves as a valid timestamp literal, too. If the time component is missing, '00:00' is assumed.
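
To make the role of the origin concrete, the binning described in the manual excerpt can be modeled as "origin plus a whole number of strides". The Python sketch below is only a model of those semantics under that reading, not Postgres' implementation; `date_bin_model` is a hypothetical name.

```python
from datetime import datetime, timedelta

def date_bin_model(stride: timedelta, source: datetime, origin: datetime) -> datetime:
    """Return the start of the stride-wide bin, aligned to origin, that contains source."""
    if stride <= timedelta(0):
        raise ValueError("stride must be greater than zero")
    # timedelta // timedelta floors to a whole number of strides.
    bins = (source - origin) // stride
    return origin + bins * stride

val = datetime(2024, 1, 1, 14, 26, 57)
# Origin at midnight: bins start at :00, :05, :10, ...
print(date_bin_model(timedelta(minutes=5), val, datetime(2000, 1, 1)))
# Origin shifted by two minutes: bins start at :02, :07, :12, ...
print(date_bin_model(timedelta(minutes=5), val, datetime(2000, 1, 1, 0, 2)))
```

Shifting the origin shifts every bin boundary by the same amount, which is why an origin on a round boundary such as midnight gives the familiar :00, :05, :10 buckets.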

Related: Generating time series between two dates in PostgreSQL

骄傲 2024-12-09 21:21:44

Full query for those wondering (based on @DNS's question):

Assuming you have orders and you want to count them in 5-minute slices per shop_id:

SELECT date_trunc('hour', created_at) + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
      , shop_id, count(id) as orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC
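
For readers who want to check what the grouping produces, here is a small Python sketch of the same bucketing and counting. The sample rows are made up for illustration, and `bucket_5min` is a hypothetical helper that mirrors the SQL expression.

```python
from collections import Counter
from datetime import datetime, timedelta

def bucket_5min(created_at: datetime) -> datetime:
    # Same arithmetic as the SQL: hour start plus whole 5-minute steps.
    hour_start = created_at.replace(minute=0, second=0, microsecond=0)
    return hour_start + (created_at.minute // 5) * timedelta(minutes=5)

# Hypothetical sample rows: (created_at, shop_id).
orders = [
    (datetime(2024, 1, 1, 14, 26, 57), 1),
    (datetime(2024, 1, 1, 14, 28, 3), 1),
    (datetime(2024, 1, 1, 14, 31, 40), 1),
    (datetime(2024, 1, 1, 14, 27, 10), 2),
]

# GROUP BY (bucket, shop_id) with count(id).
orders_count = Counter((bucket_5min(ts), shop_id) for ts, shop_id in orders)

# ORDER BY the bucket, ascending.
for (minute, shop_id), count in sorted(orders_count.items()):
    print(minute, shop_id, count)
```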

黑寡妇 2024-12-09 21:21:44

SELECT to_timestamp(time_stamp-abs(time_stamp%(5*60)))
