What is the fastest way to truncate timestamps to 5 minutes in Postgres?

Posted on 2024-12-02 21:21:44

Postgres can round (truncate) timestamps using the date_trunc function, like this:

date_trunc('hour', val)
date_trunc('minute', val)

I'm looking for a way to truncate a timestamp to the nearest 5-minute boundary so, for example, 14:26:57 becomes 14:25:00. The straightforward way to do it is like this:

date_trunc('hour', val) + date_part('minute', val)::int / 5 * interval '5 min'

Since this is a performance-critical part of the query, I'm wondering whether this is the fastest solution, or whether there's some shortcut (compatible with Postgres 8.1+) that I've overlooked.
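
For reference, here is a minimal sketch of that expression applied to the example time from the question. The literal timestamp, including its date part, and the names ts and sample are illustrative only; in the question, val would be a column.

```sql
-- Integer division (26 / 5 = 5) drops the remainder, so the minute part is
-- floored to a multiple of 5 before it is added back to the truncated hour.
SELECT date_trunc('hour', ts)
     + date_part('minute', ts)::int / 5 * interval '5 min' AS floored
FROM (SELECT timestamp '2024-01-01 14:26:57' AS ts) AS sample;
-- floored: 2024-01-01 14:25:00
```

Note that this rounds down to the start of the 5-minute slot rather than to the closest boundary, which matches the 14:26:57 to 14:25:00 example.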

Comments (5)

缺⑴份安定 2024-12-09 21:21:44

I was wondering the same thing. I found two alternative ways of doing this, but the one you suggested was faster.

I informally benchmarked both against one of our larger tables, limiting each query to the first 4 million rows. I alternated between the two queries to avoid giving either one an unfair advantage from database caching.


Going through epoch/unix time

SELECT to_timestamp(
    floor(EXTRACT(epoch FROM ht.time) / EXTRACT(epoch FROM interval '5 min'))
    * EXTRACT(epoch FROM interval '5 min')
) FROM huge_table AS ht LIMIT 4000000

(Note that this produces timestamptz even if you used a time-zone-unaware data type.)
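
If the input is a plain timestamp and the result should be one as well, one possible workaround is to convert back with AT TIME ZONE 'UTC'. This is a sketch rather than part of the benchmark above; it relies on extract(epoch ...) treating a timestamp without time zone as a nominal UTC value, and it hardcodes 300 seconds for brevity. The names ts and sample are illustrative.

```sql
-- pg_typeof (PostgreSQL 8.4 and later) is used here only to show the result types.
SELECT to_timestamp(floor(extract(epoch FROM ts) / 300) * 300)
           AT TIME ZONE 'UTC'                                        AS floored,
       pg_typeof(to_timestamp(extract(epoch FROM ts)))               AS raw_type,
       pg_typeof(to_timestamp(extract(epoch FROM ts))
           AT TIME ZONE 'UTC')                                       AS converted_type
FROM (SELECT timestamp '2024-01-01 14:26:57' AS ts) AS sample;
-- floored:        2024-01-01 14:25:00
-- raw_type:       timestamp with time zone
-- converted_type: timestamp without time zone
```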

Results

  • Run 1: 39.368 seconds
  • Run 3: 39.526 seconds
  • Run 5: 39.883 seconds

Using date_trunc and date_part

SELECT 
    date_trunc('hour', ht.time) 
    + date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht LIMIT 4000000

Results

  • Run 2: 34.189 seconds
  • Run 4: 37.028 seconds
  • Run 6: 32.397 seconds

System

  • DB version: PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
  • Cores: Intel® Xeon®, E5-1650v2, Hexa-Core
  • RAM: 64 GB, DDR3 ECC RAM

Conclusion

Your version seems to be faster, but not fast enough for my specific use case. Not having to specify the hour makes the epoch version more versatile and leads to simpler parameterization in client-side code: it handles 2-hour intervals just as well as 5-minute intervals, without having to bump the date_trunc time unit argument up. As a final note, I wish this time unit argument were a time interval argument instead.
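
To illustrate the parameterization point, the epoch expression could be wrapped in a small SQL function. This is only a sketch: floor_to_interval is a hypothetical name, and it assumes the column is a timestamptz (a plain timestamp would be cast using the session time zone).

```sql
-- Hypothetical helper; the name and signature are illustrative only.
-- Named parameters in SQL-language function bodies need PostgreSQL 9.2 or later.
-- STABLE is a conservative volatility choice for this sketch.
CREATE FUNCTION floor_to_interval(ts timestamptz, step interval)
RETURNS timestamptz
LANGUAGE sql STABLE AS $$
    SELECT to_timestamp(
        floor(extract(epoch FROM ts) / extract(epoch FROM step))
        * extract(epoch FROM step)
    )
$$;

-- The same function covers both bucket sizes mentioned above.
SELECT floor_to_interval(ht.time, interval '5 min')   AS five_min_bucket,
       floor_to_interval(ht.time, interval '2 hours') AS two_hour_bucket
FROM huge_table AS ht
LIMIT 10;
```

Buckets computed this way are aligned to the Unix epoch, so 2-hour buckets start on even UTC hours rather than on local midnight.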

半城柳色半声笛 2024-12-09 21:21:44

I don't think there is any quicker method.

And I don't think you should be worried about the performance of the expression.

Everything else that is involved in executing your (SELECT, UPDATE, ...) statement is most probably a lot more expensive (e.g. the I/O to retrieve rows) than that date/time calculation.
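
One way to check this on your own data is to compare the same scan with and without the expression. The sketch below borrows the table and column names (huge_table, time) from the benchmark in the other answer; they are placeholders for your own schema.

```sql
-- Baseline: fetch the column only.
EXPLAIN ANALYZE
SELECT ht.time
FROM huge_table AS ht
LIMIT 4000000;

-- Same scan plus the truncation expression.
EXPLAIN ANALYZE
SELECT date_trunc('hour', ht.time)
     + date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht
LIMIT 4000000;
```

The difference between the two reported execution times gives a rough estimate of what the expression itself costs. Repeating both runs a few times helps to even out caching effects.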

尸血腥色 2024-12-09 21:21:44

Since Postgres 14, date_bin() is simplest and fastest:

date_bin('5 min', val, '2000-1-1')

The manual:

The function date_bin “bins” the input timestamp into the specified
interval (the stride) aligned with a specified origin.

date_bin(stride, source, origin)

source is a value expression of type timestamp or timestamp with time zone. (Values of type date are cast automatically to
timestamp.) stride is a value expression of type interval.
The return value is likewise of type timestamp or timestamp with time zone, and it marks the beginning of the bin into which the
source is placed.

Provide an "origin" of matching data type to avoid unexpected results from the cast ignoring time zones or assuming the wrong one.

My example looks like a date literal but serves as a valid timestamp literal, too. If the time component is missing, '00:00' is assumed.
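
A small sketch of both points, using an explicitly typed origin that matches the source type. The literal values are illustrative only.

```sql
-- Requires PostgreSQL 14 or later.
SELECT date_bin('5 min', timestamp '2024-01-01 14:26:57', timestamp '2000-01-01') AS floored;
-- floored: 2024-01-01 14:25:00

-- Shifting the origin shifts every bin boundary: bins now start at :02, :07, :12, ...
SELECT date_bin('5 min', timestamp '2024-01-01 14:26:57', timestamp '2000-01-01 00:02') AS shifted;
-- shifted: 2024-01-01 14:22:00
```

For a timestamptz column, the origin would be written as a timestamptz literal instead, so that no implicit cast between the two types is involved.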

骄傲 2024-12-09 21:21:44

Full query for those wondering (based on @DNS's question):

Assuming you have orders and you want to count them per 5-minute slice and shop_id:

SELECT date_trunc('hour', created_at) + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
      , shop_id, count(id) as orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC
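
On Postgres 14 or later, the same grouping could presumably be written with date_bin(), as suggested in another answer. This sketch assumes created_at is a timestamp without time zone; the alias bucket is illustrative.

```sql
-- Requires PostgreSQL 14 or later; the origin type matches the assumed column type.
SELECT date_bin('5 min', created_at, timestamp '2000-01-01') AS bucket
     , shop_id
     , count(id) AS orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC;
```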

黑寡妇 2024-12-09 21:21:44

SELECT to_timestamp(time_stamp-abs(time_stamp%(5*60)))
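
This one-liner appears to assume that time_stamp is an integer column holding Unix seconds; the answer does not say so. Under that assumption, a worked example with an illustrative literal value:

```sql
-- 1704119217 is 2024-01-01 14:26:57 UTC as Unix seconds (illustrative value).
SELECT 1704119217 % (5 * 60)                             AS remainder,  -- 117
       1704119217 - 1704119217 % (5 * 60)                AS floored,    -- 1704119100
       to_timestamp(1704119217 - 1704119217 % (5 * 60))  AS bucket;     -- 2024-01-01 14:25:00 UTC
```

The bucket column is a timestamptz and is displayed in the session time zone. For values at or after 1970 the remainder is never negative, so abs() has no effect there; for negative (pre-1970) values the expression does not floor to a 5-minute boundary.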