Fastest way to see which unique dates exist in a table's timestamp field?
I have a table with billions of rows. There are daily partitions on the "recorded" field, which is a "timestamp without time zone." I want to know which days are currently in the table. I know I could do something like:
SELECT recorded::date
FROM table
GROUP BY 1;
Which ideally should work, but the explain on that is rather high, and indicates it would take quite a while to work... if that's the best I can do, I can accept that (and we could keep tabs on the data as it goes in), but I was wondering if there might be a more efficient way to do this, given that I have daily partitioning?
2 Answers
You can create an index something like this:
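The answer's original index definition was omitted from this copy of the thread. A sketch consistent with the description (an expression index on `date_trunc()`, which is immutable for `timestamp without time zone`) might look like the following; `big_table` and `recorded` stand in for your actual table and column names:

```sql
-- Hypothetical expression index on the day of each timestamp.
-- (date_trunc('day', ...) is IMMUTABLE for timestamp without time zone,
-- so PostgreSQL allows it in an index.)
CREATE INDEX big_table_recorded_day_idx
    ON big_table (date_trunc('day', recorded));

-- Queries must use the same expression to make the index usable:
SELECT *
FROM big_table
WHERE date_trunc('day', recorded) = '2012-03-01';
```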
In my test, PostgreSQL 9.something used a sequential scan before adding the index, a sequential scan after simply indexing the column "recorded", and an index scan after indexing it with date_trunc(). Selecting a single day's rows took 66ms without an index, 68ms with a plain index, and 13ms with an index using date_trunc().
With billions of rows, expect creating that index to take a few minutes. (cough)
There's a very similar thread here:
Slow select distinct query on postgres
If you know the min/max dates, you'll be better off querying against a list of dates than doing a seq scan over the whole table. Assuming you've an index on recorded, something that looks like this should be faster:
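The answer's query was also dropped in this copy. A sketch of the idea (probing each candidate day with an indexed range check, rather than scanning the whole table) might look like this; the date bounds and the names `big_table`/`recorded` are placeholders:

```sql
-- Generate one candidate row per day between the known min/max dates,
-- then keep only the days for which at least one row exists.
-- Each EXISTS probe is a cheap index range scan on "recorded".
SELECT d.day::date
FROM generate_series('2010-01-01'::timestamp,
                     '2012-12-31'::timestamp,
                     interval '1 day') AS d(day)
WHERE EXISTS (
    SELECT 1
    FROM big_table
    WHERE recorded >= d.day
      AND recorded <  d.day + interval '1 day'
);
```

The half-open range (`>= day AND < day + 1 day`) is what lets a plain index on `recorded` be used directly, without wrapping the column in a function.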
There might be a few tweaks to make to the above query, but the general idea is there: it'll be faster to do a few thousand subquery/index scans on an indexed field than to seq scan a few billion rows and aggregate them in order to identify the distinct days.