Fastest way to see which unique dates exist in a table's timestamp field?

Posted 2024-11-08 06:08:14

I have a table with billions of rows. There are daily partitions on the "recorded" field, which is a "timestamp without time zone." I want to know which days are currently in the table. I know I could do something like:

SELECT recorded::date
FROM your_table  -- placeholder name; "table" itself is a reserved word
GROUP BY 1;

In principle that works, but the EXPLAIN cost estimate is rather high and suggests the query would take quite a while. If that's the best I can do, I can accept it (and we could keep track of the days as data comes in), but I was wondering whether there is a more efficient way to do this, given that the table is partitioned by day?

Comments (2)

相思故 2024-11-15 06:08:14

You can create an index something like this:

-- your_table is a placeholder; "table" itself is a reserved word
create index your_index_name
on your_table (date_trunc('day', recorded));

In my test, PostgreSQL 9.something used a sequential scan before adding the index, a sequential scan after simply indexing the column "recorded", and an index scan after indexing it with date_trunc(). Selecting a single day's rows took 66ms without an index, 68ms with a plain index, and 13ms with an index using date_trunc().

With billions of rows, expect creating that index to take a few minutes. (cough)

太阳公公是暖光 2024-11-15 06:08:14

There's a very similar thread here:

Slow select distinct query on postgres

If you know the min/max dates, you're better off querying against a list of dates than doing a seq scan over the whole table. Assuming you have an index on recorded, something like this should be faster:

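with bounds as (
    -- min/max can be read from the index on recorded
    select min(recorded)::date as first_day,
           max(recorded)::date as last_day
    from records
),
days as (
    -- one row per calendar day between the first and last recorded day
    select (select first_day from bounds) + k as day
    from generate_series(0, (select last_day - first_day from bounds)) as k
)
select day
from days
where exists (
      -- half-open range [day, day + 1 day), so the index on recorded can be used
      select 1
      from records
      where day <= recorded and recorded < day + interval '1 day'
      );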

The above query might need a few tweaks, but the general idea is there: running a few thousand subqueries, each an index scan on an indexed field, will be faster than seq scanning a few billion rows and aggregating them to identify the distinct days.
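To make the per-day probing idea concrete, here is a minimal, runnable sketch. It is an illustration under assumptions, not the answerer's code: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, stores timestamps as ISO text, and the function name `distinct_days`, the index name `records_recorded_idx` and the sample rows are all made up for the example. On PostgreSQL the same loop would compare a real timestamp column instead of text.

```python
import sqlite3
from datetime import datetime, timedelta


def distinct_days(conn, table="records", column="recorded"):
    """Return the days that hold at least one row, probing the index once per day."""
    # min/max on an indexed column are lookups at either end of the index.
    lo, hi = conn.execute(
        f"SELECT min({column}), max({column}) FROM {table}"
    ).fetchone()
    if lo is None:  # empty table
        return []
    first = datetime.fromisoformat(lo).date()
    last = datetime.fromisoformat(hi).date()

    probe = (
        f"SELECT EXISTS (SELECT 1 FROM {table} "
        f"WHERE {column} >= ? AND {column} < ?)"
    )
    found = []
    day = first
    while day <= last:
        nxt = day + timedelta(days=1)
        # Half-open range [day, next day): a range probe that can stop at the first hit.
        (hit,) = conn.execute(probe, (day.isoformat(), nxt.isoformat())).fetchone()
        if hit:
            found.append(day)
        day = nxt
    return found


# Hypothetical sample data: ISO-formatted text sorts in chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (recorded TEXT NOT NULL)")
conn.execute("CREATE INDEX records_recorded_idx ON records (recorded)")
conn.executemany(
    "INSERT INTO records VALUES (?)",
    [
        ("2024-01-01 08:00:00",),
        ("2024-01-01 23:59:59",),
        ("2024-01-03 00:00:00",),
        ("2024-01-05 12:30:00",),
    ],
)
days = distinct_days(conn)
print([d.isoformat() for d in days])
```

The number of probes grows with the span between the first and last day rather than with the row count, which is why this approach pays off when there are billions of rows but only a few thousand days.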
