Does having more than a few dozen partitions not make sense?

Posted on 2024-09-14 18:37:58

I store time-series simulation results in PostgreSQL.
The database schema looks like this.

table SimulationInfo (
    simulation_id integer primary key,
    simulation_property1, 
    simulation_property2, 
    ....
)
table SimulationResult (  // The size of one row would be around 100 bytes
    simulation_id integer,
    res_date Date,
    res_value1,
    res_value2,
    ...
    res_value9,
    primary key (simulation_id, res_date)
)

I usually query data based on simulation_id and res_date.

I partitioned the SimulationResult table into 200 sub-tables by ranges of simulation_id. A fully filled sub-table has 10 to 15 million rows. Currently about 70 sub-tables are full, and the database is larger than 100 GB. All 200 sub-tables will be filled soon, and when that happens I will need to add more.

But I read this answer, which says that having more than a few dozen partitions does not make sense. So my questions are as follows.

  1. Does having more than a few dozen partitions not make sense? Why?
    I checked the execution plan on my 200 sub-tables, and it scans only the relevant sub-table. So I guessed that more partitions, each with a smaller sub-table, must be better.

  2. If the number of partitions should be limited, say to 50, is it no problem to have billions of rows in one table? How big can one table get without major problems, given a schema like mine?

Comments (1)

百变从容 2024-09-21 18:37:58

It's probably unwise to have that many partitions, yes. The main reason to have partitions at all is not to make indexed queries faster (which they are not, for the most part), but to improve performance for queries that have to sequentially scan the table based on constraints that can be proved to not hold for some of the partitions; and to improve maintenance operations (like vacuum, or deleting large batches of old data which can be achieved by truncating a partition in certain setups, and such).

Maybe instead of using ranges of simulation_id (which means you need more and more partitions all the time), you could partition using a hash of it. That way all partitions grow at a similar rate, and there's a fixed number of partitions.
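As a rough illustration of the fixed partition count, here is a minimal Python sketch that generates the child-table DDL for hash partitioning. It assumes PostgreSQL 11 or later (where declarative hash partitioning is available) and a parent table already declared with `PARTITION BY HASH (simulation_id)`. The helper name `hash_partition_ddl`, the `_p<N>` naming scheme, and the partition count are hypothetical choices, not something from the question.

```python
from typing import List


def hash_partition_ddl(parent: str, modulus: int) -> List[str]:
    """Build one CREATE TABLE statement per hash partition of `parent`.

    Assumes the parent table was created with PARTITION BY HASH (...);
    the child names (<parent>_p<remainder>) are an illustrative convention.
    """
    if modulus < 1:
        raise ValueError("modulus must be at least 1")
    statements = []
    for remainder in range(modulus):
        statements.append(
            f"CREATE TABLE {parent}_p{remainder} PARTITION OF {parent} "
            f"FOR VALUES WITH (MODULUS {modulus}, REMAINDER {remainder});"
        )
    return statements


if __name__ == "__main__":
    # 16 is an arbitrary example count, not a recommendation.
    for stmt in hash_partition_ddl("SimulationResult", 16):
        print(stmt)
```

The number of statements depends only on the chosen modulus, so new simulation_id values never require new partitions, unlike the range scheme.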

One problem with too many partitions, for example, is that the system is not prepared to deal with locking that many objects. 200 may work fine, but it won't scale well once you reach a thousand or more (which doesn't sound unlikely given your description).

There's no problem with having billions of rows per partition.
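To put the question's numbers next to that statement, here is a back-of-the-envelope calculation in Python. It uses the figures from the question (200 range sub-tables of 10 to 15 million rows each) and assumes a hypothetical fixed count of 32 hash partitions with rows spread evenly. Both the count and the even spread are assumptions for illustration.

```python
RANGE_TABLES = 200                      # sub-tables in the question
ROWS_PER_TABLE_LOW = 10_000_000         # "10 to 15 million rows" per full sub-table
ROWS_PER_TABLE_HIGH = 15_000_000
HASH_PARTITIONS = 32                    # hypothetical fixed partition count


def rows_per_partition(total_rows: int, partitions: int) -> int:
    """Rows per partition, assuming the hash spreads rows evenly."""
    return total_rows // partitions


total_low = RANGE_TABLES * ROWS_PER_TABLE_LOW     # 2 billion rows in total
total_high = RANGE_TABLES * ROWS_PER_TABLE_HIGH   # 3 billion rows in total

per_part_low = rows_per_partition(total_low, HASH_PARTITIONS)
per_part_high = rows_per_partition(total_high, HASH_PARTITIONS)

print(f"total rows: {total_low:,} to {total_high:,}")
print(f"rows per partition: {per_part_low:,} to {per_part_high:,}")
```

Under these assumptions each partition would hold tens of millions of rows once the current 200 sub-tables' worth of data is loaded, well below the billions-per-partition scale mentioned above.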

All that said, there are obviously particular concerns that apply to each scenario. It all depends on the queries you're going to run, and what you plan to do with the data long-term (i.e. are you going to keep it all, archive it, delete the oldest, ...?)
