数据库设计:分区表与规范化表

发布于 2024-10-24 10:00:41 字数 1166 浏览 1 评论 0原文

我的新闻通讯应用程序有两个表:tblIssuetblIssueSubscriber

这是我的规范化设计:

 tblIssues (newsletter issues masterlist)
 --------------------
 IssueId int PK
 PublisherCode varchar(10)
 IssueDesc varchar(50)


 tblIssueSubscribers (newsletter subscribers)
 -----------------
 IssueId int FK
 EmailAddress varchar(100)

但 tblIssueSubscriber 预计每周保存数十万甚至数百万条记录,并且会被频繁访问,这就是我倾向于表分区的原因。我的设计是根据 PublisherCode 对 tblIssueSubscriber 进行分区(我们的主列表上有 8 个publisherCode)。

 tblIssues
 --------------
 IssueId int PK
 PublisherCode varchar(10)
 IssueDesc varchar(50)


 tblIssueSubscribers
 -----------------
 IssueId int FK
 PublisherCode varchar(10)
 EmailAddress varchar(100)

然后根据 PublisherCode 将其分区

 CREATE PARTITION FUNCTION [PartitionPublisher] (varchar(10)) AS RANGE RIGHT FOR VALUES  ('PUBLISHER1', 'PUBLISHER2', 'PUBLISHER3', 'PUBLISHER4', 'PUBLISHER5', 'PUBLISHER6', 'PUBLISHER7', 'PUBLISHER8');

我知道表分区会增加复杂性,所以我的问题是,

是否值得对 tblIssueSubscriber 进行分区,或者我应该坚持 标准化设计?

I have two tables: tblIssue and tblIssueSubscriber for my newsletter application.

This is my normalized design:

 tblIssues (newsletter issues masterlist)
 --------------------
 IssueId int PK
 PublisherCode varchar(10)
 IssueDesc varchar(50)


 tblIssueSubscribers (newsletter subscribers)
 -----------------
 IssueId int FK
 EmailAddress varchar(100)

but tblIssueSubscriber is expected to hold hundred thousands or even millions of record per week and it will be accessed frequently that's why Im leaning towards Table partitioning. My design is to partition the tblIssueSubscriber per PublisherCode (We have 8 publisherCode on our masterlist).

 tblIssues
 --------------
 IssueId int PK
 PublisherCode varchar(10)
 IssueDesc varchar(50)


 tblIssueSubscribers
 -----------------
 IssueId int FK
 PublisherCode varchar(10)
 EmailAddress varchar(100)

and then partitioned it per PublisherCode

 CREATE PARTITION FUNCTION [PartitionPublisher] (varchar(10)) AS RANGE RIGHT FOR VALUES  ('PUBLISHER1', 'PUBLISHER2', 'PUBLISHER3', 'PUBLISHER4', 'PUBLISHER5', 'PUBLISHER6', 'PUBLISHER7', 'PUBLISHER8');

I know that table partitioning adds complexity so my question is,

Is it worth partitioning tblIssueSubscriber, or should I stick to
the normalized design?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

不乱于心 2024-10-31 10:00:41

首先,我认为尺寸是一个转移注意力的因素。这不是一个非常有用的论点,因为所有大小都是相对的,并且有理由使用分区而不管大小。

性能只是部分原因。罗尼斯提出了一些很好的观点,但还不止于此。

对表进行分区有两个原因。一是性能,一是维护。

我们先从维护开始。

一般来说,在数据库中删除是一件“坏”事情。假设您错误地插入了 100 万行,然后删除了 100 万行。这些删除中的每一个都会记录生成 UNDO 和 REDO 记录,这会浪费空间并花费时间,不仅在删除时进行,而且在“播放”以进行时间点恢复时也需要时间。那么还有什么比删除更好呢?截断(或删除)。当您所描述的表不断增长时,在某些时候您希望删除旧记录。这就是为什么我说大小无关紧要 - 如果您想在该表中保留一年,则需要删除超过 12 个月的记录 - 无论大小是多少。添加记录一年后,您可能会拥有一个 300MB 的表或 500GB 的表 - 无论您需要/想要开始删除。因此,您始终可以删除带有 insert_dt < 的行。 sysdate - 365。或者您可以删除或截断该月/日的分区。未记录的事务将占用较少的资源。

还有其他维护好处,例如单独备份分区或重建索引或移动到新表空间等。不确定您使用的是什么 RDBMS,但大多数情况下您可以通过分区交换加载数据。这使您可以在所有数据加载并准备就绪之前对最终表进行任何更改。

就性能而言...

这里的关键是,任何在 where 子句中不包含分区键的查询很可能比分区之前的性能更差。这不是 GO_FASTER = TRUE 类型的设置。我见过人们实施分区并破坏他们的系统。 Ronnis 的帖子是单个分区表中性能指南的基础知识。如果您有多个表在同一键上分区,某些 RDBMS 可以并行化它们之间的连接。

First I think Size is a Red Herring. It's not a very useful argument since all size is relative and there are reasons to use partition irrespective of size.

Performance is only part of the reason. Ronnis makes some good points but it doesn't stop there.

There are two reasons to partition a table. One is performance, one is maintenance.

Let's start with maintenance.

In general DELETE is a 'bad' thing to do in a database. Say you mistakenly insert 1 million rows and then delete 1 million rows. Each of those deletes is logged generating UNDO and REDO records, which waste space and take time not only to make while deleting but again when 'played' for a point-in-time recovery. So what's better than delete? Truncate (or drop). When you have tables as you describe that are constantly growing, at some point you'd like to get rid of old records. This is why I say size is irrelevant - if you want to keep a year in that table, you'll need to remove records that are more than 12 months old - NO MATTER WHAT THAT SIZE IS. You could have a 300MB table or a 500GB table after 1 year of adding records - regardless you'll need/want to start deleting. So you can always just delete the rows with insert_dt < sysdate - 365. Or you could just drop or truncate that month/day's partition. A not logged transaction that will be less resource intensive.

There are other maintenance benefits like individually backing up partition or rebuilding indexes or moving to new tablespaces etc. Not sure what RDBMS you're using but you can load data via partition swaps in most. This allows you to make no changes to your final tables until all of the data is loaded and ready to go.

As far as performance goes...

The key here is that any query that doesn't include the partition key in the where clause will most likely perform worse than it did before partitioning. This isn't a GO_FASTER = TRUE type of setting. I've seen people implement partitioning and crush their systems. Ronnis' post is the basics of performance guideline in a single partitioned tables. If you have more than one table partitioned on the same key, some RDBMS's can parallelize the joins between them.

溺深海 2024-10-31 10:00:41

查询模式将决定您是否会从分区中受益。

如果您的应用程序主要涉及单行查询(通常是主键或索引访问),则您不会看到对表进行分区带来的性能提升。

如果您的应用程序主要是处理所有发布者方面的数据,那么您将在执行表扫描时通过消除表的较大部分来受益于分区。

The query patterns will determine whether you will benefit from partitioning.

If your application is mostly about single row queries (typically primary key or indexed access), you will not see a performance gain from partitioning the table.

If your application is mostly about processing all the data publisher-wise, then you would benefit from partitioning by eliminating larger parts of the table when performing table scans.

谈场末日恋爱 2024-10-31 10:00:41

这实际上取决于数据库文件将变得有多大、其中将有多少记录以及您使用的机器。粗略计算一下你认为它会变成多大。

粗略地说,数据库文件将增长到 300 MB?

那没什么...我个人不会对其进行分区。我认识一些使用分区的数据库客户,当他们预计数据库将增长到超过 500 GB 并且最终可能达到 4 TB 时,他们就开始分区。在这种情况下,是的分区。但我怀疑你不会去任何接近的地方。

另外,您以后随时可以分区,不是吗?

我推荐一台 64 位机器,运行 Linux 或 Windows Server 2008/Win7。更多的记忆总是好的。

It really depends on how large that database file is going to become and how many records you are going to have in there and what machine you are using. Do a rough calculation of how large you think it will become.

Roughly, lets say that database file will grow to 300 MB?

That is nothing... I would personally not partition it. I know some of our database clients who use partitioning, and they started partitioning when they expected the database to grow beyond 500 GB and that it ultimately may reach 4 TB. In that case, yes partition. But I suspect you are not going to go anywhere near that.

Plus, you can always partition later, no?

I would recommend a 64-bit machine, running Linux or Windows server 2008/Win7. And more memory is always good.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文