Optimal database fullness?

Posted 2024-10-14 20:41:18

Let's say you have a relational database of arbitrary but finite capacity, and the database holds historical event information for an online system that is constantly generating new events. The database should hold event information for reporting purposes, but should purge events older than (N) days. Given that you have enough historical information to deduce that the rate of event generation is relatively constant and not increasing or decreasing over time, is there an optimal fullness percentage (60%, 70%, 80%, ...) that you would design for this database? If so, why would you choose that percentage?
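
For concreteness, here is a back-of-the-envelope sketch of the steady-state arithmetic behind the question; the retention window, event rate, row size, and capacity figures are all hypothetical placeholders:

    # Rough sizing for a fixed-retention event table.
    # Every figure here is a hypothetical placeholder.
    RETENTION_DAYS = 30            # N: purge events older than this
    EVENTS_PER_DAY = 5_000_000     # observed, roughly constant rate
    BYTES_PER_EVENT = 200          # average row size incl. index overhead
    CAPACITY_BYTES = 60 * 10**9    # total space available to the database

    # With a constant rate, the table converges on a steady-state size:
    steady_state = RETENTION_DAYS * EVENTS_PER_DAY * BYTES_PER_EVENT

    # Steady-state fullness is the percentage the question asks about;
    # everything above it is margin for spikes, purge lag, and emergencies.
    print(f"steady-state fullness: {steady_state / CAPACITY_BYTES:.0%}")
    # -> steady-state fullness: 50%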

Comments (2)

最初的梦 2024-10-21 20:41:18

It depends.

Well, to be slightly more helpful: you said that the rate of event generation is "relatively constant". You will need enough margin to absorb fluctuations in that rate, both statistical and emergency. The statistics you can get from your history; the emergencies can only be guessed at.

The actual amount of space used depends on how the data is stored. On a related note, many filesystems become very slow once they exceed a certain degree of fullness; you will likely want to include that threshold as part of your total margin. Also, consider things like the granularity of your event purge: how often does it run?

Also, consider the consequences of running out of capacity. Will your system crash? How critical is the system, anyway? Can you do an emergency purge to make additional space? How expensive is extra capacity, relative to the expense of an outage?
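
As one way to picture how those margins stack up, here is a minimal sketch; every percentage in it is an illustrative assumption, not a recommendation:

    # Illustrative margin budget; all percentages are assumptions.
    fs_slowdown_headroom = 0.10   # free space many filesystems want
    statistical_margin   = 0.10   # sized from observed rate variance
    emergency_margin     = 0.05   # pure guess; history can't predict it
    purge_granularity    = 0.05   # growth accumulated between purge runs

    design_fullness = 1.0 - (fs_slowdown_headroom + statistical_margin
                             + emergency_margin + purge_granularity)
    print(f"design fullness target: {design_fullness:.0%}")
    # -> design fullness target: 70%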

朮生 2024-10-21 20:41:18

This isn't so much a database design issue as it is an operational issue.

Your nightly maintenance process (or however else you age out expired data) needs to maintain enough free space to accommodate any reasonable daily volume of events. Presumably failure due to running out of space isn't an option. But you only know how much space that is by knowing what your daily volume is and what its variance is. If your mean daily volume is 5,000,000 events with a spread of +/- 4,000,000 events and a standard deviation of, say, 2,000,000, you're going to need to maintain a lot more free space than if you have the same mean daily volume but a spread of +/- 500,000 and a standard deviation of 50,000. Until you've got some statistics to inform you, you're just guessing.
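
To illustrate with the figures above, a small sketch that sizes the daily headroom at the mean plus three standard deviations (the three-sigma service level is my assumption, not part of the answer):

    # Reserve room for one day's insertions at mean + 3 * stddev.
    # The 3-sigma multiplier is an assumed service level.
    def daily_headroom(mean_events, stddev_events, sigmas=3):
        return mean_events + sigmas * stddev_events

    noisy  = daily_headroom(5_000_000, 2_000_000)  # high-variance system
    steady = daily_headroom(5_000_000, 50_000)     # low-variance system
    print(f"noisy:  {noisy:>10,} events of headroom")   # 11,000,000
    print(f"steady: {steady:>10,} events of headroom")  #  5,150,000

With these numbers the noisy system needs more than twice the free space of the steady one, which is the point of the comparison.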

In a world where terabyte hard drives cost less than $200, worrying about space isn't worthwhile.

Far more important from an operational perspective, IMHO, is how much free space to maintain on data and index pages, so as to minimize page splits on insert and update operations and the performance hit you take from them. And again, you need to know something about the actual data to figure that out.
