When should data be deleted from an OLAP DB?

Posted on 2024-08-18 20:38:56

I am new to OLAP.

I understand the table structure and ETL process.

I don't understand when data is supposed to be deleted from the fact table.
Say I'm creating a reporting application for events. Each event has the duration it took to complete, the exit code, and the total bytes read. There are several dimensions, e.g. time and location.

Say I have 1 million new records ready for my fact table daily, a total of 1 GB.
If my ETL process only adds data to my fact table, it grows indefinitely.
When should I delete data from my fact table? Should I divide the data into several fact tables (e.g. monthly tables)?

Is there a rule of thumb?

Thanks

Comments (2)

山有枢 2024-08-25 20:38:56

History should never be deleted.

Period.

However, some people get nervous that 1 GB per day adds up to roughly 1 TB every 3 years. This rarely matters in practice, but some people still like to worry about the price of storage.
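The growth figure is easy to check. Here is a minimal sketch of the arithmetic, assuming the 1 GB per day load from the question and decimal units (1 TB = 1000 GB); the function name `projected_size_tb` is made up for illustration.

```python
# Assumption taken from the question: the fact table grows by about 1 GB per day.
DAILY_GB = 1.0
DAYS_PER_YEAR = 365


def projected_size_tb(years: float, daily_gb: float = DAILY_GB) -> float:
    """Approximate fact-table size in TB after `years` of append-only loads."""
    # Decimal units: 1 TB = 1000 GB. Leap days are ignored.
    return years * DAYS_PER_YEAR * daily_gb / 1000


if __name__ == "__main__":
    for years in (1, 3, 20):
        print(f"{years:>2} years: about {projected_size_tb(years):.2f} TB")
```

Three years comes out at a little over 1 TB, and 20 years at a bit over 7 TB, before any indexes or compression are taken into account.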

Your time spent designing a data purge can be more expensive than the storage you're attempting to save.

[I once found 3 DBAs and 2 programmers debating ways to save a few hundred MB. I said that I would drive them all down to Best Buy and purchase a 500 MB disk drive with the spare change on the floor of my car. The price of 5 consultants merely walking into the room to discuss it had already exceeded the price of the storage they were attempting to "save".]

The question of "can we summarize?" is entirely up to the users. Sometimes you can't usefully summarize, so you can't easily delete anything either.

Some folks will say that the business cycle is 20 years or something like that, and want details for the most recent 20 years (about 7 TB at this rate) and then summaries for the time periods before that.
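If the users do agree to a "detail for recent years, summaries before that" policy, the roll-up could look roughly like the sketch below. It is only an illustration: the `Event` record, its field names, and the monthly grain are assumptions based on the event attributes in the question, not a prescribed design.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical fact row: one event with the measures from the question."""
    day: date
    duration_s: float
    exit_code: int
    bytes_read: int


def split_detail_and_summary(events, cutoff):
    """Keep rows on or after `cutoff` as detail; roll older rows up per (year, month)."""
    detail = [e for e in events if e.day >= cutoff]
    summary = defaultdict(lambda: {"events": 0, "duration_s": 0.0, "bytes_read": 0})
    for e in events:
        if e.day < cutoff:
            bucket = summary[(e.day.year, e.day.month)]
            bucket["events"] += 1
            bucket["duration_s"] += e.duration_s
            bucket["bytes_read"] += e.bytes_read
    return detail, dict(summary)
```

Note what the monthly summary throws away: the per-event exit codes are gone, so a later question such as "which exit codes were most common in that month?" can no longer be answered. That is the sense in which some data cannot be usefully summarized.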

青芜 2024-08-25 20:38:56

Never. You can use partitioning to deal with old records and move old partitions to different drives. If you partition fact tables by date (month, quarter, year), then for all practical purposes you will be accessing only the few most recent partitions most of the time.

Keep in mind that the DW belongs to the business users, not to IT. Do not limit, or try to guess, the questions a business analyst may want to ask of the DW.
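To make the partitioning idea concrete, here is a small sketch of how fact rows might be assigned to monthly partitions and how the partitions could be split into "recent" ones kept on fast storage and older ones that can be moved to other drives. The `fact_events_YYYY_MM` naming scheme and the function names are hypothetical; in a real database the partitioning itself would be declared with that product's own partitioning feature rather than in application code.

```python
from datetime import date


def partition_key(day: date) -> str:
    """Name of the monthly partition a fact row dated `day` belongs to (hypothetical scheme)."""
    return f"fact_events_{day.year:04d}_{day.month:02d}"


def split_hot_cold(days, keep_latest=3):
    """Return (hot, cold) partition names; the newest `keep_latest` partitions are hot."""
    # Zero-padded year and month make lexical order equal to chronological order.
    keys = sorted({partition_key(d) for d in days})
    split = max(len(keys) - keep_latest, 0)
    return keys[split:], keys[:split]
```

Nothing is deleted here: the cold partitions still exist and can still be queried, they are just candidates for cheaper or slower storage.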
