What is the life span of data?

Posted on 2024-07-16 04:05:04

Recently I've found myself in a database tangle: management wants the ability to remove data from the database, but still wants that data to appear in other places. Example: they want to remove all instances of the product whizbang, but they still want whizbang to appear in sales reports (if they run one for a past date).

Now, I can add a field, say is_deleted, that tracks whether the product has been deleted and so keeps all my references intact, but over time I could end up housing a lot of dead data (data that is never accessed again). How to handle this is not my question.

I'm curious to find out: in your experience, what is the average life span of data? That is, on average, how long is data alive or good for before it gets replaced or deleted? I understand that this is relative to the type of data you are housing, but surely all data has some sort of life span?

Comments (11)

維他命╮ 2024-07-23 04:05:04

Data lives forever... or often it should. One common practice is to give each record a start date and/or an end date. So for your whizbang, you have a start date (so that it won't appear on sales reports before its official launch) and an end date (so that it drops off reports after it has been end-of-lifed). If you use the proper dates as criteria in your reports and your applications, you won't see the whizbang except when you should, and the data still exists (as it should, in theory indefinitely).
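To make the start/end date idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the column names (start_date, end_date) and the helper active_products are assumptions made up for illustration, not anything from the original schema. Dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

def active_products(conn, as_of):
    """Return names of products whose validity window covers the ISO date `as_of`."""
    rows = conn.execute(
        """
        SELECT name FROM products
        WHERE start_date <= :d
          AND (end_date IS NULL OR end_date > :d)  -- NULL end_date = still sold
        ORDER BY name
        """,
        {"d": as_of},
    ).fetchall()
    return [name for (name,) in rows]

# Hypothetical sample data: whizbang was retired in mid-2008, gizmo is still sold.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT PRIMARY KEY, start_date TEXT NOT NULL, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("whizbang", "2007-01-01", "2008-06-30"), ("gizmo", "2008-01-01", None)],
)

print(active_products(conn, "2007-12-31"))  # ['whizbang']
print(active_products(conn, "2009-01-01"))  # ['gizmo']
```

A report for a past date simply passes that date as as_of, so the retired product still shows up there while staying out of current listings.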

As Koistya Navin mentions, moving data to a data warehouse at a certain point is also an option, but this depends in large part on how large your 'old' data is, and how long you need to keep it readily available for access.

胡大本事 2024-07-23 04:05:04

Many of our customers keep data online for 2 years. After that it's moved to backup disks, but it can be put online if needed.

Consider adding an "expiration" or "effective date" column. This will allow you to mark a product as obsolete, while reports will still return that product if it falls within the requested time range.

荒人说梦 2024-07-23 04:05:04

Usually it's better to move such data into a separate database (a data warehouse) and keep the working database clean. In a data warehouse your data can be kept for many years without impacting your application.

Reference: Data Warehouse at Wikipedia
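As a rough sketch of the "move it out of the working set" step, the snippet below copies old rows into an archive table and deletes them from the live table inside one transaction. It uses Python's sqlite3 with a second table in the same database purely to stay self-contained; a real warehouse would normally be a separate database. The table names, columns and the archive_old_sales helper are all hypothetical.

```python
import sqlite3

def archive_old_sales(conn, cutoff):
    """Move sales dated before the ISO date `cutoff` into sales_archive.

    Returns the number of rows removed from the live table.
    """
    with conn:  # one transaction: copy and delete commit together or roll back together
        conn.execute(
            "INSERT INTO sales_archive SELECT * FROM sales WHERE sold_on < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM sales WHERE sold_on < ?", (cutoff,)
        ).rowcount
    return moved

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, sold_on TEXT)")
conn.execute("CREATE TABLE sales_archive (id INTEGER PRIMARY KEY, product TEXT, sold_on TEXT)")
conn.executemany(
    "INSERT INTO sales (product, sold_on) VALUES (?, ?)",
    [("whizbang", "2005-03-01"), ("whizbang", "2008-02-10"), ("gizmo", "2008-05-20")],
)
conn.commit()

moved = archive_old_sales(conn, "2006-01-01")
print(moved)  # 1
```

Doing the copy and the delete in a single transaction means a failure partway through cannot leave a row in both tables or in neither.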

白云不回头 2024-07-23 04:05:04

I've always gone by what the governing body requires. For example, the IRS wants you to keep 7 years of history, and for security reasons we keep 3 years of log information. So I guess you could do two things: determine what the life span of your data is (I would say 3 years would be enough), and then add the is_deleted flag along with a date, so that you can flag some data for deletion sooner rather than later.

别低头,皇冠会掉 2024-07-23 04:05:04

Yes, all data has a lifespan. And yes, it is relative to the type of data you have.

Some data has a lifespan measured in seconds (authentication tokens, for instance); other data is virtually eternal and will outlive the medium and formats it is stored in (ownership records, for instance).

You will have to either be more specific as to the type of data you are envisioning, or do a census in your own organization as to the usual lifespan of stuff.

遇见了你 2024-07-23 04:05:04

Our particular flavor varies. We have some data (a vast majority) which goes stale after 3 months (hard product limit) but can be revived at any later date.

We have other data that is effectively immortal.

In practice, most of the data we serve up is fresh and frequently requested for a few weeks, at most a month, before falling to sporadic use.

行雁书 2024-07-23 04:05:04

How much is "a lot of dead data"?

With processing power and data storage so cheap, I wouldn't purge old data unless there's a really good reason to. You also need to consider the legal implications. Large (and even small) companies may have incredibly long retention policies for old data, to save themselves millions down the road when they are subpoenaed for it by a judge.

I would check with whatever legal department you have and find out how long the data needs to be stored. That's the safest bet.

Also, ask yourself what the benefit of removing the old data is. Is the only benefit a tidier database? If so, I wouldn't do it. Are you going to see a 10X performance increase? If so, I'd do it. This really is a complex question though, and it's tough for us to have all the information required to give you good advice.

北方。的韩爷 2024-07-23 04:05:04

I have a few projects where the customer wants all the historical data (going back over 19 years). Quite a bit of the really old data is malformed and is going to be a nightmare to import into the new system. We convinced them that they won't need records going back any further than 10 years, but like you said it's all relative to the type of data you're housing.

On a side note, data storage is extremely cheap right now, and if it isn't affecting the performance of your application, I would just leave it where it is.

无人接听 2024-07-23 04:05:04

[...] but certainly all data has some sort of life span?

Not any kind of life span we can talk about meaningfully. A lot of data is useless as soon as it's created or recorded. Such data could be discarded immediately with no effect. On the other hand, some data has enough value that it will outlive the current system that hosts it. If Amazon were to completely replace their current infrastructure, the customer histories they have stored would still be immensely valuable.

As you said, it's relative. Each type of data has its own life span that has no relation to another type of data's life span. There's no meaningful "average life span of data".

ぽ尐不点ル 2024-07-23 04:05:04

I have the potential of housing a lot of dead data. (data that is never accessed again).

But it will be accessed: whenever they run those reports, they are accessing that data.

Until then you'll need to keep the data in some form. Move to another table or have a switch like you mentioned.

Uh... at the risk of oversimplifying, it sounds like using a DateDeleted column instead of a bit flag would solve your how-long-to-keep issue.
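A minimal sketch of that idea, again with Python's sqlite3: a nullable date_deleted column doubles as the deleted flag and as the clock for a retention window. The column name, both helper functions and the three-year retention figure are assumptions for illustration only. A real purge would also have to make sure nothing still references the row.

```python
import sqlite3
from datetime import date, timedelta

def soft_delete(conn, name, today):
    """Mark a product as deleted by stamping the date instead of setting a bit."""
    conn.execute(
        "UPDATE products SET date_deleted = ? WHERE name = ? AND date_deleted IS NULL",
        (today.isoformat(), name),
    )

def purge_expired(conn, today, retention_days):
    """Physically remove rows that were soft-deleted more than `retention_days` ago."""
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM products WHERE date_deleted IS NOT NULL AND date_deleted < ?",
        (cutoff,),
    )
    return cur.rowcount

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT PRIMARY KEY, date_deleted TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)", [("whizbang",), ("doodad",), ("gizmo",)]
)
soft_delete(conn, "whizbang", date(2005, 1, 1))
soft_delete(conn, "doodad", date(2008, 6, 1))

# Assumed policy: keep soft-deleted rows for roughly three years.
purged = purge_expired(conn, date(2009, 1, 1), retention_days=3 * 365)
remaining = [n for (n,) in conn.execute("SELECT name FROM products ORDER BY name")]
print(purged, remaining)  # 1 ['doodad', 'gizmo']
```

Rows with a NULL date_deleted are live, so the same column answers both "is it deleted?" and "how long ago?".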
