实现软删除,对性能和代码的影响最小
我想实现像 StackOverflow 上的软删除功能,其中项目并未真正删除,而只是隐藏。我正在使用 SQL 数据库。这里有 3 个选项:
添加
is_deleted
布尔字段。- 优点:简单。
- 缺点:没有日期记录。强制我在每个查询中添加
is_deleted = 0
。
添加
deleted_date
日期字段。如果未删除,则设置为NULL
。- 优点:有日期。
- 缺点:仍然让我的查询变得混乱。
对于以上两种情况,
- 它也会影响性能,因为存在所有这些无用的行。它们仍然需要在索引中维护。此外,在获取未删除(大多数)行时,
deleted
列上的索引也无济于事。需要全表扫描。
另一种选择是创建一个单独的表来保存已删除的项目:
- 优点:提高查询未删除行时的性能。无需向未删除行的查询添加条件。索引维护更容易。
- 缺点: 复杂:删除和恢复都需要数据迁移。需要新桌子。引用完整性更难处理。
有更好的选择吗?
There are some similar questions on the topic, but they are not really helping me.
I want to implement a soft delete feature like on StackOverflow, where items are not really deleted, but just hidden. I am using a SQL database. Here are 3 options:
Add a
is_deleted
boolean field.- Advantages: Simple.
- Disadvantages: No date record. Forces me to add a
is_deleted = 0
in every query.
Add a
deleted_date
date field. This is set toNULL
if it's not deleted.- Advantages: Has date.
- Disadvantages: Still cluttering my queries.
For both of the above
- It will also impact performance because there are all these useless rows. They still have to be maintained in indexes. Also an index on the
deleted
column won't help when fetching non-deleted (the majority) of the rows. Full table scan is needed.
Another option is to create a separate table to hold deleted items:
- Advantages: Improved performance when querying non-deleted rows. No need to add conditions to my queries on non-deleted rows. Easier on index maintenance.
- Disadvantages: Complexity: Requires data migration for both deletion and undeletion. Need for new tables. Referential integrity is harder to handle.
Is there a better option?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
在我看来,在考虑扩展和最终表/数据库大小时,最好的方法是第三个选择 - 用于删除项目的单独表。这样的表最终可以移动到不同的数据库以支持扩展。
我相信您已经列出了三个最常见的选项。正如您所看到的,每种方法都有优点和缺点。就我个人而言,我喜欢以更长远的眼光看待事物。
In my opinion, the best way forward, when thinking about scaling and eventual table/database sizes is your third option - a separate table for deleted items. Such a table can eventually be moved to a different database to support scaling.
I believe you have listed the three most common options. As you have seen, each has advantages and disadvantages. Personally, I like taking the longer view on things.
我个人会根据您预计用户想要访问已删除数据或“恢复”已删除数据的频率来确定我的答案。
如果经常发生,那么我会使用“Date_Deleted”字段,并将计算出的“IsDeleted”放入代码中的 poco 中。
如果从来没有(或几乎从来没有),那么历史表或已删除的表对于您所解释的好处很有好处。
我个人几乎从不使用已删除的表(并选择 isDeleted 或 date_deleted),因为存在引用完整性的潜在风险。你有 A -> B 并且您从 B 数据库中删除了该记录...由于您的设计选择,您现在必须管理引用完整性。
I personally would base my answer off of how often you anticipate your users wanting to access that deleted data or "restore" that deleted data.
If it's often, then I would go with a "Date_Deleted" field and put a calculated "IsDeleted" in my poco in the code.
If it's never (or almost never) then a history table or deleted table is good for the benefits you explained.
I personally almost never use deleted tables (and opt for isDeleted or date_deleted) because of the potential risk to referencial integrity. You have A -> B and you remove the record from B database... You now have to manage referencial integrity because of your design choice.
如果键是数字,我通过否定该键来处理“软删除”。 (当然,不适用于身份密钥)。你根本不需要改变你的代码,并且可以通过乘以-1轻松恢复记录。
这是另一种思考的方法...如果密钥是字母数字,您可以通过在前面添加唯一的“标记”字符来执行类似的操作。由于删除的记录都将以该标记开头,因此最终将在索引中自行消失。
If the key is numeric, I handle a "soft-delete" by negating the key. (Of course, won't work for identity keys). You don't need to change your code at all, and can easily restore the record by multiplying by -1.
Just another approach to give some thought to... If the key is alphanumeric, you can do something similar by prepending a unique "marker" characters. Since deleted records will all begin with this marker, then will end up off by themselves in the index.
我认为您对选项的分析很好,但您错过了我在下面列出的一些相关要点。正如您在前两个选项中所建议的那样,我见过的几乎所有实现都在行上使用某种已删除或版本控制字段。
使用一张带有已删除标志的表:
如果您的索引都首先包含已删除的标志字段,并且您的查询主要包含 where isdeleted=false 类型结构,那么它确实可以解决您的性能问题,并且索引非常有效地排除已删除的行。类似的逻辑可用于删除日期选项。
使用两个表
一般来说,您需要对报告进行大量更改,因为某些报告可能引用已删除的数据(例如旧的销售数据可能引用已删除的销售类别)。人们可以通过创建一个视图来克服这个问题,该视图是两个表的并集,用于读取和仅写入活动记录表。
I think your analysis of the options is good but you missed a few relevant points which I list below. Almost all implementations that I have seen use some sort of deleted or versioning field on the row as you suggest in your first two options.
Using one table with deleted flag:
If your indexes all contain the deleted flag field first and your query's mostly contain a where isdeleted=false type structure then it DOES solve you performance problems and the indexes very efficiently exclude the deleted rows. Similar logic could be used for the deleted date option.
Using two Tables
In general you need to make massive changes to reports because some reports may refer to deleted data (like old sales figures might refer to a deleted sales category). One can overcome this by creating a view which is a union of the two tables to read from and only write to the active records table.
假设我们创建一个名为
dead
的字段来标记已删除的行。我们可以创建一个索引,其中字段dead
为 false。这样,我们只使用提示使用索引来搜索未删除的行。
Let's suppose we create a field called
dead
to mark deleted rows. We can create a index where fielddead
is false.In this way, we only search non-deleted rows using the hint use index.