我们是否在表中记录重复的数据或继续连接所有未来的读取?
说一个用户活动表。至少您需要 user_id、datetime、activity_id、object_id 等内容。我可以与对象表结合起来找到对象所有者。我可以加入活动表来查找活动组、类型等。
或者
我也可以在运行时将这些详细信息复制到活动表中。这仅意味着重复数据,但将来每当我需要阅读时,我不必继续加入。我现在将所有数据都存储在活动表中,以获取所有可能的数据。
如果我复制数据,我是使用 FK 还是独立复制?
Say a user activity table. At a minimum you would need stuff like user_id, datetime, activity_id, object_id, etc. I can join up with object table to find the object owner. I can join with activity table to find the activity group, type, etc.
OR
I can copy those details at runtime into the activity table also. This only means duplicate data but in the future whenever i need to read i do not have to keep joining. I have all my data now in the activity table for all possible pieces of data.
If i do duplicate data, do i duplicate it with FK or standalone?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
当然,数据库应该规范化,因为它们性能更好,SQL 是为规范化结构以及连接它们而设计的。没有理由“去常态化”。
然而,有一种情况有特殊需要。历史表或日志文件。需要考虑的问题是,当你查询这张表时,是否需要父对象当前的数据;或者这是当时发生的情况的真实日志,并且您需要创建日志行时的最新数据。
在后一种情况下,由于存储此类数据有明确的目的,因此它实际上不是重复的,也不是标准化的逆转;因此,将数据分类为重复或“非规范化”是不正确的。这只是审计数据,必须保留。一般来说,日志文件(这就是它们的本质;为了方便起见,我们将它们存储在数据库中)不是数据库的一部分,并且数据库规则不适用。
但始终对它们实施索引,并定期清除它们;否则就会变成怪物。
日志文件的替代方案是历史表。这是根据需要在表的基础上实现的,而不是操作的日志文件。对于必须保留变更审核的每个表,实施该表的“副本”。这存储已更改的行的前图像。 DDL 与源表完全相同,但增加了一项:PK 添加了 TIMESTAMP 或 DATETIME 列。同样,这是根据明确的要求,将这些表分类为重复或“非规范化”是不正确的。
历史表优于日志文件,因为它们仅存储已更改的行;而日志文件则重复存储不变的数据。由此产生的存储数据量差异很大。
▶历史表格示例◀
Certainly Databases should be Normalised, because they perform better, SQL was designed for Normalised structures, and for joining them. There is no reason to "de-normalise".
However, there is one case which has an exceptional need. History tables or Log fies. The question to be considered is, when you query this table, do you need the current data from the parent object; or is this a true log of what happened at the time, and you need the data that was current at the time the log row was created.
In the latter case, since there is an explicit purpose in storing such data, it is not actually a duplicate, it is not a reversal of the Normalisation; therefore, it is incorrect to classify the data as duplicate or "denormalised". It is just Audit data, which must be retained. Generally Log files (that's what they really are; we store them in the database for convenience) are not part of the database, and database rules do not apply.
But always implement indices on them, and purge them regularly; otherwise the turn into monsters.
The alternative to Log files is History tables. Rather than a Log file for actions, this is implemented on a table basis, as needed. For each table for which an Audit of changes must be retained, a "copy" of the table is implemented. This stores the before-image of rows that have been changed. The DDL is exactly the same as the source table, plus one item: the PK has a TIMESTAMP or DATETIME column added. There too, it is by explicit requirement, and it would be incorrect to classify these tables as duplicate or "de-normalised".
History tables are superior to Log files, because they store only the rows that have been changed; whereas Log files store unchanging data repeatedly. The resulting volumes of data stored are quite different.
▶Example of History Tables◀
您所说的称为 数据库规范化 (或者在您的情况下 反规范化)。
由于关系数据库几乎是为执行联接而设计的,因此通常没有什么理由去规范化并将重复数据存储在一个大的宽表中(存在一些边缘情况和权衡,但在您使用的示例中,它们没有确实适用,所以我建议坚持使用您列出的 3 表方法(活动、用户活动、对象)。
What you are talking about is called database normalization (or in your case de-normalization).
Since a relational database is pretty much designed to perform joins, there is usually VERY little reason to de-normalize and store duplicate data in one big wide table (there are some edge cases and tradeoffs, but in the example you used they don't really apply, so I recommend sticking with 3-table approach you listed (activity, user_activity, object).