时态数据库建模和标准化
时态数据库的日期应该存储在一张还是两张表中?如果这不违反规范化?
PERSON1 DATE11 DATE21 INFO11 INFO21 DEPRECATED
PERSON2 DATE21 DATE22 INFO21 INFO22 CURRENT
PERSON1 DATE31 DATE32 INFO31 INFO32 CURRENT
DATE1 和 DATE2 列指示 INFO1 和 INFO2 在 DATE1 和 DATE2 之间的时间段内为 true。如果日期<今天,事实已被弃用,不应再在用户界面中显示,但不应出于历史目的而删除它们。例如,INFO11 和 INFO21 现已弃用。
我应该拆分这张桌子吗?我应该将状态(已弃用或当前)存储在表中吗?
为了进一步澄清问题,Deprecated是业务使用的术语,如果您更喜欢“不是当前的”,问题不是语义的,也不是关于sql查询的,我只是想知道哪种设计违反或最适合规范化规则(我知道标准化并不总是可行的方法,这也不是我的问题)。
Should dates for a temporal database stored in one or 2 tables ? If one doesn't this violate normalisation ?
PERSON1 DATE11 DATE21 INFO11 INFO21 DEPRECATED
PERSON2 DATE21 DATE22 INFO21 INFO22 CURRENT
PERSON1 DATE31 DATE32 INFO31 INFO32 CURRENT
DATE1 and DATE2 Columns indicate that INFO1 and INFO2 are true for the period between DATE1 and DATE2. If DATE < TODAY, the facts are deprecated and shouldn't show any more in the user interface but they shouldn't be deleted for historical purpose. For example INFO11 and INFO21 are now deprecated.
Should I split this table ? Should I store the state (deprecated or current) in the table ?
To clarify the question further more, Deprecated is the term used by the Business, if you prefer "not current", the problem is not semantic, it's not about sql queries either, I just want to know which design violates or best suits Normalisation rules (I know normalisation is not always the way to go, that is not my question either).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
“我想知道哪种设计违反了规范化规则”
取决于您想要遵循哪一组规范化规则。
第一个也是最有可能违反正常形式的,在 Date 的书违反了first NF 是保存“当前”信息的行中的结束日期(对未来信息的可能性进行抽象):如果您将该属性设为可为空,则违反了 1NF。
违反 BCNF 显然可能发生是由于您选择键的结果(因为在非时间数据库设计中也是如此 - 时间方面在这里没有区别)。关于“键的选择”:如果您使用单独的开始日期和结束日期(SQL 类型让您别无选择),那么很可能您应该声明两个键:一个包含开始日期,另一个包含结束日期。
另一个设计问题是多个数据列。这个问题在“时态数据和关系模型”中进行了详细讨论:如果 INFO1 和 INFO2 可以彼此独立地更改,那么最好分解表以仅保存一个属性,以避免“爆炸”如果每次行中的一个属性发生更改时都必须创建一个新的完整行,则可能会出现这种情况。在这种情况下,您给出的设计违反了第六范式,如“时态数据和关系模型”中定义的(该范式)。
"I want to know which design violates Normalisation rules"
Depends on which set of normalization rules you want to go by.
The first and most likely violation of normal forms, and in Date's book it is a violation of first NF, is your end-dates in the rows that hold "current" information (making abstraction of the possibility of future-dated information): you violate 1NF if you make that attribute nullable.
Violations of BCNF may obviously occur as a consequence of your choice of keys (as it is the case in nontemporal database designs too - the temporal aspect makes no difference here). Wrt "choice of keys": if you use separate start- and end-dates (and SQL kind of leaves you no other choice), then most likely you should declare TWO keys: one that includes the start date, and one that includes the end-date.
Another design issue is the multiple data columns. This issue is discussed quite at large in "Temporal Data and the Relational Model" : if INFO1 and INFO2 can change independently of one another, it might be better to decompose your tables to hold just one attribute, in order to avoid an "explosion of rows count" that might otherwise occur if you have to create a new complete row every time one single attribute in the row changes. In that case, your design as you gave it constitutes a violation of SIXTH normal form, as (that normal form is) defined in "Temporal Data and the Relational Model".
规范化是一个关系数据库概念 - 它不适用于时态数据库。这并不是说您不能在关系数据库中存储时态数据。你绝对可以。
但是,如果您要进行时态数据库设计,那么应用时态标准化的概念而不是关系标准化。
Normalization is a Relational database concept - it does not apply as well to temporal databases. That's not to say that you cannot store temporal data in a relational database. You definitely can.
But if you are going with Temporal Database Design, then the concepts of Temporal Normalization apply rather than Relational normalization.
您没有指出日期的含义。它们是指(a)所陈述的事实在现实生活中为真实的时期,还是(b)数据库持有者相信所陈述的事实为真实的时期?如果(b),那么我永远不会这样做。更新完成后立即将更新的行移至存档表/日志。如果 (a),则以下陈述是有问题的:
“事实已被弃用,不应再在用户界面中显示”
如果事实不再“需要在用户界面中显示”,则它不会也不需要再存在于数据库中。保留这些事实只会带来一件事:降低其他所有人的总体表现。
如果您确实需要这些历史事实陈述来满足您的要求,那么很可能您所谓的“已弃用的事实”仍然与业务非常相关,因此根本没有“弃用”。假设由于这个原因,您的数据库中几乎没有“真正弃用”的事实,那么您的设计是好的。只需定期从操作数据库中删除“真正弃用的事实”的数量即可。
(PS) 说你的设计很好,并不意味着你不会遇到任何问题。 SQL 非常不适合优雅地处理此类信息。 “时态数据和关系模型”是对该主题的精彩论述。另一本书,斯诺德格拉斯的书,也经常受到赞扬,尽管不是我本人。这本书有点像一本食谱,其中包含在 SQL 中处理这些问题的秘诀,正如下面关于这本书的 SO 对话所证明的那样:(
问)“我为什么要读那本书?”
(A)“因为你要求的触发器在第135页。”
You have not indicated the meaning of the dates. Do they refer to (a) the period when the stated fact was true in real-life, or (b) to the period when the stated fact was believed to be true by the holder of the database ? If (b), then I would never do it this way. Move the updated line to an archive table/log immediately when the update is done. If (a), then the following statement is questionable :
"the facts are deprecated and shouldn't show any more in the user interface"
If a fact doesn't "need to show up in the user interface" anymore, then it doesn't need to be in the database anymore either. Keeping such facts there achieves only one thing : deteriorate general performance for all the rest.
If you really need these historical statements of fact to suit your requirements, then chances are that your so-called "deprecated facts" are still very much relevant to the business, and therefore not "deprecated" at all. Assumming that for this reason, there are very little "genuinely deprecated" facts in your database, your design is good. Just keep the number of "genuinely deprecated facts" small by periodically removing them from the operational database.
(PS) To say that your design is good, doesn't mean you won't run into any problems. SQL is extremely ill-suited to handle this kind of information elegantly. "Temporal Data and the Relational Model" is an excellent treatment of the subject. Another book, the one from Snodgrass, is often praised too, though not by me. That one is something of a cookbook with recipes for dealing with these problems in SQL, as proven by the following conversation on SO about this book :
(Q) "Why would I read that ?"
(A) "Because the trigger you asked for is on page 135."