钻石模式:如何(去)规范化?
假设我们有以下实体:
- 制作工作室
- 记者
- 摄影师
- 新闻镜头
在这个简单的世界中,制作工作室有许多记者和许多摄影师。每位记者都属于一个工作室。运营商也是如此。新闻片段由一名记者和一名操作员制作,两人都来自同一工作室。
这是我将此模型放入关系数据库的天真的方法:
CREATE TABLE production_studios(
id SERIAL PRIMARY KEY,
title TEXT NOT NULL
);
CREATE TABLE journalists(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
prodution_studio_id INTEGER NOT NULL REFERENCES production_studios
);
CREATE TABLE camera_operators(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
production_studio_id INTEGER NOT NULL REFERENCES production_studios
);
CREATE TABLE news_footages(
id SERIAL PRIMARY KEY,
description TEXT NOT NULL,
journalist_id INTEGER NOT NULL REFERENCES journalists,
camera_operator_id INTEGER NOT NULL REFERENCES camera_operators
);
此模式形成形状优美的菱形 ERD 和一些问题。
问题是新闻镜头可以将来自不同制作工作室的记者和摄像师联系在一起。我知道可以通过编写相应的约束来解决这个问题,但为了实验的目的,让我们假设我们正在做范式数据库设计的练习。
第一个问题是关于术语的:说这个模式是非规范化的是否正确?如果是,它打破了哪种范式?或者对于这种异常是否有更好的名称,例如记录间冗余、多路径关系等?
如何更改此架构以使所描述的异常不可能发生?
当然,我非常感谢参考解决这个特定问题的论文。
Let's suppose we have the following entities:
- Production Studio
- Journalist
- Camera Operator
- News Footage
In this simple world, production studio has many journalists and many camera operators. Each journalist belongs to exactly one studio. Same thing with operators. A news footage is produced by one journalist and one operator, where both come from the same studio.
Here's my naive approach to put this model into relational database:
CREATE TABLE production_studios(
id SERIAL PRIMARY KEY,
title TEXT NOT NULL
);
CREATE TABLE journalists(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
prodution_studio_id INTEGER NOT NULL REFERENCES production_studios
);
CREATE TABLE camera_operators(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
production_studio_id INTEGER NOT NULL REFERENCES production_studios
);
CREATE TABLE news_footages(
id SERIAL PRIMARY KEY,
description TEXT NOT NULL,
journalist_id INTEGER NOT NULL REFERENCES journalists,
camera_operator_id INTEGER NOT NULL REFERENCES camera_operators
);
This schema forms nicely shaped diamond ERD and a few questions.
The problem is that news footage can link together a journalist with a camera operator which come from different production studios. I understand that this can be cured by writing corresponding constraints, but for the sake of experiment let's pretend that we're doing exercise in Normal Form database design.
The first question is about terminology: is it correct to state that this schema is denormalized? If yes, which normal form does it break? Or is there any better name for this anomaly, like inter-record redundancy, multipath relationships, etc?
How this schema can be changed to make described anomaly impossible?
And of course I'd very much appreciate references to papers addressing this specific issue.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
天真的方法是让你的记者和摄像操作员依赖实体,依赖于他们工作的工作室。这意味着制作工作室外键成为其主键的一部分。然后,您的 news_footage 表有一个由 4 个组件组成的主键:
和两个外键:
Easy 的camera_operator、Production_studio_id。
或者不是。现在,您已经在 ER 模型中定义了这样一个概念:摄像师或记者的存在取决于他们工作的工作室。这并不能很好地反映真实的工作:在这个模型中,人们无法改变他们的雇主。
我们不要那样做。
在您的原始模型中,您将一个人与他们扮演的角色(记者或摄像师)混淆,并且您错过了一个实际上负责新闻素材制作的短暂实体: 【工作室专用】制作团队。
我的 ER 模型看起来像这样:
现在你拥有一个人们可以扮演不同角色的世界:同一个人在某些情况下可能是摄影师,而在另一些情况下可能是记者。人们可以更换雇主。工作室特定团队由一名记者和一名摄像师组成。在某些情况下,同一个人可能在团队中扮演这两个角色。最后,一段新闻片段是由一个且唯一一个工作室特定团队制作的。
这更好地反映了现实世界,而且更加灵活。
编辑添加示例查询:
查找为特定工作室工作的记者:
这将为您提供一组正在(或已经)以记者角色与工作室相关联的人员。但应该注意的是,在现实世界中,人们为雇主工作一段时间:为了正确建模,您需要一个开始/结束日期,并且需要使用现在的相对概念来限定查询。
The naive way would be to make your journalists and camera_operators dependent entities, dependent upon the studio for which they work. That means the production studio foreign key becomes part of their primary key. Your news_footage table then has a primary key consisting of 4 components:
and two foreign keys:
Easy.
Or Not. Now you have defined in your E-R model the notion that the very existence of a camera operator or a journalist is dependent upon the studio for which they work. This does not reflect the real work very well: in this model, people can't change their employer.
Let's not do that.
In your original model, you confusing a person with a _role they play (journalist or camera operator), and you're missing a somewhat transient entity that is actually responsible for the production of your news footage: the [studio-specific] production team.
My E-R model would look something like this:
Now you have a world in which people can work in different roles: the same person might be a camera operator in some contexts and a journalist in others. People can change employers. Studio-specific teams are composed, consisting of a journalist and a camera operator. In some contexts, the same person might play both roles on a team. And, finally, a piece of news footage is produced by one and only one studio-specific team.
This reflects the real world much better, and it is much more flexible.
Edited to add sample query:
To find the journalists working for a particular studio:
This would give you the set of people who are (or have) been associated with a studio in the role of journalist. One should note though, that in the real world, people work for employers for a period of time: to model it properly you need a start/end date and you need to qualify the query with a relative notion of now.