Relational database design (normalized many-to-many mapping)
Students can be in many different classes. Each class has many different students. A student can have many grades but only 1 grade per class: every (student, class) pair has one grade.
Should I lay out the (MySQL) database like this:
students (student_id, student_name)
classes (class_id, class_name)
students_classes (student_class_id, student_id, class_id)
grades (student_class_id, grade)
Or this:
students (student_id, student_name)
classes (class_id, class_name)
grades (student_id, class_id, grade)
Or something else?
Option 2 is simpler now, but in the future I might need other statistics related to each (student_id, class_id) pair, in which case option 1 may be better, though it is overly complicated.
Option 3)
Unless grade has the possibility of becoming a full-fledged entity. In which case:
Option 4)
Go for option 2. There is nothing wrong with a composite primary key for grades, and it captures the information you need in your data model.
In option 1, students_classes serves no purpose except to have a surrogate key.
2NF: grade (non-key) depends solely on the whole student/class pair (key)
3NF: nothing to check, you have no non-key on non-key dependencies
BCNF: nothing to check, you have one candidate key only
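To make the composite-key point concrete, here is a minimal sketch of option 2. It runs on SQLite through Python's sqlite3 module purely so it can be executed anywhere; that is an assumption for illustration, and MySQL DDL would differ in column types. The sample rows and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL);
CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, class_name   TEXT NOT NULL);
CREATE TABLE grades (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    grade      TEXT    NOT NULL,
    -- The composite key is what enforces "one grade per (student, class) pair".
    PRIMARY KEY (student_id, class_id)
);
INSERT INTO students VALUES (1, 'Ada');
INSERT INTO classes  VALUES (10, 'Databases');
INSERT INTO grades   VALUES (1, 10, 'A');
""")

def try_insert(row):
    """Hypothetical helper: True if the grades row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO grades VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# A second grade for the same (student, class) pair violates the primary key.
duplicate_accepted = try_insert((1, 10, "B"))
print(duplicate_accepted)
```

No surrogate column is involved: the key that identifies the row is the same pair of columns the business rule talks about.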
Option 2 is correct, except it should be called student_class, reflecting its n::n function, or Enrolment as an entity, and (student_id, class_id) is the PK.
Grade (as you have shown it) is a 1::1 dependency on that compound key (not on one or the other element), and on nothing else, therefore it is an attribute of student_class. And therefore student_class is in 3NF.
If people did not start off by blindly sticking Id columns on everything that moved, as you did with Option 1, they would be able to understand the data better and thus normalise better. That (the Id column in Option 1 as a starting point) interfered with your intuition that (student_id, class_id) was the Identifier; no additional Id column with its additional index was necessary. Then, when you got around to evaluating grade, its dependency on that PK was obvious.
Id columns damage the relational capability of the database. If you have, say, three tables in a hierarchy, and you need to grab some columns from the top and bottom tables, you are forced to go through the middle table. If you had Relational Identifiers instead of Id columns, you could get from the bottom table to the top table without having to read the middle table.
It is only half true that there are so many joins in a "normalised" database. The full truth is that, since the database is not correctly normalised, yes, you are forced into many more joins than are necessary. In a truly Normalised database, with the same tables, the code requires far fewer joins.
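A rough sketch of the three-table point, under stated assumptions: the department/course/offering hierarchy, its column names, and the `_s` surrogate-Id variants are all hypothetical and are not taken from the data model mentioned in this answer. SQLite via Python's sqlite3 stands in for MySQL so the example can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Relational identifiers: each child key contains its parent's key.
CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT NOT NULL);
CREATE TABLE course (
    dept_code   TEXT NOT NULL REFERENCES department(dept_code),
    course_code TEXT NOT NULL,
    PRIMARY KEY (dept_code, course_code)
);
CREATE TABLE offering (
    dept_code   TEXT NOT NULL,
    course_code TEXT NOT NULL,
    semester    TEXT NOT NULL,
    PRIMARY KEY (dept_code, course_code, semester),
    FOREIGN KEY (dept_code, course_code) REFERENCES course(dept_code, course_code)
);

-- The same hierarchy with a surrogate Id on every table.
CREATE TABLE department_s (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL);
CREATE TABLE course_s (
    course_id   INTEGER PRIMARY KEY,
    dept_id     INTEGER NOT NULL REFERENCES department_s(dept_id),
    course_code TEXT NOT NULL
);
CREATE TABLE offering_s (
    offering_id INTEGER PRIMARY KEY,
    course_id   INTEGER NOT NULL REFERENCES course_s(course_id),
    semester    TEXT NOT NULL
);

INSERT INTO department VALUES ('CS', 'Computer Science');
INSERT INTO course     VALUES ('CS', 'DB101');
INSERT INTO offering   VALUES ('CS', 'DB101', '2024S1');

INSERT INTO department_s VALUES (1, 'Computer Science');
INSERT INTO course_s     VALUES (1, 1, 'DB101');
INSERT INTO offering_s   VALUES (1, 1, '2024S1');
""")

# Bottom table to top table in one join: offering already carries dept_code.
direct = conn.execute("""
    SELECT o.semester, d.dept_name
    FROM offering o
    JOIN department d ON d.dept_code = o.dept_code
""").fetchall()

# With surrogate Ids the middle table has to be read to reach the top.
via_middle = conn.execute("""
    SELECT o.semester, d.dept_name
    FROM offering_s o
    JOIN course_s c     ON c.course_id = o.course_id
    JOIN department_s d ON d.dept_id   = c.dept_id
""").fetchall()

print(direct == via_middle)
```

Both queries return the same rows; the difference is only in how many tables have to be touched to get them.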
Here's a Data Model for a College from a recent assignment, simplified version.
IDEF1X Notation for those who need explanation of the symbols.
Note only one Surrogate Key is required.
This is because in the alternative, (LastName+FirstName+Initials_BirthDate+BithDate) would be the Person PK, and that would be carried as FK in 5 child/grandchild tables, which is 81 bytes, and that is not sensible.
See if you can appreciate that the Identifiers (solid lines) are carried through to the children and grandchildren; they have, and convey, meaning.
It would be stupid to add Surrogate Keys for TeacherId, StudentId, StaffId, when we have a perfectly good PersonId, which is the Foreign Key and already unique. (The columns are named as such to identify their roles.)
All Business Rules were implemented in DDL: FK Constraints; Check Constraints; Rules.
Room has a 4-column Compound Key; Offering has a 3-column Compound Key; the two together eliminate double bookings.
The Offering PK and the Student PK together form the PK for Enrolment (identical to this question; the PKs are made up of different columns, that's all).
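As a hedged illustration of the last two points (a role-named key that reuses PersonId, and an Enrolment PK built from the Offering PK plus the Student PK), here is a small sketch. The three Offering columns (course_code, semester, section) are guesses made for the example, since the actual columns of the model are not listed here, and SQLite via Python's sqlite3 is used only so it runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);

-- student_id is the person's id under a role name: PK and FK at once, no new surrogate.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY REFERENCES person(person_id)
);

-- Hypothetical 3-column compound key for an offering.
CREATE TABLE offering (
    course_code TEXT NOT NULL,
    semester    TEXT NOT NULL,
    section     TEXT NOT NULL,
    PRIMARY KEY (course_code, semester, section)
);

-- Enrolment PK = Offering PK + Student PK; grade is an attribute of that pair.
CREATE TABLE enrolment (
    course_code TEXT    NOT NULL,
    semester    TEXT    NOT NULL,
    section     TEXT    NOT NULL,
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    grade       TEXT,
    PRIMARY KEY (course_code, semester, section, student_id),
    FOREIGN KEY (course_code, semester, section)
        REFERENCES offering(course_code, semester, section)
);

INSERT INTO person   VALUES (1, 'Ada Lovelace');
INSERT INTO student  VALUES (1);
INSERT INTO offering VALUES ('DB101', '2024S1', 'A');
""")

def accepts(row):
    """Hypothetical helper: True if the enrolment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO enrolment VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

first = accepts(("DB101", "2024S1", "A", 1, None))    # valid enrolment
repeat = accepts(("DB101", "2024S1", "A", 1, "B+"))   # same student, same offering
print(first, repeat)
```

The structure is the same as the (student_id, class_id) key in the question; only the number of columns in the key differs.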
I'm a fan of third normal form, where you have separate Student, Class and Grade tables and link them with many-to-many tables like ClassStudent and GradeClass.
But it depends on how you want to maintain it in the future. Ultimately it comes down to future extension and maintainability, which is why I prefer 3NF.
It all depends, really. Option 1 is probably the most robust way of doing this application; option 2 might get you there quicker for this iteration. Will the change from option 2 -> 1 be that painful in the future? How sure are you that you will need that extra flexibility?
I would recommend just going for option 1. The queries won't be that much more complicated, and if you are using an ORM (such as ActiveRecord for Rails, among many others), the difference is practically nil.
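To give a feel for how much more complicated the option 1 queries are, here is a sketch that builds both layouts side by side and fetches the same report from each. The table names grades_opt1 and grades_opt2, the UNIQUE constraint on students_classes, and the sample rows are assumptions added for the comparison; SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL);
CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, class_name   TEXT NOT NULL);

-- Option 1: surrogate key on the link table, grades hang off that key.
CREATE TABLE students_classes (
    student_class_id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    UNIQUE (student_id, class_id)  -- assumed; still needed to keep one row per pair
);
CREATE TABLE grades_opt1 (
    student_class_id INTEGER PRIMARY KEY REFERENCES students_classes(student_class_id),
    grade TEXT NOT NULL
);

-- Option 2: composite key directly on grades.
CREATE TABLE grades_opt2 (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    grade      TEXT NOT NULL,
    PRIMARY KEY (student_id, class_id)
);

INSERT INTO students VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO classes  VALUES (10, 'Databases');
INSERT INTO students_classes VALUES (100, 1, 10), (101, 2, 10);
INSERT INTO grades_opt1 VALUES (100, 'A'), (101, 'B');
INSERT INTO grades_opt2 VALUES (1, 10, 'A'), (2, 10, 'B');
""")

# Option 1 needs the link table to get from a grade back to its student and class.
opt1 = conn.execute("""
    SELECT s.student_name, c.class_name, g.grade
    FROM grades_opt1 g
    JOIN students_classes sc ON sc.student_class_id = g.student_class_id
    JOIN students s ON s.student_id = sc.student_id
    JOIN classes  c ON c.class_id   = sc.class_id
    ORDER BY s.student_name
""").fetchall()

# Option 2 joins grades straight to students and classes.
opt2 = conn.execute("""
    SELECT s.student_name, c.class_name, g.grade
    FROM grades_opt2 g
    JOIN students s ON s.student_id = g.student_id
    JOIN classes  c ON c.class_id   = g.class_id
    ORDER BY s.student_name
""").fetchall()

print(opt1 == opt2)
```

In this report the cost of option 1 is one extra join; whether that matters depends on how often such queries are written by hand rather than generated by an ORM.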