Relational database design (normalised many-to-many mapping)

Posted 2024-10-03 18:57:57

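Students can be in many different classes, and each class has many different students. A student can have many grades, but only one grade per class: every (student, class) pair has exactly one grade.

Should I lay out the (MySQL) database like this: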

students (student_id, student_name)
classes (class_id, class_name)
students_classes (student_class_id, student_id, class_id)
grades (student_class_id, grade)

Or like this:

students (student_id, student_name)
classes (class_id, class_name)
grades (student_id, class_id, grade)

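Or something else?

Option 2 is simpler for now, but in the future I might need other statistics related to each (student_id, class_id) pair, in which case option 1 may be better, even though it looks overly complicated.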



计㈡愣 2024-10-10 18:57:57

Option 3)

students (student_id, student_name)
classes (class_id, class_name)
students_classes (student_class_id, student_id, class_id, grade)

Unless grade has the possibility of becoming a full-fledged entity, in which case:

Option 4)

students (student_id, student_name)
classes (class_id, class_name)
students_classes (student_class_id, student_id, class_id)
grades (grade_id, grade, student_class_id)
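One point worth spelling out for Option 3: a surrogate student_class_id does not by itself stop the same (student_id, class_id) pair from appearing twice, so the "one grade per pair" rule needs an extra UNIQUE constraint. The following is a minimal sketch of that idea, not the answerer's own DDL. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL, and the column types and sample rows are illustrative assumptions.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; types and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL);
CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, class_name   TEXT NOT NULL);
CREATE TABLE students_classes (
    student_class_id INTEGER PRIMARY KEY,            -- surrogate key from Option 3
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    grade      TEXT,                                  -- NULL until a grade is assigned
    UNIQUE (student_id, class_id)                     -- one row, hence one grade, per pair
);
""")

conn.execute("INSERT INTO students VALUES (1, 'Ada')")
conn.execute("INSERT INTO classes VALUES (10, 'Databases')")
conn.execute(
    "INSERT INTO students_classes (student_id, class_id, grade) VALUES (1, 10, 'A')"
)

# A second row for the same pair gets a fresh surrogate key,
# so only the UNIQUE constraint can reject it.
try:
    conn.execute(
        "INSERT INTO students_classes (student_id, class_id, grade) VALUES (1, 10, 'B')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM students_classes").fetchone()[0]
print(duplicate_rejected, row_count)
```

Without the UNIQUE line the second insert would succeed, leaving two grades for one pair.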



只为一人 2024-10-10 18:57:57

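Go with option 2. There is nothing wrong with a composite primary key on grades; it captures exactly the information the data model requires.

In option 1, students_classes serves no purpose other than to carry a surrogate key.

2NF: grade (non-key) depends only on the whole student/class pair (the key).

3NF: nothing to check, since no non-key column depends on another non-key column.

BCNF: nothing further to check, since there is only one candidate key.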
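A minimal sketch of option 2 with the composite primary key, to make the point concrete. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL; the column types, the try_insert helper and the sample rows are illustrative assumptions, not part of the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; types and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL);
CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, class_name   TEXT NOT NULL);
CREATE TABLE grades (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    grade      TEXT    NOT NULL,
    PRIMARY KEY (student_id, class_id)   -- the pair itself identifies the row
);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO classes  VALUES (10, 'Databases'), (20, 'Algorithms');
""")


def try_insert(student_id, class_id, grade):
    """Hypothetical helper: return True if the row was accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO grades VALUES (?, ?, ?)", (student_id, class_id, grade)
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, 10, "A"),   # first grade for (1, 10): accepted
    try_insert(1, 20, "B"),   # same student, different class: accepted
    try_insert(1, 10, "C"),   # second grade for (1, 10): rejected by the PK
    try_insert(99, 10, "A"),  # unknown student: rejected by the foreign key
]
print(results)
```

The composite key enforces the one-grade-per-pair rule with no extra column or index, and any later per-pair statistics can be added as further columns on the same table.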


笑，眼淚并存 2024-10-10 18:57:57

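Option 2 is correct, except that the table should be called student_class, reflecting its n::n function, or Enrolment as an entity, and (student_id, class_id) is the PK.

Grade (as you have shown it) is a 1::1 dependency on that compound key (not on one element or the other), and on nothing else; therefore it is an attribute of student_class.

And therefore student_class is in 3NF.

If people did not start off by blindly sticking Id columns on everything that moves, as you did in Option 1, they would understand the data better and thus normalise better. Starting from the Id column in Option 1 interfered with your intuition that (student_id, class_id) was the Identifier; no additional Id column, with its additional index, was necessary. Then, when you got around to evaluating grade, its dependency on that PK would have been obvious.

Id columns damage the relational capability of the database. If you have, say, three tables in a hierarchy, and you need to grab some columns from the top and bottom tables, you are forced to go through the middle table. If you had Relational Identifiers instead of Id columns, you could get from the bottom table to the top table without having to read the middle table.

It is only half true that a "normalised" database has so many joins. The full truth is that when the database is not correctly normalised, you are forced into many more joins than are necessary. In a truly Normalised database, with the same tables, the code requires far fewer joins.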
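The claim about skipping the middle table can be illustrated with a small sketch. The three-level hierarchy below (course, offering, enrolment), its column names and its data are hypothetical and are not taken from the answerer's college model; SQLite through Python's sqlite3 module is assumed as a stand-in for MySQL. Variant A gives every table its own Id column, while variant B carries the parent identifiers down into the child keys.

```python
import sqlite3

# Assumption: hypothetical tables and data, with SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: every table gets its own Id column.
CREATE TABLE a_course    (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE a_offering  (offering_id INTEGER PRIMARY KEY,
                          course_id INTEGER NOT NULL REFERENCES a_course(course_id),
                          semester TEXT NOT NULL);
CREATE TABLE a_enrolment (enrolment_id INTEGER PRIMARY KEY,
                          offering_id INTEGER NOT NULL REFERENCES a_offering(offering_id),
                          student_id INTEGER NOT NULL,
                          grade TEXT);

-- Variant B: parent identifiers are carried into the child keys.
CREATE TABLE b_course    (course_code TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE b_offering  (course_code TEXT NOT NULL REFERENCES b_course(course_code),
                          semester TEXT NOT NULL,
                          PRIMARY KEY (course_code, semester));
CREATE TABLE b_enrolment (course_code TEXT NOT NULL,
                          semester TEXT NOT NULL,
                          student_id INTEGER NOT NULL,
                          grade TEXT,
                          PRIMARY KEY (course_code, semester, student_id),
                          FOREIGN KEY (course_code, semester)
                              REFERENCES b_offering(course_code, semester));

INSERT INTO a_course    VALUES (1, 'Databases');
INSERT INTO a_offering  VALUES (100, 1, '2024S1');
INSERT INTO a_enrolment VALUES (1000, 100, 7, 'A');

INSERT INTO b_course    VALUES ('DB101', 'Databases');
INSERT INTO b_offering  VALUES ('DB101', '2024S1');
INSERT INTO b_enrolment VALUES ('DB101', '2024S1', 7, 'A');
""")

# Variant A: reaching the course title from an enrolment needs two joins.
via_ids = conn.execute("""
    SELECT c.title, e.grade
    FROM a_enrolment e
    JOIN a_offering o ON o.offering_id = e.offering_id
    JOIN a_course   c ON c.course_id   = o.course_id
    WHERE e.student_id = 7
""").fetchall()

# Variant B: course_code is already in the enrolment row, so one join suffices.
via_identifiers = conn.execute("""
    SELECT c.title, e.grade
    FROM b_enrolment e
    JOIN b_course c ON c.course_code = e.course_code
    WHERE e.student_id = 7
""").fetchall()

print(via_ids, via_identifiers)
```

Both queries return the same rows; the second never reads the offering table. The trade-off is wider keys in the child tables, which the answer addresses for the Person case in its notes.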

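Here is a simplified Data Model for a College, from a recent assignment.

It uses IDEF1X Notation, for those who need an explanation of the symbols.

Note that only one Surrogate Key is required.
This is because, in the alternative, (LastName+FirstName+Initials_BirthDate+BithDate) would be the Person PK, and it would be carried as an FK in 5 child/grandchild tables at 81 bytes, which is not sensible.
See if you can appreciate that the Identifiers (solid lines) are carried through to the children and grandchildren; they have, and convey, meaning.
It would be stupid to add Surrogate Keys for TeacherId, StudentId and StaffId when we have a perfectly good PersonId, which is the Foreign Key and already unique. (The columns are named as such to identify their roles.)
All Business Rules are implemented in DDL: FK Constraints; Check Constraints; Rules.
Room has a 4-column Compound Key; Offering has a 3-column Compound Key; the two together eliminate double bookings.
The Offering PK and the Student PK together form the PK for Enrolment (identical to this question; the PKs are made up of different columns, that's all).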

