我的简单测验数据库是包含主题和内容的吗?子主题设计得当吗?

发布于 2024-09-29 22:59:41 字数 1028 浏览 3 评论 0原文

我正在制作一个简单的测验数据库,其中一个问题有一个答案和一个或多个图像文件,并且属于一个子主题,而子主题又属于一个主题。此外,每个组可能属于三个级别之一。

我希望显示一个问题(以及一张图像,如果存在),并将用户的答案与正确答案进行比较。我需要主题和子主题才能显示正确的问题。

QUESTION
-------------------
question_id      pk
question    varchar
answer      varchar
subtopic_id foreign_key

MEDIA
-------------------
media_id         pk
file_name   varchar
question_id foreign_key

SUBTOPIC
-------------------
subtopic_id      pk
subtopic    varchar
topic_id    foreign_key

TOPIC
-------------------
topic_id         pk
topic       varchar
level       choices(1,2,3)

我的数据库设计正确吗?如果没有,我怎样才能让它变得更好?

diagram

注意:

  1. 级别可以有唯一的(主题 1 和 3)和 共享(主题 2)主题。
  2. 一个主题可以有一个或多个 子主题,但子主题不能 属于多个主题。
  3. 一个子主题可以有一个或多个 问题,但问题属于 仅一个子主题。两个问题(问题 2 和问题 3)可能具有相同的副主题和主题,但其级别可能不同。主题(主题 2)和子主题(子主题 2 和 3)可能属于多个级别(级别 1 和 2)。一些子主题仅包含级别 1 的问题,而其他子主题则包含级别 2(子主题 3)的问题(问题 4)。
  4. 一个问题只能有一个答案 这个答案不能属于 另一个问题。
  5. 答案(或其问题)可以 有一个可选的图像文件,并且 该图像文件可用于其他 问题。

I'm making a simple quiz database, where a question has an answer and one or more image files, and belongs to a subtopic which in turn belongs to a topic. Also each group may belong to one of three levels.

I expect to show a question (and an image, if one exists), and compare the user's answer to the correct answer. I need topics and subtopics to in order to display proper questions.

QUESTION
-------------------
question_id      pk
question    varchar
answer      varchar
subtopic_id foreign_key

MEDIA
-------------------
media_id         pk
file_name   varchar
question_id foreign_key

SUBTOPIC
-------------------
subtopic_id      pk
subtopic    varchar
topic_id    foreign_key

TOPIC
-------------------
topic_id         pk
topic       varchar
level       choices(1,2,3)

Is my database properly designed? If not, how can I make it better?

diagram

Notes:

  1. Levels can have both unique (Topics 1 and 3) and
    shared (Topic 2) topics.
  2. A topic can have one or more
    subtopics, but a subtopic cannot
    belong to more than one topic.
  3. A subtopic can have one or more
    questions, but a question belongs
    to only one subtopic. Two questions (Questions 2 and 3) may have the same subtopic and topic, but their levels may differ. Topics (Topic 2) and subtopics (Subtopics 2 and 3) may belong to many levels (Levels 1 and 2). Some subtopics hold exclusively questions of level 1 and others hold questions (Question 4) of level 2 (Subtopic 3).
  4. A question can have only one answer
    and that answer cannot belong to
    another question.
  5. An answer (or its question) can
    have one optional image file, and
    that image file can be used in other
    questions.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

倾`听者〃 2024-10-06 22:59:41

很好的问题,在构建应用程序之前提出这个问题足以说明您学习和提高数据建模技能的能力。

模型的局限性

从简单的意义上讲,这是标准化的:您已经决定了实体是什么;您需要什么关系;您甚至认为它们是无意义的东西,没有识别特征,并且您将 IDENTITY 列指定为物理键。 然后您已经标准化了表格。

这与关系模型或标准化无关。

这是典型的应用程序思考方式(可以理解,因为这就是你所拥有的一切,并且你知道功能需求)。但这与数据无关。如果您从这个角度创建它,您将不会获得数据库,您将获得应用程序的存储位置。现在,如果该应用程序是为一项测验而编写并扔掉的,那是可以接受的。但如果测验要持续一两个月;公司十年后是否仍然存在;测验是否会增加;如果要运行的统计数据将会增长、改进和成熟……那么你将会遇到很多问题。每次增强应用程序时,您都必须更换其“数据库”。在某些时候,他们会认为应用程序(以及其中包含的“数据库”)的成本太高而无法更改,然后放弃它们。

即使应用程序存在的时间很短,由于数据收集没有标准化,它也非常有限,您将失去在每个商业 RDBMS 中实现的 RM 的“力量”。导航(无论如何都很麻烦,因为 SQL 连接很麻烦)甚至更麻烦。例如。要从孩子到祖父母,您必须获得父母,即使没有从中获得任何列。用户只能通过应用程序访问数据,它是“封闭的”。如今,有数千种报告工具连接到数据库并允许用户执行尚未识别的各种查询,而无需等待应用程序团队实施更改请求:能力丧失。并且您必须每年左右“重构”数据堆。

规范化

规范化产生开放数据库,允许不受限制地进行此类访问。

数据的寿命比应用程序长得多。它在应用程序中继续存在,并且对组织有价值。当应用程序被替换时,他们将提取数据并希望替换应用程序准备好加载数据。如果他们不使用数据建模器,新的应用程序设计者就不会从旧的应用程序设计者的错误中吸取任何教训,整个过程就会重演。

好的,该组织预计明年就会存在。好的,所以您将在生产中运行多个测验。好的,所以您已经聘请了一位数据建模师来检查您收集的数据,这样您就不会犯这些错误。出色的。谢谢。我接受。

关系建模 203

数据建模 (a) 完全独立于应用程序和您可能了解的任何功能,以及 (b) 使用应用程序开发人员不了解的完全不同的技术。那么让我们开始吧。这不是关于标准化的正式课程(这会花费太长时间,而且理论会让你厌烦得要命);这只是模型师工作时的观察。您所需要做的就是诚实地回答问题(每次我发布问号时,请停下来回答,然后再继续)(并不是暗示您不诚实;只是强调“我不知道”是完全可以接受的,因为它确定了我们必须努力的领域;而明确的肯定和否定使我们能够避免讨论)。另外,请原谅我,但我会假设答案,只是为了避免来回的延误;请指出我的任何错误,我会纠正模型。

好的,您已经清楚了一些实体,所以让我们从它们开始,我认为它们看起来像这样 在沙子上乱涂乱画。我故意不使用标准符号,因为我不希望在现阶段将这些含义纳入其中,因为我们还没有实现这些含义,并且我们不想向理解标准符号的人(包括我们自己)传达错误的期望。

  1. 每个实体中的行如何与其他行唯一标识? (这很重要,因为它可以帮助我们验证该实体实际上是一个实体,而不是唐老鸭,不是真实的东西。)从提供的列中,您将(至少):
    问题.问题
    .
    媒体 ?到底如何才能确保不存在 42 张相同的图像?如果我允许的话,当我们的老板对你尖叫时,会踢的屁股,然后也会踢我的屁股。 IDENTITY 不会帮助我,它会很乐意让我插入重复项。文件名是一个很好的标识符。媒体.文件名
    .
    话题 ?您想要 101 个相同的主题吗?我以为不是。
    主题.主题
    .
    副主题.副主题 ?号码,身份?不,你很清楚它属于主题,而且我已经将主题作为外键携带,这是有意义的;如果它是受抚养子女,而不是独立孤儿,则 FK 是识别关系
    副主题:(主题、副主题)。
    .
    好的,我们有了机身。

  2. 实体之间如何关联?
    你很清楚 Subtopic 是 Topic 的子主题,很好。
    .
    Question 是 Subtopic 的子主题吗?我认为不是。您是否真的想要相同的问题“家庭拥有多少辆汽车”,该问题可能存在于 42 个测验中,插入 42 行? “两次测验中永远不会有相同的问题”是可疑的,所以请不要说出来,除非你打算把它写下来。更糟糕的是,假设问题是一个非常生动的问题,那么我们也会有 42 个媒体条目。如果老板因为必须在 42 个地方更改相同的问题而感到不安,并且当他这样做时,他发现某些图像很古老,去年进行重大图像更新时它们没有更新......这是你的后端.
    .
    我们是否允许一个问题存在于多个测验中的可能性如何?
    .
    所以问题是独立的。

  3. 返回实体。在我们的讨论中,从第五个词开始,你一直说“测验”,我也一直这么说,但我们没有对应的实体。我无法想象一组松散且重复的问题,我刚刚删除了这些问题,而没有识别所述问题集合。现在你可能会说“实际上测验就是主题”,但这是非常严格的:每次老板需要为某个新客户添加测验时,他都必须添加整个主题/副主题/问题集 重新开始,即使他知道他之前在现有测验中输入过那些确切的问题;这就是为什么他赢得了新业务,也是为什么他最终会盈利,而你刚刚将利润减少到零。我的屁股仍然因为他上次的不安而受伤,所以我们要确保他没有其他借口。这让他能够独立成长、培育和更改测验和主题/副主题,而无需重复问题:
    .
    实体。测验
    钥匙 ?好吧,最好有一个索引,以确保我们不会向他提供重复的测验。假设我们不想将 CHAR(80) 键带入子级,并且用户认为他们组成的 ShortName CHAR(12) 是一个很好的标识符,比 10 位数字更好。
    测验:测验代码

    好的,我们的机身足够大,可以容纳有效载荷。到目前为止,我们已经有了改进的沙子涂鸦

  4. 什么实体是真正独立的,可以在没有其他实体存在的情况下存在;其余部分仅存在于另一个实体的上下文中?
    媒体仍然有点悬而未决。虽然我们排除了重复的文件名,但我们不允许在多个问题中使用同一张图像。让我们允许这一点。我们不需要身份,我们已经携带了问题 PK,因此它是 (a) 依赖的和 (b) 由父母识别的。
    .
    问题 PK 作为 CHAR(255) 以前没问题,但是在孩子中将其作为 FK 携带就变得愚蠢了。您喜欢“身份”,并且将会有数百万个问题。美好的。
    问题.问题Id
    .
    让我们为这些独立实体提供方角,并让子级从属实体为圆形。现在我们有了这个几乎,但不完全是,实体关系图

  5. 让我们结束关系吧。到目前为止我们已经:
    一个测验可以有多个问题,一个问题可以存在于多个测验中。
    问题是子主题的子主题,但它是独立的。
    媒体仅作为一种选项存在于问题的上下文中。
    .
    由于我们生活在一个有一定秩​​序的宇宙中,而且我们的努力是科学的,所以让我们给它的孩子一些秩序。我认为我们现在有资格获得逻辑实体关系图
    实线是识别关系;虚线不是。现在,因为我们已经对实体和关系应用了一些标准要求,所以我们可以使用标准符号。欢迎指出我的错误。

就这样,我们完成了。我希望我已经表达了这个练习是来回的,这就是它被称为建模的原因。宇航员有独立的私人宿舍,他们可以互相交流。

------------------------------------------------------------ -------------------------------------------------------

那是什么?您还想要数据模型吗?好的,给我五分钟。请为自己喝杯咖啡。

------------------------------------------------------------ -------------------------------------------------------

很小足够了,我也可以给你物理 数据模型。我使用 IDEF1X 方法来建模关系数据库,这是一个标准,因此数据模型以 IDEF1X 表示法,如果您需要快速了解这些符号的含义,请检查该链接。

  1. 多对多关系是一个逻辑概念,并在逻辑模型中如此绘制。它们在物理模型中被实现为关联表。我已经提供了。规范化数据库有更多的表(没什么可怕的),但每个表的列更少,并且没有重复的列(没有更新异常)。

  2. 天哪。主题和副主题栏非常大!我们无法将那些臃肿的外键迁移到 Question 中。 [与业务用户讨论。] 好吧,他们说只有一百个主题和几千个子主题。不需要 NUMERIC(10,0)。他们希望在下拉列表中包含完整的主题和副主题,并且他们同意它必须是唯一的,但额外的短 CHAR(6) 代码会更好。

看,它确实来回移动。纸很便宜;与任何人和每个人讨论;改进、纠正、改变、调整、改善,无需创建单个表格或编写一行代码;此时您将拥有一个值得为其编写代码的模型。其他的都不是。学习的唯一方法是呈现一些可靠的东西,然后将其推翻或改变;将所有错误记录在纸上,而不是记录在数据库中。

请注意,代理始终是一个附加键,一个附加索引。它们永远无法替代Key(这是你所拥有的,也是Eddie试图让你思考的:你并没有防止重复,你只是有一个毫无意义的键来保证行是独特,就像电子表格一样;以及错误的安全感)。所以我们需要尽量减少它们,而不是随意在每张桌子上印上它们。

我希望我已经在上面展示了,诸如“从不使用代理”和“始终使用代理”之类的简单化规则太愚蠢,无法讨论。不,仔细建模意味着:理解并考虑它们是附加的,而不是替代真正的密钥。仅在必要时使用,并且在必要时使用它们。在这里,我只用了一个身份就逃脱了惩罚。而对用户有意义的三个短代码是代理,但它们是有意义的; IDENTITY 列则不然(它们最终具有意义,但它们无法支持,这是问题的一部分)。

我将把数据类型留给你。但请记住,varchars 和 Nullable 列强制列可变。如果在索引中使用,那么速度会非常慢(每个条目都必须在每次访问时进行一些“解包”,甚至是中间级别),因此必须避免这种情况,除非您想向老板提供慢速数据库。

同样,如果您不希望页面上的行开销在每次更新时移动,请使用固定长度的列。这意味着我们不能偷懒,不能把所有东西都变成varchar。

好的,现在我们有了一个容纳火箭燃料的室。

对评论 1 的回复

根据您的上一个数据模型,如果我从问题表中省略 topicCode 会更好吗?在子主题和问题表中都包含 topicCode 是不是感觉有点多余?

很好的问题。

(最后、第五个是数据模型;第四个是实体关系图;前三个是天上掉下来的馅饼,到达那里。)

  1. 主键在子级中作为外键迁移,并不是多余的,它是必需的。

  2. 子主题 PK 是(TopicCode,SubtopicCode),一个复合键(商业数据库支持,作为关系模型的要求)。副主题水平线上方。

    • 一些开发人员害怕复合键,因为它们在 WHERE 子句中需要多个引用(纯粹是懒惰;SQL 的连接很麻烦;处理它)。
  3. 这是因为Topic::Subtopic关系是一个识别关系,即用Parent的PK来构造Child的PK,形成一个Compound Key。注意Parent PK无论如何都要被Child携带,作为FK,所以它并不多余;这是必需的。众所周知,这可以显着提高数据库的“功能”或“相关性”,并显着提高易用性(高级用户通常比开发人员更擅长使用 RDb)。

    • 这就是它被确定为标准的原因:IDEF1X 是一个强制执行关系模型更严格应用的标准;它促进对所有键的深思熟虑,这当然对于所谓的关系数据库的“关系性”至关重要。
  4. 在问题中,子主题的 FK 因此是 (TopicCode, SubtopicCode)。

  5. 如果您在主题和子主题中使用 ID,则问题将 (SubtopicId) 作为子主题的 FK,并且您会失去导航能力和意义。

    • 使用识别父母(和祖父母)的复合键优于使用 ID 的一个原因是,例如。当您只需要问题和主题中的列时,您不需要从子主题中进行选择,您可以直接从问题中获取主题代码;而使用 ID 时,您被迫从子主题中进行 SELECT,这不是报告要求的一部分,而是由于不良关系建模而强加的;未能实施识别关系,意义重大。仅具有 ID 作为 PK(作为 FK 携带)的数据堆比规范化关系数据库执行更多工作的原因之一。
      • 由于相关表简单且小,这可能不是一个很好的例子来展示关系键的优越性/IDENTITY 列的限制;考虑任何大型 Grandparent::Parent::Child 表。
      • 展示这种优越性/局限性的更好示例,值得任何希望进一步讨论此问题的人考虑,我的高级课程中的数据模型,因为这个问题已得到明确解决。
    • 另一个原因是失去了意义。例如。用户可以简单地从问题中选择主题代码、子主题代码、问题,并理解结果集。

回复评论 2

渐进式数据模型 101102

对评论 3 的回复

我不明白如何使用相同的图像文件来解决不同的问题?

简单,例如。 “命名一种不会飞的鸟”和“什么是鸸鹋”都可以使用同一张鸸鹋照片。这是我根据上面主帖中的 (1) 所做的假设。该模型通过在 FileName 上提供非唯一索引来实现这一点。

您是否应该在 Question 表中包含 Media.FileName 外键并从 Media 表中删除 QuestionId?

嗯,不。我的建模方式有几个优点。如果它在Question表中,那么有些大的时候,它会是Null。您需要文件名上的索引才能搜索它们;看看是否被使用;或不;这意味着我们无法对 FileName 列建立索引(或者我们可以对其建立索引,但由于 Null 索引会很慢)。现在它被建模为:

  • “选项”或问题的子类型(具有图像的问题)
  • 无空
  • 快速非唯一索引

它可以设置为图像库(唯一索引)。我们知道一个问题可以有零到一的图像。您可以告诉我们您的决定,我将更改模型:

  • 图像对于文件名是唯一的;它们只能在一个问题中使用
  • 图像的文件名是唯一的;它们可以用于多个问题

那是一种什么样的关系? 1::1?

是的。媒体端是“可选的”。准确地说:一个问题有零个或一个媒体。因此,“May Have”是关系的动词短语或标题。

该模型是根据自然层次结构绘制的:父母高于孩子;子类型位于同一水平面上。

看起来不像其他 n::n 或 1::n 关系。

没有 n::n 关系。 (逻辑级别有一个;在物理级别作为关联表实现。)

**为什么在 QuizQuestion 表中包含“答案”字段?*

您希望在哪里存储用户的答案?

看,您需要 Quiz 和 QuizQuestion 表;它们以前是有效的实体,当时你没有看到,但现在你看到了。

主题和子主题之间的关系在子主题表中定义。在 Question 表中,您已包含 TopicCode 和 SubtopicCode。您说过(如果我没记错的话)包含 TopicCode 很好,因为我可以直接从问题表访问主题,而无需加入副主题表。

是的。按照上面的(1.副主题)。

但是,考虑到您的模型,每次当我向数据库输入新问题时,我都应该选择一个主题和一个子主题[针对该问题,从现有列表中]。

那么,您必须这样做无论如何;该模型只是强制执行它。我(从数据中)了解到这就是您所需要的。一个Suptopic可以属于多个Topic。所以你必须给它一个主题和一个副主题(特定的组合)。

该数据库如何确保子主题属于子主题表中描述的某个主题?

您没有看到这一点吗?您是否可以看到 Subtopic 表已经提供了特定的 [无论您插入什么] Topic::Subtopic 组合...而不是其他组合?好的,所以当您添加一个新问题,并给它一个主题和子主题(这是子主题的主键,以及问题中的外键)时,数据库将强制执行 FK,以便只有主题的组合之一: :可以使用SubTopic中存在的子主题。

这是对良好自然关系键力量的一个小小的了解。

这就是良好关系模型的美妙之处:它通过单个[正确建模的]结构提供了许多要求。

回复评论4

渐进数据模型 101103

渐进式数据模型 101109

IDEF1X 表示法

Great question, and that you ask it before building the app speaks volumes for your ability to learn and improve your data modelling skills.

Limitations of Your Model

This is normalised in the simple sense: you've already decided what the entities are; what relations you need; you've even decided that they are meaningless things that have no identifying characteristics and you've given them IDENTITY columns as physical keys. Then you've normalised the tables.

That has nothing to do with the Relational Model or Normalisation.

It is typical of thinking in terms of the app (understandable, because that's all you have, and you know the functional needs). But that has nothing to do with the data. if you create it from that perspective, you will not get a database, you will get a storage location for the app. Now if the app was to be written for one quiz and thrown away, that would be acceptable. But if the quiz is going to be run for a month or two; if the company is still going to exist in ten years; if the quizzes are going to grow; if the stats that are going to be run, are going to grow, improve and mature ... well you will have quite a few problems. Every times you enhance the app, you will have to replace its "database". At some point they will decide that the app (and the "database" contained within it) is too expensive to change, and ditch both of them.

Even for the short duration that the app exists, because the collection of data is not normalised, it is very limited, the "power" of the RM, that which is implemented in every commercial RDBMS, is lost to you. Navigation (which is cumbersome anyway, due to SQL joins being cumbersome) is even more cumbersome. Eg. to get from a child to a grandparent, you are forced to get the parent, even though to are not getting any columns from it. User can only access the data via the app, it is "closed". These days there are thousands of report tools that connect to the database and allow the users to perform all kinds of queries, which have not been identified to you yet, without having to wait for a change request to be implemented by the app team: that capability is lost. And you will have to "re-factor" the data heap every year or so.

Nomalisation

Nomalisation produces Open databases, that allow such access without limitation.

Data lives much longer than the app. It survives the app, and has a value to the org. When the app gets replaced, they will extract the data and want the replacement app to be ready with the data loaded. If they did not use a data modeller, new app designer did not learn any lessons from the old app designers mistakes, and the whole repeats itself.

Ok, so the org expects to exist next year. Ok, so you are going to run more than one quiz in production. Ok, so you've hired a data modeller to run his eye past your collection of data, so that you don't make those mistakes. Excellent. Thankyou. I accept.

Relational Modelling 203

Data is modelled (a) completely independent of the app and any functions that you might know about and (b) using completely different techniques, that app developers do not know about. So let's get started. This is not a formal lesson re Normalisation (that will take too long, and the theory will bore you to death); this is just looking over the modellers shoulder while he works. All you need to do is answer questions [every time I post a question mark, please, stop and answer, before moving on], honestly (not suggesting you are dishonest; just reinforcing that "I don't know" is quite acceptable because it identifies an area we have to work on; whereas the clear affirmative and negative allows us to avoid discussion). Also, forgive me, but I will assume the answers, just to avoid the otherwise back-and-forth delays; please point out any of my mistakes, and I will correct the model.

Ok, you have some Entities that you are clear about, so let's start with them, I think they look like this Scribble in the Sand. I am purposely not using standard symbols because I do not want those meanings drawn into it at this stage, because we have not achieved those meanings, and we don't want to convey false expectations to people (including ourselves) who understand standard symbols.

  1. How are the rows in each Entity identified uniquely from every other row ? (This is important because it helps us verify that the Entity is in fact an Entity, and not Donald Duck, something not real.) From the columns provided, you would have (at least):
    Question.Question
    .
    Media ? how exactly, am I going to ensure that there are not 42 images which are identical ? If I allow that, you will kick my rear end when our boss screams at you, and then he will kick it as well. IDENTITY won't help me, it will happily let me insert duplicates. FileName is a good identifier. Media.FileName
    .
    Topic ? would you like 101 identical Topics ? I thought not.
    Topic.Topic
    .
    Subtopic.Subtopic ? No. IDENTITY ? No. You are pretty clear it belongs under Topic, and I am already carrying Topic as a Foreignkey, which has meaning; if it is a dependent child, not an independent orphan, then the FK is an Identifying Relation:
    Subtopic: (Topic, Subtopic).
    .
    Ok, we have fuselage.

  2. How are the Entities related to each other ?
    You are clear that Subtopic is a child of Topic, fine.
    .
    Question is a child of Subtopic? I think not. Do you really want the same Question "how many cars does the household own" which may exist in 42 quizzes, inserted in 42 rows ? "there will never be the same question in two quizzes" is suspicious, so please don't say it, unless you are going to put it in writing. What's worse is, let's say the question is a very picturesque one, then we will have 42 Media entries as well. If the boss gets upset at having to change the same question in 42 places, and when he does that he finds some of the images were ancient, they were not updated when the major image update took place last year ... it's your rear end.
    .
    How about we allow for the possibility of a Question existing in more than one Quiz ?
    .
    So Question is Independent.

  3. Back to Entities. In our discussions, starting with the fifth word, you keep saying "Quiz", and I keep saying it, but we don't have an Entity for it. I can't imagine a loose and duplicated set of Questions, that I've just de-duplicated, without identifying said collection of Questions. Now you may say "actually the Quiz is the Topic", but that is very restrictive: every time the boss needs to add a quiz for some new customer, he has to add the whole Topic/Subtopic/Question set all over again, even when he knows he has entered those exact Questions before, in existing Quizzes; which is why he won the new business, and why he was going to finally make a profit, which you've just reduced to zero. My rear end is still hurting from his last upset, so let's make sure he does not have another excuse. This let's him grow, nurture, and change Quizzes and Topic/Subtopics independently, without duplication of Questions:
    .
    Entity. Quiz
    Key ? Well, it better have an index on it to ensure we do not supply him with duplicate Quizzes. Let's say that we don't want to carry CHAR(80) keys into the children, and the user think a ShortName CHAR(12) that they get to make up is a nice identifier, better than a 10 digit number.
    Quiz: QuizCode
    .
    ok, we have fuselage big enough for payload. So far we have this Improved Scribble in the Sand.

  4. What Entities are truly Independent, that can exist without the existence of other Entities; and the remainder, which exists only in the context of another Entity ?
    Media is still a bit up in the air. While we have excluded duplicate FileNames, we have not allowed for the same Image being used in more than one Question. Let's allow that. We do not want an IDENTITY, we are already carrying the Question PK, so it is (a) Dependent and (b) Identified, by the parent.
    .
    Question PK as CHAR(255) was fine before, but carrying it as the FK in the child becomes silly. You like IDENTITY, and there are going to be millions of Questions. Fine.
    Question.QuestionId
    .
    Let's give these Independent Entities square corners, and leave the children Dependent Entities round. Now we have this almost, but not quite, Entity Relation Diagram.

  5. Let's finish the Relations. So far we have:
    A Quiz can have many Questions, and a Question can exist in many Quizzes.
    Question is a child of Subtopic, but it is Independent.
    Media exists only in the context of Question, as an option.
    .
    Since we live in an Universe that has some order, and our endeavour is a scientific one, let's give a child of it some order. I think we now qualify for a Logical Entity-Relation Diagram.
    The solid lines are Identifying Relations; the broken lines aren't. Now, because we have applied some standard requirements to the Entities and Relations, we can use standard symbols. Feel free to point out my mistakes.

That's it, we are done. I hope I have conveyed that the exercise is back-and-forth, that's why it is called modelling. The astronauts have separate, private quarters, and they can communicate with each other.

------------------------------------------------------------------------------------------

What's that ? You want the Data Model as well ? Ok, give me five minutes. Please grab a coffee for yourself.

------------------------------------------------------------------------------------------

It is small enough that I may as well give you the physical Data Model. I use the IDEF1X methodology for modelling Relational Databases, which is a standard, the Data Model is therefore rendered in IDEF1X Notation, please check that link if you need a quick review of what the symbols mean.
.

  1. Many-to-many Relations are a logical concept, and drawn as such in the Logical Model. They are implemented as Associative tables in the physical model. I have supplied that. A Normalised database has more tables (nothing to be afraid of) but fewer columns per table, and no duplicate columns (no Update Anomalies).

  2. Oh my God. The Topic and SubTopic columns are huge! We can't migrate those fat foreign keys into Question. [Discuss with The Business Users.] Ok, they say there will be only one hundred Topics and a few thousand Subtopics. No need for a NUMERIC(10,0). They want the full Topic and Subtopic in the drop-downs, and they agree it has to be unique, but an additional short CHAR(6) code would be nice.

See, it does go back-and-forth. Paper is cheap; discuss with anyone and everyone; improve, correct, change, modulate, ameliorate, without creating a single table or writing a line of code; at the you will have a model worth writing code for. Anything else is not. The only way to learn is by presenting something solid, and having it knocked back or changed; make all your mistakes on paper, not in the database.

Note that a Surrogate key is always an additional key, an additional index. They are never a substitute for the Key (which is what you had, and what Eddie was trying to get you to think about: you were not preventing duplicates, you merely had a meaningless key guaranteeing the rows were unique, like a spreadsheet; and a false sense of security). So we need to minimise them, not stamp them willy nilly on every table.

I hope I have shown above, simplistic rules such as "never use surrogates" and "always use surrogates" are too stupid to discuss. No, careful modelling means: understand and consider that they are additional, not in substitute of, the real Key. Use only when you have to, and when you have to, use them. Here I have managed to get away with just one IDENTITY. And the three short codes that are meaningful to the users are surrogates, but they have meaning; IDENTITY columns don't (they end up having meaning, which they can't support, which is part of the problem out there).

I will leave the DataTypes to you. But please keep in mind, varchars and Nullable columns force the column to be variable. Really slow if used in an Index (every entry has to have a bit of "unpacking" on every access, even the intermediate levels), so that must be avoided, unless you want to supply the boss with a slow database.

Likewise if you don't want the overhead of your rows on the page being moved on every UPDATE, fixed length columns all around. Which means we can't be lazy and varchar everything.

Ok, now we have a chamber to contain rocket fuel.

Response to Comments 1

from your last data model, would it be better if I omitted topicCode from the question table? DOesn't it feel little redundant to include topicCode in both the subtopic and question tables?

Excellent question.

(The last, fifth, is the Data Model; the fourth is the Entity Relation Diagram; the first three are pie in the sky, getting there.)

  1. A Primary Key migrated as a Foreign Key in the child, is not redundant, it is required.

  2. The Subtopic PK is (TopicCode, SubtopicCode), a Compound Key (which commercial databases support, as a requirement of the Relational Model). Above the horizontal line in Subtopic.

    • Some developers are scared of Compound Keys because they require more than one reference in the WHERE clause (pure laziness; SQL is cumbersome with joins; deal with it).
      .
  3. That is because the relation Topic::Subtopic is an Identifying relation, which means the PK of the Parent is used to construct the PK of the Child, forming a Compound Key. Note the Parent PK has to be carried in the Child anyway, as a FK, so it is not redundant; it is required. This is known to substantially increase the "power" or "relationality" of the db, and substantially increase ease of use (power users are often more adept in using a RDb than developers).

    • That is why it is identified as the standard: IDEF1X is a standard that enforces a stricter application of the Relational Model; it promotes thoughtful consideration of all Keys, which of course is crucial to the "relationality" of a supposed Relational Database.
      .
  4. In Question, the FK to Subtopic is therefore (TopicCode, SubtopicCode).

  5. If you used IDs in Topic and SubTopic, then Question would have (SubtopicId) as the FK to Subtopic, and you lose both navigational power and meaning.

    • One reason that using compound keys which identify the parent (and grandparent) is superior to using IDs is, eg. when you need columns from Question and Topic only, you do not have to SELECT from Subtopic, you can get TopicCode directly from Question; whereas with IDs, you are forced to SELECT from Subtopic, which is not part of the report requirement, it is an imposition due to poor relational modelling; failure to implement Identifying Relations, strong meaning. Just One Reason that data heaps that have IDs only as PKs (carried as FKs) perform much more work than Normalised Relational Databases.
      • Because the tables concerned are simple and small, this may not be a good example for demonstrating the superiority of Relational keys/limitation of IDENTITY columns; consider any large Grandparent::Parent::Child tables.
      • A better example for demonstrating this superiority/limitation, and worth consideration for anyone wishing to discuss this issue further, is the Data Model from my Advanced Course, because this issue is addressed explicitly.
    • Another reason is lost meaning. Eg. an user can simply SELECT TopicCode, SubtopicCode, Question FROM Question, and make sense of result set.

Response to Comments 2

Progressed Data Model 101102

Response to Comments 3

I don't understand how you can use the same image file with different questions?

Simple eg. "name a flightless bird" and "what is an Emu" can both use the same photograph of an Emu. That was my assumption as per (1) in the main post above. The model allows it by providing a Non-Unique index on FileName.

Should you include Media.FileName foreign key in the Question table and remove QuestionId from the Media table?

Well, no. There are several advantages to the way I have modelled it. If it is in the Question table, then some large potion of the time, it will be Null. You need an index on Filename to search on them; see if it is used; or not; etc. Which means we cannot index the FileName column (or we can index it, but the index will be slow due to Nulls). Right now it is modelled as:

  • an "option" or Subtype of Question (one that has an image)
  • No Nulls
  • Fast Non-unique Index

It can be set up as an image bank (Unique index). We know a Question can have zero-to-one Images. You can tell us your decisions, and I will change the model:

  • Images are Unique re FileName; they can be used in only one Question
  • Images are Unique re FileName; they can be used in more than one Question

And what kind of relationship is that? 1::1?

Yes. With the Media end being "optonal". Precisely: one Question has zero-or-one Media. Hence "May Have" is the Verb Phrase or Title of the relation.

The model is drawn with The Natural hierarchy in mind: Parents are above children; subtypes are on the same horizontal.

Doesn't look like other n::n or 1::n relationships.

There are no n::n relations. (There was one at the Logical level; which was implemented as the Associative table at the physical level.)

**Why did you include the Answer field in the QuizQuestion table?*

Where would you like to store the users' answers ?

See, you need the Quiz and QuizQuestion tables; they were valid entities before, you did not see that then, but you do now.

The relationship between topics and subtopics are defined in the Subtopic table. In the Question table, you have included TopicCode and SubtopicCode. You said (if I'm not mistaken) that including the TopicCode is good because I can get to the Topic directly from the Question table without joining the Subtopic table.

Yes. As per (1.Subtopic) above.

But, given your model, every time when I enter a new question to the DB, I should choose a topic and a subtopic [for the question, from the existing list].

Well, you have to do that anyway; the model just enforces it. I understood (from the data) that that is what you needed. One Suptopic can belong to more than one Topic. So you have to give it both a Topic and a Subtopic (the specific combination).

How does this database make sure that the subtopic belongs to a certain topic as described in the Subtopic table?

You don't see that ? Can you see that the Subtopic table already supplies specific [whatever you have inserted] Topic::Subtopic combinations ... and not others ? Ok, so when you add a new Question, and give it a Topic and Subtopic (which is the Primary Key of SubTopic, and the Foreign Key in Question), the database will enforce the FK such that only one of the combinations of Topic::Subtopics that exist in SubTopic can be used.

That's a small insight into the power of Good Natural Relational Keys.

That's the beauty of a good Relational model: it supplies many requirements from a single [correctly modelled] structure.

Response to Comments 4

Progressed Data Model 101103

Progressed Data Model 101109

IDEF1X Notation

送君千里 2024-10-06 22:59:41

由于问题属于子主题,因此问题表应该有一个指向子主题 id 的外键

Since Question belongs to a subtopic, the Question table should have a foreign key to the subtopic id

夏见 2024-10-06 22:59:41

当然,这是标准化的,并且根据您的要求看起来很好。但是,您可能在问题表上遗漏了 subtopic_id

Sure, this is normalized and looks fine given your requirements. However, you may have left off the subtopic_id on the question table.

风柔一江水 2024-10-06 22:59:41
  1. 您应该在它们的表之后命名 ids,例如,question_id、media_id 等。(在它们的主键位置,也就是说,不仅仅是当它们用作外键时。)这使得连接更容易,因为您可以说诸如 来自问题 JOIN MEDIA USING (question_id))。
  2. 问题表缺少 subtopic_id。
  3. 您可能想用 level_id 替换 level,在 LEVEL 表上有一个外键,但这不是一个硬性规则,只是一个建议。更灵活,但工作量也更大;最终取决于什么对您的问题领域有意义。

除了那些小的勘误之外,它对我来说看起来不错。

  1. You should name ids after their tables, e.g., question_id, media_id, etc. (In their primary key locations, that is, not just when they are used as foreign keys.) This makes joins easier, since you can say things like FROM question JOIN MEDIA USING (question_id)).
  2. The question table is missing a subtopic_id.
  3. You may want to replace level with level_id, having a foreign key on a LEVEL table, but that's not a hard-and-fast rule, just a suggestion. It's more flexible, but also more work; it ultimately comes down to what makes sense for your problem domain.

Other than those minor errata, it looks fine to me.

他是夢罘是命 2024-10-06 22:59:41

命名约定建议:

主键:someNameId
外键:someOtherName_Id

目前在几个表中您有 media_id 和 Question_id。不看数据结构就不清楚什么是什么。现在假设您有 80 个表,每个表中有 20 到 40 个属性。您会迷失在哪里是外键、哪里是主键。

Suggestion to naming conventions:

Primary key: someNameId
Foreign key: someOtherName_Id

At the moment in few tables you have media_id and question_id. It's unclear what's what without looking at the data structure. Now imagine you have 80 tables and 20 to 40 attributes in each table. You'll get lost in where is a foreign key and where is a primary key.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文