具有多列的单个固定表与灵活的抽象表

发布于 2024-09-28 20:26:25 字数 1021 浏览 6 评论 0原文

我想知道您是否有一个包含十几种不同类型的列表(商店、餐厅、俱乐部、酒店、活动)的网站,这些列表需要不同的字段,创建一个具有如此定义的列的表是否有好处
示例商店:

shop_id | name | X | Y | city | district | area | metro | station | address | phone | email | website | opening_hours

或者与此类似的更抽象的方法:

object_id | name        
---------------
1         | Messy Joe's  
2         | Bate's Motel 

type_id | name
---------------
1       | hotel
2       | restaurant


object_id | type_id
---------------
1         | 2
2         | 1

field_id | name           | field_type
---------------
1        | address        | text
2        | opening_hours  | date 
3        | speciality     | text

type_id | field_id
---------------
1       | 1
1       | 2
2       | 1
2       | 3

object_id | field_id | value
1         | 1        | 1st street....
1         | 3        | English Cuisine

当然,如果预定义值,它可以更抽象(例如:专业可以有自己的列表)

如果我采用抽象方法,它可以非常灵活,但查询会更多具有大量连接的复杂情况。 但我不知道这是否会影响执行这些“更复杂”查询的性能。

我很想知道这两种方法的优点和缺点是什么。我只能自己想象,但我没有经验来证实这一点。

I was wondering if you have a website with a dozen different types of listings (Shops, Restaurants, Clubs, Hotels, Events) that require different fields, is there a benefit of creating a table with columns defined like so
Example Shop:

shop_id | name | X | Y | city | district | area | metro | station | address | phone | email | website | opening_hours

Or a more abstract approach similar to this:

object_id | name        
---------------
1         | Messy Joe's  
2         | Bate's Motel 

type_id | name
---------------
1       | hotel
2       | restaurant


object_id | type_id
---------------
1         | 2
2         | 1

field_id | name           | field_type
---------------
1        | address        | text
2        | opening_hours  | date 
3        | speciality     | text

type_id | field_id
---------------
1       | 1
1       | 2
2       | 1
2       | 3

object_id | field_id | value
1         | 1        | 1st street....
1         | 3        | English Cuisine

Of course it can be more abstract if value's are predefined (Example: specialties could have their own list)

If I take the abstract approach it can be very flexible, but queries will be more complex with a lot of joins.
But I don't know if this affects the performance, executing these 'more complex' queries.

I would be interested to know what are the up and downsides of both methods. I can just imagine for myself, but I don't have the experience to confirm this.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

舂唻埖巳落 2024-10-05 20:26:25

在我们进行合理的讨论之前,某些问题需要澄清和解决。

先决条件解决方案

  1. 标签
    在一个要求精确的职业中,我们使用精确的标签非常重要,以避免混淆,这样我们就可以进行交流,而不必使用冗长的描述和限定词。

    您作为固定表发布的内容是非标准化的。公平地说,这可能是第三范式的尝试,但实际上它是一个平面文件,非规范化(不是“非规范化”)。准确地说,您发布的 AbstractTables 是 Entity-Attribute-Value< /strong>,这几乎是第六范式,但不完全是,因此比 3NF 更规范化,当然,假设它做得正确。

    • 非规范化平面文件不是“非规范化”的。它充满了重复(没有采取任何措施来删除重复组和重复列或解决依赖关系)和 Null,它在很多方面都是性能消耗者,并且会阻止并发性。

    • 为了非规范化,必须首先对其进行规范化,然后规范化由于某种充分的原因而稍微后退。由于它一开始就没有被规范化,所以它不能被非规范化。它根本就是非标准化的。

    • 不能说它是“为了性能”而非规范化的,因为作为一个性能消耗者,它是性能的对立面。好吧,他们需要一个缺乏正式设计的理由],而“为了性能”就是这样。即使是最小的正式审查也会暴露出虚假陈述(但很少有人能够提供,因此它仍然是隐藏的,直到他们让外部人员来解决,你猜对了,巨大的性能问题)。

    • 标准化结构的性能远优于非标准化结构。标准化程度较高的结构 (EAV/6NF) 比标准化程度较低的结构 (3NF/5NF) 表现更好。

    • 我同意 OMG Ponies 的主旨,但不同意他们的标签和定义

    • 我不是说“除非必要,否则不要“非规范化””,而是说,“忠实地规范化,句号” “如果存在性能问题,则说明您没有正确标准化”。

  2. 维基百科
    范式和规范化的条目提供了不正确的定义;他们混淆了范式;他们缺乏正常化进程;他们对早已被揭穿的荒谬或有问题的 NF 给予同样的重视。结果是,维基百科增加了一个已经很混乱且很少被理解的主题。所以不要浪费你的时间。

    但是,为了取得进展,并且不让该引用构成障碍,让我这样说。

    • 3NF 的定义是稳定的,没有改变。
    • 3NF 和 5NF 之间的 NF 有很多混淆。事实上,这个领域在过去 15 年里取得了进步;许多组织、学术机构以及产品有局限性的供应商纷纷创建一种新的“正常形式”来验证他们的产品。一切都是为了商业利益而在学术上不健全。 3NF 在其原始未篡改状态下旨在并保证某些属性。
    • 总而言之,5NF 就是今天的 3NF 15 年前的样子,您可以跳过商业玩笑和中间的十二个左右“特殊”(商业和伪学术)NF,一些其中的内容已在维基百科中找到,甚至使用了令人困惑的术语。
  3. 第五范式
    由于您已经能够在帖子中理解并实现 EAV,因此理解以下内容不会有任何问题。当然,真正的关系模型是先决条件、强键等。第五范式是,因为我们跳过第四范式:

    • 第三范式
      • 简单来说,每个表中的每个非键列都与表的主键具有 1::1 关系,
      • 并且没有其他非关键列
    • 零数据重复(如果努力推进规范化,就会得到结果;不能仅靠智力或经验实现,也不能将其作为没有正式流程的目标来实现)
    • 无更新异常(当您更新某处的列时,您不必更新位于其他位置的同一列;该列仅存在于一个位置)。
    • 如果您理解上述内容,4NF、BCNF 和所有愚蠢的“NF”都可以被忽略,它们是学术界所提倡的物理化记录归档系统所必需的,这与关系模型 (Codd) 相当陌生。
  4. 第六范式

    • 目的是消除缺失数据(属性列),也称为消除空值
    • 这是空值问题(也称为处理缺失值)的唯一真正解决方案,其结果是一个没有空值的数据库。 (可以在 5NF 中使用标准和 Null 替代来完成,但这不是最佳选择。)如何解释和显示缺失值是另一回事了。
    • 从技术上讲,它不是真正的范式,因为它没有 5NF 作为先决条件,但它有一个值
  5. EAV 与第六范式
    我写的所有数据库,除了一个之外,都是纯粹的 5NF。我曾使用(管理、修复、增强)几个 EAV 数据库,并且实现了许多真正的 6NF 数据库。 EAV 是 6NF 的松散实现,通常由对规范化和 NF 没有很好掌握的人完成,但他们可以看到 EAV 的价值并需要 EAV 的灵活性。你就是一个完美的例子。

    区别在于:因为它是松散的,并且因为实现者没有忠实的参考(6NF),所以他们只实现他们需要的东西,并且将它们全部写在代码中;这最终会成为一个不一致的模型。

    然而,纯粹的 6NF 实现确实有纯粹的学术参考点,因此它通常更严格且一致。通常这会显示在两个可见元素中:

    • 6NF 有一个包含元数据的目录,所有内容都在元数据中定义,而不是在代码中。 EAV 没有,一切都在代码中(实现者跟踪对象和属性)。显然,目录可以简化列和导航的添加,并允许形成实用程序。
    • 当理解 6NF 时,它为空问题提供了真正的解决方案。 EAV 实现者由于缺少 6NF 上下文,因此会不一致地处理代码中丢失的数据,或者更糟糕的是,允许数据库中存在 Null。 6NF 实现者不允许 Null,并一致且优雅地处理丢失的数据,而不需要代码构造(对于 Null 处理;当然,您仍然需要为丢失的数据编写代码)。

例如。对于带有目录的 6NF 数据库,我有一组过程将[重新]生成执行所有 SELECT 所需的 SQL,并且我为所有用户提供 5NF 中的视图,因此他们不需要知道或理解底层的 6NF 结构。他们被从目录中赶出了。因此,更改变得简单且自动化。由于缺少目录,EAV 类型手动执行此操作。

讨论

现在,我们可以开始讨论了。

“当然,如果
值是预定义的(示例:
专业可以有自己的
列表)”

当然。但是不要太“抽象”。保持一致性并以与其他列表相同的 EAV(或 6NF)方式实现此类列表。

“如果我采取抽象方法
可以非常灵活,但查询会
通过大量连接变得更加复杂。
但不知道这是否会影响
性能,执行这些“更多”
复杂的查询。”

  1. 连接在关系数据库中是无足轻重的。问题不在于数据库,问题在于 SQL 在处理连接时很麻烦,尤其是复合键。

  2. EAV 和 6NF 数据库有更多的连接,就像行人一样,不多也不少。如果您必须手动编写每个 SELECT 代码,当然,这会变得非常麻烦。

  3. 整个问题可以通过 (a) 使用 6NF over EAV 和 (b) 实现一个目录来消除,从中您可以 (c) 生成所有基本 SQL。还消除了一整类错误。

  4. 连接在某种程度上是有成本的,这是一个常见的误解。完全错误。

    • 连接是在编译时实现的,没有任何实质内容会“消耗”CPU 周期。
    • 问题在于所连接的表的大小,而不是这些相同表之间的连接成本。
    • 按照正确的 PK⇢FK 关系连接两个各有数百万行的表,每个表都有适当的索引
      (在父端 [PK] 端唯一;在子端唯一 [PK=父 FK + 某些内容]
      是瞬时的
    • 如果子索引不唯一,但至少前导列有效,则速度较慢;没有有用的索引的地方,当然很慢。
    • 这一切都与加入成本无关。
    • 当返回很多行时,瓶颈将是网络和磁盘布局;不是连接处理。
  5. 因此,你可以随心所欲地“复杂”,没有任何成本,SQL 可以处理它。

我有兴趣知道什么是
两种方法的优点和缺点。
我只能自己想象,但我
没有经验来确认
这个。

  1. 就实施而言,5NF(或对于那些没有取得进步的人来说是 3NF)是最简单、最好的;易用性(开发人员和用户);和维护。

    • 缺点是,每次添加列时,都必须更改数据库结构(表 DDL)。在某些情况下这很好,但在大多数情况下则不然,因为变更控制到位,相当繁重。
    • 其次,您必须更改现有代码(处理新列的代码不算在内,因为这是必须的):如果实施了良好的标准,则将其最小化;如果没有它们,范围就难以预测。
  2. EAV(这是您发布的内容)允许在不更改 DDL 的情况下添加列。这就是人们选择它的唯一原因。 (处理新列的代码不算数,因为这是命令)。如果实现得好,不会影响现有代码;如果没有,它就会。

  3. 但是您需要具备 EAV 能力的开发人员。

    • 当 EAV 实施不当时,这是令人憎恶的,比 5NF 实施不当更糟糕,但并不比大多数数据库所采用的非规范化更糟糕(被错误地表述为“为了性能而非规范化”)。
    • 当然,拥有强大的 Transaction 上下文(比在 5NF/3NF 中)更为重要,因为列的分布更加分散。
    • 同样,保留声明性引用完整性也很重要:我所看到的混乱很大程度上是由于开发人员删除了 DRI,因为它变得“太难维护”,结果是,正如你可以想象的,一位母亲到处都有重复的 3NF/5NF 行和列的数据堆。并且 Null 处理不一致。
  4. 假设服务器已针对预期目的进行了合理配置,则性能没有差异。 (好吧,有一些特定的优化只能在 6NF 中实现,而在其他 NF 中是不可能的,但我认为这超出了本线程的范围。)同样,EAV 做得不好可能会导致不必要的瓶颈,最多不超过非标准化。

  5. 当然,如果您选择EAV,我建议您更加正式;购买全额英镑;与 6NF 一起使用;实施目录;生成 SQL 的实用程序;意见;一致地处理缺失数据;完全消除空值。这减少了您对开发人员质量的影响;他们可以忘记 EAV/6NF 深奥问题,使用视图并专注于应用程序逻辑。

Certain issues need to be clarified and resolved before we can enter into a reasonable discussion.

Pre-requisite Resolution

  1. Labels
    In a profession that demands precision, it is important that we use precise labels, to avoid confusion, and so that we can communicate without having to use long-winded descriptions and qualifiers.

    What you have posted as FixedTables, is Unnormalised. Fair enough, it may be an attempt at Third Normal form, but in fact it is a flat file, Unnormalised (not "denormalised). What you have posted as AbstractTables is, to be precise, Entity-Attribute-Value, which is almost, but not quite, Sixth Normal form, and is therefore more Normalised than 3NF. Assuming it is done correctly, of course.

    • The Unnormalised flat file is not "denormalised". It is chock full of duplication (nothing has been done to remove repeating groups and duplicate columns or to resolve dependencies) and Nulls, it is a performance hog in many ways, and prevents concurrency.

    • In order to be Denormalised, it has to first be Normalised, and then the Normalisation backed off a little for some good reason. Since it is not Normalised in the first place, it cannot be Denormalised. It is simply Unnormalised.

    • It cannot be said to be denormalised "for performance", because being a performance hog, it is the very antithesis of performance. Well, they need a justification for the lack of formalised design], and "for performance" is it. Even the smallest formal scrutiny exposed the misrepresentation (but very few people can provide, so it remains hidden, until they get an outsider to address, you guessed it, the massive performance problem).

    • Normalised structures perform far better than Unnormalised structures. More normalised structures (EAV/6NF) perform better than less normalised structures (3NF/5NF).

    • I am agreeing with the thrust of OMG Ponies, but not their labels and definitions

    • rather than saying 'don't "denormalise" unless you have to', I am saying, 'Normalise faithfully, period' and 'if there is a performance problem, you have not Normalised correctly'.

  2. Wikipedia
    The entries for Normal Forms and Normalisation offer definitions that are incorrect; they confuse the Normal Forms; they are lacking regarding the process of Normalisation; and they give equal weight to absurd or questionable NFs which have been debunked long ago. The result is, Wikipedia adds to an already confused and rarely understood subject. So don't waste your time.

    However, in order to progress, without that reference posing a hindrance, let me say this.

    • The definition of 3NF is stable, and has not changed.
    • There is a lot of confusion of the NFs between 3NF and 5NF. The truth is that this is an area that progressed over the last 15 years; and many orgs, academics as well as vendors with their products with limitations, jumped to create a new "Normal Form" to validate their offerings. All serving commercial interests and academically unsound. 3NF in its original untampered state intended and guaranteed certain attributes.
    • The sum total is, 5NF is today, what 3NF was intended to be 15 years ago, and you can skip the commercial banter and the twelve or so "special" (commercial and pseudo-academic) NFs in-between, some of which are identified in Wikipedia, and even that in confusing terms.
  3. Fifth Normal Form
    Since you have been able to understand and implement the EAV in your post, you will have no problem understanding the following. Of course a true Relational Model is pre-requisite, strong keys, etc. Fifth Normal Form is, since we are skipping the Fourth:

    • Third Normal Form
      • which in simple definitive terms is, every non-key column in every table has a 1::1 relationship to the Primary Key of the table,
      • and to no other non-key columns
    • Zero data duplication (the result, if Normalisation is progressed diligently; not achieved by intelligence or experience alone, or by working toward it as a goal without the formal process)
    • no Update Anomalies (when you update a column somewhere, you do not have to update the same column located somewhere else; the column exists in one and only one place).
    • If you understand the above, 4NF, BCNF, and all the silly "NFs" can be dismissed, they are required for physicalised Record Filing Systems, as promoted by academics, quite foreign to the Relational Model (Codd).
  4. Sixth Normal Form

    • The purpose is elimination of missing data (attribute columns), aka elimination of Nulls
    • This is the one true solution to the Null Problem (also called Handling Missing Values), and the result is a database without Nulls. (It can be done at 5NF with standards and Null substitutes but that is not optimal.) How you interpret and display the missing values is another story.
    • Technically, is not a true Normal Form, because it does not have 5NF as a pre-requisite, but it has a value
  5. EAV vs Sixth Normal Form
    All the databases I have written, except one, are pure 5NF. I have worked with (administered, fixed up, enhanced) a couple of EAV databases, and I have implemented many true 6NF databases. EAV is a loose implementation of 6NF, often done by people who do not have a good grasp on Normalisation and the NFs, but who can see the value in, and need the flexibility of, EAV. You are a perfect example.

    The difference is this: because it is loose, and because implementers do not have a reference (6NF) to be faithful to, they only implement what they need, and they write it all in code; that ends up being an inconsistent model.

    Whereas, a pure 6NF implementation does have a pure academic reference point, and thus it is usually tighter, and consistent. Typically this shows up in two visible elements:

    • 6NF has a catalogue to contain metadata, and everything is defined in metadata, not code. EAV does not have one, everything is in code (implementers keep track of the objects and attributes). Obviously a catalogue eases the addition of columns, navigation, and allows utilities to be formed.
    • 6NF when understood, provides the true solution to The Null Problem. EAV implementers, since they are absent the 6NF context, handle missing data in code, inconsistently, or worse, allow Nulls in the database. 6NF implementers disallow Nulls, and handle missing Data consistently and elegantly, without requiring code constructs (for Null handling; you still have to code for missing data of course).

Eg. For 6NF databases with a catalogue, I have a set of procs that will [re]generate the SQL required to perform all SELECTs, and I provide Views in 5NF for all users, so they do not need to know or understand the underlying 6NF structure. They are driven off the catalogue. Thus changes are easy and automated. EAV types do that manually, due to the absence of the catalogue.

Discussion

Now, we can start the discussion.

"Of course it can be more abstract if
value's are predefined (Example:
specialities could have their own
list)"

Sure. But do not get too "abstract". Maintain consistency and implement such lists in the same EAV (or 6NF) manner as other lists.

"If I take the abstract approach it
can be very flexible, but queries will
be more complex with a lot of joins.
But I don't know if this affects the
performance, executing these 'more
complex' queries."

  1. Joins are pedestrian in relational databases. The problem is not the database, the problem is that SQL is cumbersome when handling joins, especially compound keys.

  2. EAV and 6NF databases have more Joins, which just as pedestrian, no more, no less. If you have to code each SELECT manually, sure, the cumbersome gets really cumbersome.

  3. The entire problem can be eliminated by (a) going with 6NF over EAV and (b) implementing a catalogue, from which you can (c) generate all the basic SQL. Eliminates an entire class of errors as well.

  4. It is a common myth that Joins somehow have a cost. Totally false.

    • The join is implemented at compile time, there is nothing of substance to 'cost' CPU cycles.
    • The issue is the size of tables being joined, not the cost of the Join between those same tables.
    • Joining two tables with millions of rows each, on a correct PK⇢FK relation, each of which have the appropriate indices
      (Unique on the parent [PK] side; Unique on the Child side [PK=parent FK + something]
      is instantaneous
    • Where the Child index is not unique, but at least the leading columns are valid, it is slower; where there is no useful index, of course it is very slow.
    • None of it has to do with Join cost.
    • Where many rows are returned, the bottleneck will be the network and the disk layout; not the join processing.
  5. Therefore you can get as "complex" as you like, there is no cost, SQL can handle it.

I would be interested to know what are
the up and downsides of both methods.
I can just imagine for myself, but I
don't have the experience to confirm
this.

  1. 5NF (or 3NF for those who have not made the progression) is the easiest and best, in terms of implementation; ease of use (developers as well as users); and maintenance.

    • The drawback is, every time you add a column, you have to change the database structure (table DDL). That is fine is some cases, but not in most cases, due to change control in place, quite onerous.
    • Second, you have to change existing code (code handling the new column does not count, because that is an imperative): where good standards are implemented, that is minimised; where they are absent, the scope is unpredictable.
  2. EAV (which is what you have posted), allows columns to be added without DDL changes. That is the single reason people choose it. (code handling the new column does not count, because that is an imperative). If implemented well, it will not affect existing code; if not, it will.

  3. But you need EAV-capable developers.

    • When EAV is implemented badly, it is abominable, a worse mess than 5NF done badly, but not any worse than Unnormalised which is what most databases out there are (misrepresented as "denormalised for performance").
    • Of course, it is even more important (than in 5NF/3NF) to hold a strong Transaction context, because the columns are far more distributed.
    • Likewise, it is essential to retain Declarative Referential Integrity: the messes I have seen were due in large part to the developers removing DRI because it became "too hard to maintain", the result was, as you can imagine, one mother of a data heap with duplicate 3NF/5NF rows and columns all over the place. And inconsistent Null handling.
  4. There is no difference in performance, assuming that the server has been reasonably configured for the intended purpose. (Ok, there are specific optimisations that are possible only in 6NF, which are not possible in other NFs, but I think that is outside the scope of this thread.) And again, EAV done badly can cause unnecessary bottlenecks, no more so than Unnormalised.

  5. Of course, if you go with EAV, I am recommending more formality; buy the full quid; go with 6NF; implement a catalogue; utilities to produce SQL; Views; handle Missing Data consistently; eliminate Nulls altogether. This reduces your vulnerability to the quality of your developers; they can forget about the EAV/6NF esoteric issues, use Views, and concentrate on the app logic.

孤独患者 2024-10-05 20:26:25

在你的问题中,你至少同时提出了两个主要问题。这两个问题是 EAV 和 gen-spec。

首先,我们来谈谈EAV。您的最后一个表(object_id、field_id、value)本质上是一个 EAV。 EAV 有优点也有缺点。优点是该结构非常通用,可以容纳描述几乎任何主题的几乎任何数据体。这意味着您可以在没有数据分析和不了解主题的情况下继续设计和实现,并且不用担心错误的假设。缺点是在检索时,您必须进行在构建数据库之前跳过的数据分析,以便提出有意义的查询。这比单纯的检索效率要严重得多。但你也会遇到检索效率方面的严重问题。要了解这个陷阱只有两种方法:亲身经历它或从那些经历过它的人那里阅读有关它的内容。我推荐阅读。

其次,您有一个通用规格的机箱。您的表 (object_id, type_id) 捕获了 gen-spec(通用化-专业化)模式以及相关表。如果我必须在酒店和餐馆之间进行概括,我可能会将其称为“公共住宿”或“场所”。但我不确定我是否理解你的情况,而且你可能正在追求比这两个名字所暗示的更普遍的东西。毕竟,您已将“活动”纳入您的列表中,而活动在我看来并不是一种场所。

我已在之前的回复中向其他人推荐了有关 gen-spec 和关系模型的读物。
当两个表非常相似时,什么时候应该将它们组合起来?

但是我犹豫是否让您朝同一方向出发,因为我不清楚您是否想在构建数据库之前提出数据的关系模型。数据体的关系模型和相同数据的 EAV 模型几乎完全不一致。在我看来,在探索如何在数据的关系模型中表达基因规格之前,你必须做出这样的选择。

In your question, you have presented at least two major issues at the same time. Those two issues are E-A-V and gen-spec.

First, let's talk about E-A-V. Your last table (object_id, field_id, value) is essentially an E-A-V. There is an upside to E-A-V and a downside to E-A-V. The upside is that the structure is so generic that it can accomodate almost any body of data describing almost any subject matter. That means that you can proceed to design and implementation with no data analysis and no understanding of the subject matter, and not worry about wrong assumptions. The down side is that at retrieval time, you have to do the data analysis that you skipped over before building the data base, in order to come up with queries that mean anything. This is much more serious than just retrieval efficiency. But you are also going to have terrible problems with retrieval efficiency. There are only two ways to learn about this pitfall: live through it or read about it from those who have. I recommend the reading.

Second, you have a gen-spec case. Your table (object_id, type_id) captures a gen-spec (generalization-specialization) pattern, along with the related tables. If I had to generalize between hotels and restaurants, I might call it something like "public accomodations" or "venues". But I'm not sure I understand your case, and you may be driving for something even more general than those two names suggest. After all, you've included "events" in your list, and an event is not a type of venue in my mind.

I've referred other people to readings on gen-spec and the relational model in previous responses.
When two tables are very similar, when should they be combined?

But I hesitate to send you off in the same direction, because it's not clear to me that you want to come up with a relational model of the data before building your database. A relational model of a body of data and an E-A-V model of the same data are almost totally at odds with each other. It seems to me you have to make that choice before you even explore how to express gen-spec in the relational model of data.

染火枫林 2024-10-05 20:26:25

“抽象”方法更广为人知的名称是“标准化”,看起来像第三范式(3NF)。

另一种称为“非规范化”,并且可以是一个有效的性能选项......当您使用规范化方法遇到速度问题时,而不是以前。

The "abstract" approach is better known as "Normalization", looks like 3rd Normal Form (3NF).

The other one is called "Denormalized", and can be a valid performance option... when you've encountered speed issues using the Normalized approach, not before.

天冷不及心凉 2024-10-05 20:26:25

当您开始需要大量不同的实体时(甚至之前......),nosql 解决方案将比任何一种选择简单得多。
只需使用您需要的确切字段存储每个实体/记录即可。

{
   "id": 1,
   "type":"Restaurant",
   "name":"Messy Joe",
   "address":"1 Main St.",
   "tags":["asian","fusion","casual"]
}

When you start to require a large number of different entities (or even before...), a nosql solution would be vastly simpler than either choice.
Just store each entity/record with the exact fields you require.

{
   "id": 1,
   "type":"Restaurant",
   "name":"Messy Joe",
   "address":"1 Main St.",
   "tags":["asian","fusion","casual"]
}
递刀给你 2024-10-05 20:26:25

您如何用代码表示列表?我猜想 Listing 作为超类型,而 ShopRestuarant 等作为子类型?

假设是这样,这就是如何将子类型映射到关系数据库的情况。通常有三种选择:

  • 选项 1:每个子类型单个表,
    具有重复的共同属性
    每个表(名称、ID 等)。
  • 选项 2:所有对象的单个表(您的单个表方法)
  • 选项 3:超类型的表和每个子类型的表

没有普遍正确的解决方案。我通常倾向于从选项 3 开始;它提供了一个直观的工作结构,非常规范化并且可以轻松扩展。这意味着检索每个实例的单个连接 - 但 RDBMS 对于连接进行了很好的优化,因此在实践中它不会真正导致性能问题。

选项 2 对于查询来说性能更高(无联接),但如果其他表需要引用所有超类型实例(外键激增),则会导致问题。

乍一看,选项 1 似乎是性能最好的,但有两个注意事项:(1) 它不能适应变化。如果添加新的子类型(以及不同的属性),您将需要更改表结构并迁移它。 (2) 它的效率可能比看起来要低。由于表数量稀疏,因此某些数据库不能特别有效地存储它。因此,它的效率可能低于选项 1 - 因为查询引擎执行连接的速度比搜索臃肿的稀疏表空间的速度快。

选择哪一个实际上取决于了解问题的细节。我建议阅读一些有关选项的内容:这篇文章是一篇很好的文章开始的地方。

How do you have the listings represented in code? I'd guess Listing as a supertype, with Shop, Restuarant, etc. as subtypes?

Assuming so, this is a case of how to map subtypes to a relational database. There are generally three choices:

  • Option 1: single table per subtype,
    with common attributes repeated in
    each table (name, id, etc).
  • Option 2: single table for all objects (your single table approach)
  • Option 3: table for the supertype and one for each subtype

There's no universally correct solution. My preference is generally to start with option 3; it provides an intituitive structure to work with, is pretty well normalised and can easily be extended. It means a single join for retrieving each instance - but RDBMS are well optimised for doing joins so it doesn't really cause performance problems in practice.

Option 2 can be more performant for queries (no joins) but causes problems if other tables need to refer to all supertype instances (proliferation of foreign keys).

Option 1 appears at first sight to be the most performant, although 2 caveats: (1) It's not resilient to change. If you add a new subtype (and so different attributes) you'll need to change the table structure and migrate it. (2) It can be less efficient than it seems. Because the table population is sparse, some DBs don't store it particularly efficiently. As a consequence it can be less efficicent than option 1 - since the query engine can do joins faster than it can search bloated sparse table spaces.

Which to choose really comes down to knowing details of your problem. I'd suggest reading up a bit on the options: this article is a good place to start.

hth

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文