Storing a graph in a fully normalized relational database

Posted on 2024-09-27 23:13:44

Goal

Find a perfect, flexible schema for storing many different types of objects with a wide variety of links between them in a relational database.


Problem

EAV is a workaround to the normal confinements of an RDBMS.

If you were to normalize an EAV schema, it would be ugly.


Idea

If EAV were normalized, it would be ugly.

Does the fact that we traditionally maintain these schemas by hand limit their complexity and power?

But if the schema were maintained and queried programmatically, what would it matter?


Graphs

If you have n different entities in n different tables, why not let your code generate n(n+1)/2 link tables and the queries between them? Would this not result in a true graph in a normalized schema?

In a highly interlinked database, there will always be exponentially more edges than vertices. Why not focus on creating proper, normalized vertices (n entity tables) and let our code maintain the edges (n^x link tables)?
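As a rough illustration of the idea above, here is a minimal sketch of code that generates one link table per unordered pair of entity tables, self-links included, which is where the n(n+1)/2 count comes from. The entity names, the "_link" naming scheme, and the assumption that every entity table has an integer "id" primary key are all hypothetical, chosen only for the example.

```python
import sqlite3
from itertools import combinations_with_replacement

# Hypothetical entity tables; each is assumed to have an integer "id" key.
ENTITIES = ["work", "person", "publisher", "subject"]

def link_table_ddl(a: str, b: str) -> str:
    """Return DDL for one typed link table between entity tables a and b."""
    name = f"{a}_{b}_link"
    if a == b:
        # A self-link needs two distinct column names.
        col_a, col_b = f"{a}_id_from", f"{a}_id_to"
    else:
        col_a, col_b = f"{a}_id", f"{b}_id"
    return (
        f"CREATE TABLE {name} ("
        f"{col_a} INTEGER NOT NULL REFERENCES {a}(id), "
        f"{col_b} INTEGER NOT NULL REFERENCES {b}(id), "
        f"PRIMARY KEY ({col_a}, {col_b}))"
    )

def build_schema(conn, entities):
    """Create the entity tables and every pairwise link table."""
    for e in entities:
        conn.execute(
            f"CREATE TABLE {e} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
    # Unordered pairs with repetition: n(n+1)/2 of them.
    pairs = list(combinations_with_replacement(entities, 2))
    for a, b in pairs:
        conn.execute(link_table_ddl(a, b))
    return [f"{a}_{b}_link" for a, b in pairs]

conn = sqlite3.connect(":memory:")
link_tables = build_schema(conn, ENTITIES)
n = len(ENTITIES)
print(len(link_tables), n * (n + 1) // 2)
```

With four entity tables this already produces ten link tables. The sketch only covers generation; it does not address which of those tables describe relationships that actually exist in the domain.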


Conclusion

Can a system normalize EAV and maintain the resulting complex schema?

Can complex graphs be stored in (and remain true to) relational databases?

I'm sure this has been done before, but I've never seen it. What am I missing?


Example problem

Storing printed works and their bibliographic data

  • Many properties which might be not just strings but whole objects.
  • In the library world, no simple relational schema can store the data "losslessly"; doing so requires an extremely complex schema.
  • Many different types of associations and associated objects
    • And their relevant properties (which can vary wildly).
    • And their many relationships, of different types, amongst themselves.

Questions

"What problem are you trying to solve?"
-Piet

I'm looking for a normalized solution to EAV, graphs, and polymorphic relationships in a relational database system.

"I would hate to be the guy who has to understand or maintain it after it's gone into production."
-Andrew

This "traditional maintenance" is the exact thing I'm saying we should be automating. Isn't it largely grunt work?


Comments (4)

等待我真够勒 2024-10-04 23:13:44

Since you are editing the question, it must be active.

Yes, there are much better ways of designing this, for the purpose and use you describe.

The first issue is EAV, which is usually very badly implemented. More precisely, the EAV crowd, and therefore its literature, is not of high quality, and standards are not maintained, so the basic integrity and quality of a Relational Database is lost. This leads to the many well-documented problems.

You should consider the proper, academically derived alternative, which retains full Relational integrity and capability. It is called Sixth Normal Form. EAV is in fact a subset of 6NF, applied without the full understanding; it is the more commonly known rendition of 6NF.
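To make the contrast with EAV concrete, here is a minimal sketch assuming the usual description of 6NF: each table holds a key plus at most one non-key attribute. The table names (work, work_title, work_year) and the year range in the CHECK constraint are hypothetical and are not taken from the linked answers. Unlike a single EAV value column, each attribute keeps its own type and constraints, which the database enforces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Anchor table: one row per work, key only.
CREATE TABLE work (work_id INTEGER PRIMARY KEY);

-- One table per attribute: the key plus a single typed, constrained column.
CREATE TABLE work_title (
    work_id INTEGER PRIMARY KEY REFERENCES work(work_id),
    title   TEXT NOT NULL
);
CREATE TABLE work_year (
    work_id INTEGER PRIMARY KEY REFERENCES work(work_id),
    year    INTEGER NOT NULL CHECK (year BETWEEN 1400 AND 2100)
);
""")

conn.execute("INSERT INTO work VALUES (1)")
conn.execute("INSERT INTO work_title VALUES (1, 'Moby-Dick')")
conn.execute("INSERT INTO work_year VALUES (1, 1851)")

# A value outside the declared range is rejected by the database itself.
conn.execute("INSERT INTO work VALUES (2)")
try:
    conn.execute("INSERT INTO work_year VALUES (2, 99999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Reassemble a conventional row by joining the attribute tables on the key.
row = conn.execute("""
    SELECT w.work_id, t.title, y.year
    FROM work w
    LEFT JOIN work_title t ON t.work_id = w.work_id
    LEFT JOIN work_year  y ON y.work_id = w.work_id
    WHERE w.work_id = 1
""").fetchone()
print(row, rejected)
```

A missing attribute is simply an absent row in that attribute's table, so no NULL columns are needed; the LEFT JOINs yield None for attributes a work does not have.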

6NF implemented correctly is particularly fast, in that it stores columns, not rows. You can therefore map your data (graph series, data points) in such a way as to gain a flat high speed regardless of the vectors that you use to access the graphs. (You can eliminate duplication to a higher order than 5NF, but that is advanced use.)

"Highly-interlinked" is not a problem at all. That is the nature of a Relational Database. The caveat here is, it must be truly Normalised, not an interlinked bunch of flat files.

The automation or code generation is not a problem. Of course, you need to extend the SQL catalogue, and ensure it is table-driven, if you want quality and maintainability.

My answers to these questions provide a full treatment of the subject. The last one is particularly long due to the context and arguments raised.
EAV-6NF Answer One
EAV-6NF Answer Two
EAV-6NF Answer Three

And this one is worthwhile as well:
Schema-Related Problem

鸠魁 2024-10-04 23:13:44

Your idea would certainly create a completely flexible schema that can represent any kind of object graph. I would hate to be the guy who has to understand or maintain it after it's gone into production.

One benefit of a well-designed data schema is the constraints. I'm not just referring to the physical column constraints you can define, but to the constraints imposed by the overall structure. There is a fixed set of explicit relationships, and this provides well-defined paths to follow.

In your scenario, there would always be a large number of paths from one entity to another. How would somebody know which path was the "right" path? The "right" path will simply be "the set of relationships the developer chose to populate".

Imagine a database that has these relationships.

Customer <===> Invoice <===> InvoiceLineItem <====> Product

If I'm looking at this, and somebody asks me: "Give me a list of customers and for each customer a list of products they've bought", I would know how to write the query.

But if this were a graph where everything pointed to everything else, how would I know which path is the "right" path? Will it be the "Customer_Product" relationship, the "Customer_Invoice_Line_Item" to "Customer_Product", or "Customer_Invoice" to "Invoice_Product", or "Customer" to "Invoice" to "Invoice_Line_Item" to "SomeOtherTableIHaven'tEvenLookedAtYet" to "Product"? The answer can be "It should be obvious", but it is very common for something to be obvious to one developer only.
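For the four-table chain above, the single path can be written down directly. The following is a minimal sketch; the column names and sample rows are assumptions made up for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE invoice_line_item (
    invoice_id INTEGER NOT NULL REFERENCES invoice(invoice_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (invoice_id, product_id)
);
-- Made-up sample data.
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO product  VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO invoice  VALUES (100, 1), (101, 2);
INSERT INTO invoice_line_item VALUES (100, 10, 2), (100, 11, 1), (101, 10, 5);
""")

# The schema offers exactly one path from customer to product, so the
# query follows it: customer -> invoice -> invoice_line_item -> product.
rows = conn.execute("""
    SELECT DISTINCT c.name, p.name
    FROM customer c
    JOIN invoice i            ON i.customer_id = c.customer_id
    JOIN invoice_line_item li ON li.invoice_id = i.invoice_id
    JOIN product p            ON p.product_id  = li.product_id
    ORDER BY c.name, p.name
""").fetchall()
print(rows)
```

In a schema with a generated link table for every pair of entities, several different joins could plausibly answer the same question, and nothing in the schema says which one is populated.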

离不开的别离 2024-10-04 23:13:44

"why not let your code generate n(n+1)/2 'link' tables and the queries between them?"

Any time I see anything in Computer Science where the answer comes out to be "about n-squared", I immediately think that the answer is wrong. :-)

But more realistically, when "n" gets to be a moderate size, the number of link tables gets to be enormous, really, really quickly. So much so that you can't say this methodology could represent a general-purpose solution, IMO.

But here's my real objection -- your proposed methodology isn't a viable engineering solution. Engineering is all about making tradeoffs, and this method trades a LOT for generality's sake. For example, here's what you lose by using your method over a tried-and-true "traditional" database design:

  • You lose the ability to have a discoverable schema -- the number of tables gets out of hand so quickly that anyone looking at your table design can't know what the relationships are.
  • Almost no kind of data integrity can be enforced by the database other than the most basic referential kind -- all code which uses the database must be careful not to break the rules, or you have data corruption.
  • You end up potentially having a very large number of tables which model relationships that don't really exist in your business domain. When you use a "link" table, you are essentially modeling a many-to-many relationship, which may or may not exist in the real world.
  • You potentially lose enormous amounts of speed, and incur a very large penalty in terms of storage used. It's far more efficient to model 1:N relationships by referring to the "parent" entity in the "child" entity directly.
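The last two points can be sketched side by side. In this minimal example the table names (publisher, edition_direct, edition_linked) are hypothetical; it compares a 1:N relationship modelled as a foreign key in the child table with the same relationship pushed through a link table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- 1:N modelled directly: every edition row names exactly one publisher.
CREATE TABLE edition_direct (
    edition_id   INTEGER PRIMARY KEY,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);

-- The same relationship through a link table: the schema now permits
-- zero publishers or several publishers per edition.
CREATE TABLE edition_linked (edition_id INTEGER PRIMARY KEY);
CREATE TABLE edition_linked_publisher_link (
    edition_id   INTEGER NOT NULL REFERENCES edition_linked(edition_id),
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id),
    PRIMARY KEY (edition_id, publisher_id)
);
INSERT INTO publisher VALUES (1, 'A'), (2, 'B');
""")

# Direct foreign key: the database refuses an edition with no publisher.
try:
    conn.execute(
        "INSERT INTO edition_direct (edition_id, publisher_id) VALUES (1, NULL)"
    )
    direct_rejected = False
except sqlite3.IntegrityError:
    direct_rejected = True

# Link table: an edition with two publishers is accepted, so the
# "exactly one publisher" rule would have to live in application code.
conn.execute("INSERT INTO edition_linked VALUES (1)")
conn.execute("INSERT INTO edition_linked_publisher_link VALUES (1, 1)")
conn.execute("INSERT INTO edition_linked_publisher_link VALUES (1, 2)")
publishers_for_edition = conn.execute(
    "SELECT COUNT(*) FROM edition_linked_publisher_link WHERE edition_id = 1"
).fetchone()[0]
print(direct_rejected, publishers_for_edition)
```

The link-table version also needs an extra table and an extra join for every lookup, which is the storage and speed cost the list mentions.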

谜泪 2024-10-04 23:13:44

This depends wholly on the definition of your graph.

The only "true" way to store a graph, in a relational database or otherwise, is a simple adjacency list (or one of its variants). Everything else is a derivative, specialization, or optimization of this technique, and depends on knowledge of the problem domain.

The method you describe in your question is essentially de- or re-normalizing this universal adjacency list into a number of "typed" adjacency lists (or link tables), which may or may not be more appropriate, depending on your problem.
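A minimal sketch of the two shapes described above, with hypothetical table, column, and label names: one universal adjacency list (a vertex table plus an edge table), and a "typed" adjacency list derived from it for a single kind of relationship, written here as a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Universal adjacency list: every vertex and every edge in two tables.
CREATE TABLE vertex (
    vertex_id INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL,
    label     TEXT NOT NULL
);
CREATE TABLE edge (
    src  INTEGER NOT NULL REFERENCES vertex(vertex_id),
    dst  INTEGER NOT NULL REFERENCES vertex(vertex_id),
    kind TEXT NOT NULL,
    PRIMARY KEY (src, dst, kind)
);
INSERT INTO vertex VALUES
    (1, 'person', 'author A'),
    (2, 'work', 'work W'),
    (3, 'publisher', 'publisher P');
INSERT INTO edge VALUES (1, 2, 'wrote'), (3, 2, 'published');
""")

def neighbours(conn, vertex_id):
    """Outgoing edges of one vertex: (edge kind, target kind, target label)."""
    return conn.execute(
        "SELECT e.kind, v.kind, v.label "
        "FROM edge e JOIN vertex v ON v.vertex_id = e.dst "
        "WHERE e.src = ? ORDER BY e.kind",
        (vertex_id,),
    ).fetchall()

# A "typed" adjacency list: the same edges restricted to one
# (source kind, target kind, relationship) combination.
conn.execute("""
    CREATE VIEW person_work_wrote AS
    SELECT e.src AS person_id, e.dst AS work_id
    FROM edge e
    JOIN vertex s ON s.vertex_id = e.src AND s.kind = 'person'
    JOIN vertex d ON d.vertex_id = e.dst AND d.kind = 'work'
    WHERE e.kind = 'wrote'
""")

out_edges = neighbours(conn, 1)
typed = conn.execute("SELECT person_id, work_id FROM person_work_wrote").fetchall()
print(out_edges, typed)
```

Materializing each such view as its own table gives the typed link tables from the question; whether that is worth it depends on how many of the combinations the problem actually uses.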

"I'm sure this has been done before, but I've never seen it. What am I missing?"

You're probably not missing anything: it's actually extremely rare to need to store a general graph like this. What problem are you trying to solve?

Addendum

"In a highly interlinked database, there will always be exponentially more edges than vertices. Why not focus on creating proper, normalized vertices (tables) and let our code maintain the edges?"

I think this is much more common than you might think. I'm mainly familiar with Python, but all the major ORMs / RDBMS toolkits available for it (SQLAlchemy, Django, SQLObject, ...) support automatic maintenance of many-to-many link tables as a standard feature.
