What exactly was wrong with the hierarchical and network models of databases?

Posted 2024-09-13 16:15:36

Before E. F. Codd published his paper "A Relational Model of Data for Large Shared Data Banks" in 1970, hierarchical and network were the two prominent database models.

What exactly was wrong with them that they did not prevail?

Comments (4)

桃气十足 2024-09-20 16:15:36

The answers so far cover a lot of the practical reasons why the network and hierarchical models were eventually displaced by the relational model (including SQL database systems). Codd's 1970 paper explains in detail why a new model was needed. It's a great read. Indeed, before Codd, the term "data model" was practically unheard of. So he coined the terms "hierarchical model" and "network model" in order to describe database systems that had been constructed with no precise model in mind.

The hierarchical and network models can be collected into a general term, called the "graph model". The essential feature of the graph model of data is that data items are referenced by stating their location. If you understand pointers, you understand everything fundamental about the graph model.

There are two very powerful advantages to the graph model of data. The first is that it's very easy for programmers to grasp. Novice programmers go through a certain learning curve coming to grips with pointers, but once they've done that, they are ready to understand graph data easily.

The second advantage is that pointers are extremely fast, provided that the navigation path to be followed was anticipated at the time the data was written.

There are several disadvantages to using pointers to identify data. One is that the data becomes "pinned". That is, when the data is to be shuffled, all of the pointers that reference the data have to be located and updated, or a "forwarding address" has to be left at the old location. If you've ever been on the web and clicked on a button that has always worked, only to be greeted with the infamous "page not found" error, you've probably come across the pitfall of shuffling pinned data without updating references to it.
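To make the "pinned" problem concrete, here is a minimal sketch. It assumes a made-up storage layout (a Python list of slots, where a "pointer" is just a slot number); the names `slots` and `employees_of` are illustrative and not taken from any real DBMS.

```python
# Hypothetical storage: a list of slots; a "pointer" is just a slot number.
slots = [
    {"name": "Sales", "employee_slots": [1, 2]},  # slot 0: a department record
    {"name": "Alice"},                             # slot 1
    {"name": "Bob"},                               # slot 2
    None,                                          # slot 3: free space
]

def employees_of(dept_slot):
    """Follow the stored pointers from a department to its employees."""
    return [slots[s]["name"] for s in slots[dept_slot]["employee_slots"]]

before_move = employees_of(0)

# Shuffle the data: move Bob from slot 2 to slot 3, then reuse slot 2.
slots[3] = slots[2]
slots[2] = {"name": "Carol"}  # an unrelated record now lives at the old address

# The department's pointers were not updated, so the wrong record comes back.
stale_result = employees_of(0)

# The repair is to locate and update every pointer to the moved record.
slots[0]["employee_slots"] = [1, 3]
after_fix = employees_of(0)
```

Note that the stale lookup does not even fail loudly here: it quietly returns a different record, which is arguably worse than a "page not found".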

A second one is that navigating data along unplanned access paths can be downright disastrous, both in terms of performance, and in terms of logical correctness. This is one of the reasons why ad hoc reporting is so difficult with graph databases.
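A rough illustration of the performance half of that point, assuming a toy hierarchy whose only entry point is the department (the data and function names are invented for the example):

```python
# Hypothetical hierarchical store: the only entry point is the department.
hierarchy = {
    "Sales": [{"name": "Alice", "city": "Oslo"}, {"name": "Bob", "city": "Rome"}],
    "Support": [{"name": "Carol", "city": "Oslo"}],
}

def employees_in_department(dept):
    """Planned path: one lookup, then follow the child list."""
    return [e["name"] for e in hierarchy[dept]]

def employees_in_city(city):
    """Unplanned path: there is no entry point by city, so every parent
    and every child has to be visited."""
    visited = 0
    found = []
    for children in hierarchy.values():
        for employee in children:
            visited += 1
            if employee["city"] == city:
                found.append(employee["name"])
    return found, visited

planned = employees_in_department("Sales")
unplanned, records_visited = employees_in_city("Oslo")
```

The planned question touches only one department's children; the unplanned one touches every record in the store, and the program has to be written by hand for each new question.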

A third drawback of graph data is that there may be logical relationships in the graph data that are not inherent in the data as given. The fundamental advantage of the relational model is that all the relationships are inherent in the data itself. The reason why this is an advantage is complex. I refer you again to the 1970 paper.

In all the "relational DBMSes" that you and I are likely to use, there is a bridge between using data to identify data and using pointers to locate data. It's called an index. The index relates two items: an index key (one or more columns from a table), and a pointer (that locates a row containing the index key). I'm glossing over all the details about indexes.

Anyway, an index allows the SQL engine to translate a query that states what data is being sought into where to look for that data. Data that is pointed to by indexes can still be shuffled, but the index has to be rebuilt as part of the process.
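As a very simplified sketch of that bridge (real indexes are B-trees or similar, and real engines maintain them automatically; the `build_index` and `lookup` helpers here are hypothetical), an index can be pictured as a map from key value to row position:

```python
rows = [
    {"id": 30, "name": "Carol"},
    {"id": 10, "name": "Alice"},
    {"id": 20, "name": "Bob"},
]

def build_index(rows, key):
    """Map each key value to the position (the "pointer") of its row."""
    return {row[key]: pos for pos, row in enumerate(rows)}

def lookup(rows, index, value):
    """Translate "what" (a key value) into "where" (a row position)."""
    return rows[index[value]]

index = build_index(rows, "id")
found_before = lookup(rows, index, 20)["name"]

# Shuffle the physical order of the rows.
rows.sort(key=lambda r: r["id"])

# The old index still holds the old positions, so it now points at the wrong row.
stale = lookup(rows, index, 20)["name"]

# Rebuilding the index is part of the shuffle; the query (id = 20) never changes.
index = build_index(rows, "id")
found_after = lookup(rows, index, 20)["name"]
```

The difference from the pure pointer approach is that only the index has to be fixed up after a shuffle. Anything that asks for the row by its key value keeps working unchanged.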

This is an overview.

找回味觉 2024-09-20 16:15:36

The basic problem was the inability to support ad hoc queries. These databases were very fast, but only if you queried them in the ways their original designers expected you to. If you came up with another type of query, they could either be very slow or at worst require that the database schema be changed to support the query.

I actually worked on both kinds in the 80s (CODASYL and Nomad/2) and was very glad when SQL became more widely available.
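For contrast, here is a small sketch of what ad hoc querying looks like once SQL is available, using Python's standard `sqlite3` module with an in-memory database. The `employee` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Alice", "Sales", "Oslo"),
        ("Bob", "Sales", "Rome"),
        ("Carol", "Support", "Oslo"),
    ],
)

# None of these questions had to be anticipated when the table was designed.
by_dept = [r[0] for r in conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("Sales",))]
by_city = [r[0] for r in conn.execute(
    "SELECT name FROM employee WHERE city = ? ORDER BY name", ("Oslo",))]
per_city = dict(conn.execute(
    "SELECT city, COUNT(*) FROM employee GROUP BY city"))
conn.close()
```

Each new question is just another query against the same schema. Whether it runs fast is a separate matter (that is what indexes are for), but no schema change or hand-written navigation code is needed to ask it.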

甜妞爱困 2024-09-20 16:15:36
  1. No easy way to generate reports.
  2. Rigid schemas.
  3. The network model was too similar to the hierarchical one, so it didn't really deliver the benefits a network model might have. Instead of having a single parent, as in the hierarchical model, a record could have multiple parents, so everything was still thought of in terms of parent/child relationships.
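The difference described in point 3 can be sketched in a few lines. This assumes a made-up `Record` type and `link` helper, not the actual structures of any CODASYL or hierarchical product:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Record:
    name: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)

def link(parent, child, allow_multiple_parents):
    """Attach child under parent; the hierarchical rule forbids a second parent."""
    if child.parents and not allow_multiple_parents:
        raise ValueError(f"{child.name} already has a parent")
    parent.children.append(child)
    child.parents.append(parent)

supplier = Record("Supplier A")
project = Record("Project X")

# Network style: the same part can sit under both a supplier and a project.
part = Record("Bolt")
link(supplier, part, allow_multiple_parents=True)
link(project, part, allow_multiple_parents=True)
parent_names = [p.name for p in part.parents]

# Hierarchical style: the second parent is rejected.
other_part = Record("Nut")
link(supplier, other_part, allow_multiple_parents=False)
try:
    link(project, other_part, allow_multiple_parents=False)
    second_parent_rejected = False
except ValueError:
    second_parent_rejected = True
```

Either way, the data is still reached by walking parent and child links, which is the point being made above.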

What was good in these models was performance, and that's what I think made it take so long for RDBMSs to become dominant (they performed really badly in the beginning).

If you want to dig deeper into the history, this interview with Charles Bachman is highly recommended reading! He's an interesting person as well; he actually coded the first automated data modeling tool for RDBMS!

BTW, hierarchical/network databases are still in use at least in mainframe settings.

被翻牌 2024-09-20 16:15:36

Navigation. The hierarchical and network models depend on navigational structures (aka pointers / links / graphs) in the database. Their functionality is therefore constrained by the design of those structures. In contrast, the relational model "provides a means of describing data with its natural structure only--that is, without superimposing any additional structure for machine representation purposes."[1]

Ironically, the current "NOSQL" trend in databases also embraces navigational structures, often viewing them (quite mistakenly in my view) as a good solution to the perceived limitations of SQL databases.

[1] E. F. Codd, "A Relational Model of Data for Large Shared Data Banks", 1970
