What exactly was wrong with the hierarchical and network database models?

Posted 2024-09-13 16:15:36

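Before E. F. Codd published his 1970 paper "A Relational Model of Data for Large Shared Data Banks", the hierarchical and network models were the two major database models.

What exactly was wrong with them that kept them from winning out?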


桃气十足 2024-09-20 16:15:36

The answers so far cover a lot of the practical reasons why the network and hierarchical models were eventually displaced by the relational model (including SQL database systems). Codd's 1970 paper explains in detail why a new model was needed. It's a great read. Indeed, before Codd, the term "data model" was practically unheard of. So he coined the terms "hierarchical model" and "network model" in order to describe database systems that had been constructed with no precise model in mind.

The hierarchical and network models can be grouped under one general term, the "graph model". The essential feature of the graph model of data is that data items are referenced by stating their location. If you understand pointers, you understand everything fundamental about the graph model.

There are two very powerful advantages to the graph model of data. The first is that it's very easy for programmers to grasp. Novice programmers go through a certain learning curve coming to grips with pointers, but once they've done that, they are ready to understand graph data easily.

The second advantage is that pointers are extremely fast, provided that the navigation path to be followed was anticipated at the time the data was written.

There are several disadvantages to using pointers to identify data. One is that the data becomes "pinned". That is, when the data is to be shuffled (moved to a new location), all of the pointers that reference it have to be located and updated, or a "forwarding address" has to be left at the old location. If you've ever been on the web and clicked on a button that has always worked, only to be greeted with the infamous "page not found" error, you've probably come across the pitfall of shuffling pinned data without updating references to it.
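To make the "pinned data" problem concrete, here is a minimal sketch. It assumes a toy storage area modeled as a Python list, where a "pointer" is just a slot number; the names `storage`, `compact`, and `fix_pointer` are made up for illustration and do not come from any real DBMS.

```python
# Hypothetical storage: a list of slots; a "pointer" is just a slot number.
storage = ["alice", "bob", "carol"]
pointer_to_carol = 2  # reference by location, not by value


def compact(slots, removed):
    """Drop one slot and shift later records down, as a reorganization might."""
    return slots[:removed] + slots[removed + 1:]


def fix_pointer(pointer, removed):
    """Repair one pointer after compact(); every pointer needs this treatment."""
    if pointer == removed:
        raise LookupError("the referenced record was removed")
    return pointer - 1 if pointer > removed else pointer


# "bob" is deleted, so "carol" moves from slot 2 to slot 1.
storage = compact(storage, removed=1)

# The stale pointer now falls off the end: the analogue of "page not found".
try:
    stale_result = storage[pointer_to_carol]
except IndexError:
    stale_result = None

# Only after the pointer is located and updated does the reference work again.
fixed_pointer = fix_pointer(pointer_to_carol, removed=1)
fixed_result = storage[fixed_pointer]
```

In this toy there is a single pointer to repair. In a real pointer-based database the hard part is finding every pointer that references the moved record.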

A second is that navigating data along unplanned access paths can be downright disastrous, both in terms of performance and in terms of logical correctness. This is one of the reasons why ad hoc reporting is so difficult with graph databases.
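The difference between a planned and an unplanned access path can be sketched roughly as follows. This assumes a tiny hierarchy in which customers own their orders; the classes `Customer` and `Order` and both functions are hypothetical and only stand in for the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    product: str


@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)  # child pointers


alice = Customer("alice", [Order("widget"), Order("gadget")])
bob = Customer("bob", [Order("widget")])
root = [alice, bob]  # entry point of the hierarchy


def products_for(customer):
    """Planned path: customer -> orders. Just follow the pointers."""
    return [order.product for order in customer.orders]


def customers_who_bought(product):
    """Unplanned path: product -> customers. No pointer leads this way,
    so the whole hierarchy has to be walked from the root."""
    visited = 0
    buyers = []
    for customer in root:
        for order in customer.orders:
            visited += 1
            if order.product == product:
                buyers.append(customer.name)
    return buyers, visited


planned = products_for(alice)
buyers, visited = customers_who_bought("widget")
```

The planned question touches only one customer's orders, while the unplanned one visits every order in the database. Each new kind of ad hoc question also needs its own hand-written traversal like `customers_who_bought`.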

A third drawback of graph data is that there may be logical relationships in the graph data that are not inherent in the data as given. The fundamental advantage of the relational model is that all the relationships are inherent in the data itself. The reason why this is an advantage is complex. I refer you again to the 1970 paper.

In all the "relational DBMSes" that you and I are likely to use, there is a bridge between using data to identify data and using pointers to locate data. It's called an index. The index relates two items: an index key (one or more columns from a table), and a pointer (that locates a row containing the index key). I'm glossing over all the details about indexes.

Anyway, an index allows the SQL engine to translate a query that states what data is being sought into where to look for that data. Data that is pointed to by indexes can still be shuffled, but the index has to be rebuilt as part of the process.
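Here is a rough sketch of that bridge. It assumes a table held as a Python list of rows and an index held as a dict from key value to row position, with unique keys; `build_index` and `lookup` are illustrative names, not how any particular SQL engine implements indexes.

```python
# Hypothetical table: rows stored at positions in a list.
rows = [
    {"id": 10, "name": "alice"},
    {"id": 20, "name": "bob"},
    {"id": 30, "name": "carol"},
]


def build_index(table, key):
    """Map each key value to the position (the 'pointer') of its row."""
    return {row[key]: position for position, row in enumerate(table)}


def lookup(table, index, value):
    """Translate 'what' (a key value) into 'where' (a row position)."""
    return table[index[value]]


index_on_id = build_index(rows, "id")
before = lookup(rows, index_on_id, 30)

# Shuffle the physical order; the query itself does not change.
rows.reverse()
stale = lookup(rows, index_on_id, 30)   # old index points at the wrong row
index_on_id = build_index(rows, "id")   # rebuild as part of the reorganization
after = lookup(rows, index_on_id, 30)
```

The caller only ever says "the row whose id is 30". The pointers live inside the index, so they can be regenerated from the data whenever rows move, which is exactly what application-held pointers in the graph model cannot do.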

This is an overview.
