为什么主流的 DBMS 没有图形功能?
关系数据库经常用于存储各种类型的图(树、有向图、无向图……)。
那么为什么主要的 DBMS(Microsoft、MySql、Oracle、PostgreSQL、SqlLite,仅按字母顺序仅举几例)都没有包含将关系视为图形的库支持?
一些理想的功能,例如:
- 约束检查(连通性、非循环性、平面性……)
- 常用函数(最短路径、最小生成树、传递闭包、最大流/最小割、派系检测、哈密顿/欧拉循环。 ..)
- 提高上述任何性能所需的辅助数据结构
在数据库之外构建对其中一些事物的支持很复杂,因为(除其他原因外):
- 它本质上很复杂(库在这里提供帮助)
- 简短的答案通常得到许多支持数据:运行最短路径算法的外部客户端要么需要与数据库非常“闲聊”,要么需要检索比所需数量大得多的数据;任何一种选择都不利于网络
- 当完整性依赖于图论约束时维护完整性需要访问所有建议的更新,因此触发器以及从触发器访问现有图形库在许多系统中都很复杂
- DBMS 存储管理器和优化器是独一无二的定位于解决辅助数据结构的问题,就像它们对索引所做的那样
这不是一个反问句,我实际上想知道是否有有趣的技术(或历史)原因。
Relational databases are frequently used to store graphs in all their many flavors (trees, directed graphs, undirected graphs, ...).
Why then do none of the major DBMSs (Microsoft, MySql, Oracle, PostgreSQL, SqlLite, just to name a few in alphabetical order) include library support for treating relations as graphs?
Some desirable features, by way of example:
- Constraint checking (connectedness, acyclicity, planarity, ...)
- Commonly needed functions (shortest path, minimum spanning tree, transitive closure, max flow/min cut, clique detection, Hamiltonian/Eulerian cycles ...)
- Auxiliary data structures needed to improve performance for any of the above
Building support for some of these things outside the database is complicated because (among other reasons):
- It's inherently complicated (libraries help here)
- Short answers are often supported by lots of data: an external client running a shortest path algorithm would need to either be very "chatty" with the database or would need to retrieve a much-larger-than-needed amount of data; either choice is bad for the network
- Maintaining integrity when integrity depends on a graph-theoretic constraint requires access to all proposed updates, hence a trigger, and access to existing graph libraries from triggers is complicated in many systems
- The DBMS storage manager and optimizer are uniquely positioned to address the question of auxiliary data structures, as they do with indexes
This isn't a rhetorical question, I actually want to know if there are interesting technical (or historical) reasons.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我曾在一个研究小组工作过,对开发 RDF(S) 数据库感兴趣数据,基本上是标记图,或三元组 [主语,谓语,宾语],基本上是图边:[sourceNode,edgeLabel,targetNode]。
为了理解问题的难度,要问的问题是:您将为标记图构建什么样的索引?你必须利用常见的“属性”(每个“谓词”是主语的一个属性,具有宾语的值),并相应地索引边,这样你就可以快速找到是否“有Person 上的一条名为“hasAge”的边,其值大于 18"。
为了便于说明,这里有一个简单的方法,它是无模式的(并且与传统数据库研究的方向完全相反,传统数据库研究一致认为拥有模式好)。它完全忽略任何架构信息(本文提供了有用的上下文)。只需将所有内容存储在三个大表中(s:主语,p:谓语,o:宾语):
这三个足以回答任何有效评估任何带有(至多)一个主语、(至多)一个谓语和(至多)一个宾语的查询(即形式为
(s, *, *)
,(* , p, *)
,(*, *, o)
,(s, p, *)
,(s, *, o) ,<代码>(*,p,o),<代码>(s,p,o))。复杂查询虽然由许多“路径表达式”组成(即,您描述可以找到满足某些条件的某些路径的数据),每个查询都被转换为这些(大!)表之一上的自联接,这不是一切都那么高效,这是一个问题。
这就是口袋里的一个简单的图形数据库。 :)
总之,这是一个活跃的研究领域。我不了解当前的技术水平,但我见过像 AllegroGraph< /a> 和其他声称非常好的结果。
I've worked in a research group, interested among others in delevoping a database for RDF(S) data, which is basically labeled graphs, or triples [subject, predicate, object], which are basically graph edges: [sourceNode, edgeLabel, targetNode].
The question to ask, to appreciate the hardness of the problem: What kind of indices are you going to build for a labeled graph? You have to take advantage of common "properties" (each "predicate" is a property of the subject, with the value of object), and index edges accordingly, so you can quickly find if "there is an edge called 'hasAge' on a Person with value greater than 18".
For illustration, here is a simple approach, which is schema-oblivious (and goes quite to the opposite direction of traditional database research which quite unanimously agrees that schemata are good to have). It completely ignores any schema information (this paper provides useful context). Just store everything in three big tables (s: subject, p: predicate, o: object):
These three suffice to answer any efficiently evaluate any query with (at most) a subject, (at most) a predicate, and (at most) an object (i.e. queries of the form
(s, *, *)
,(*, p, *)
,(*, *, o)
,(s, p, *)
,(s, *, o)
,(*, p, o)
,(s, p, o)
). Complex queries though consist of many "path expressions" (i.e. you describe data for which you can find certain paths satisfying some criteria), each of which is translated to a self-join on one of these (big!) tables, which is not all that efficient, which is a problem.There, that's a simple graph database in a pocket. :)
Concluding, this is the field of active research. I'm not up to date with the current state of art, but I've seen products like AllegroGraph and others that claim very good results.
Oracle 支持图形功能 (Oracle Locator/Oracle Spatial) 和语义 Web 功能。
Oracle has support for graph functionality (Oracle Locator/Oracle Spatial) and semantic web functionality.
我怀疑你的问题包含了它自己答案的开始。
对于通用数据库来说,您列出的常用功能根本不是常用的。是的,图形操作肯定需要它们,但很少用于客户计费等。当然,关系数据库可以在表中存储图形,但图形操作超出了我见过的任何版本的 SQL 的能力。
您编写在数据库之外构建对其中一些内容的支持很复杂。确实如此,这就是为什么我们都得到如此高的薪水。但是将这些东西的支持构建到数据库中也会同样复杂,不是吗?
I suspect that your question contains the beginnings of its own answer.
The commonly needed functions you list are not, for general purpose databases, commonly needed at all. Yes they are certainly needed for graph operations, but rarely for, say, customer billing. Of course, a relational database can store graphs in tables but the graph operations are beyond the powers of any version of SQL I've seen.
You write building support for some of these things outside the database is complicated. True that and it's why we all get paid so much. But it would be just as complicated to build support for those things into the database wouldn't it ?