Does it make sense to map a graph data structure into a relational database?

Posted 2024-10-09 17:53:46

Specifically, a multigraph.

Some colleagues suggested this, and I'm completely baffled.

Any insights on this?

我纯我任性 2024-10-16 17:53:46

It's pretty straightforward to store a graph in a database: you have a table for nodes, and a table for edges, which acts as a many-to-many relationship table between the nodes table and itself. Like this:

create table node (
  id integer primary key
);

create table edge (
  start_id integer references node,
  end_id integer references node,
  primary key (start_id, end_id)
);

However, there are a couple of sticky points about storing a graph this way.

Firstly, the edges in this scheme are naturally directed: the start and end are distinct. If your edges are undirected, then you will either have to be careful in writing queries, or store two entries in the table for each edge, one in either direction (and then be careful writing queries!). If you store a single edge, I would suggest normalising the stored form: perhaps always treat the node with the lowest ID as the start (and add a check constraint to the table to enforce this). You could have a genuinely unordered representation by not having the edges refer to the nodes, but rather having a join table between them, but that doesn't seem like a great idea to me.
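As a rough sketch of the normalised form described above, the following uses Python's built-in sqlite3 module purely so that the SQL can be run as-is. The helper names add_edge and neighbours are invented for illustration, and the check uses <= rather than < on the assumption that self-loops should be allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite only enforces foreign keys when asked
conn.executescript("""
create table node (
  id integer primary key
);

create table edge (
  start_id integer not null references node,
  end_id integer not null references node,
  primary key (start_id, end_id),
  check (start_id <= end_id)  -- canonical form: lowest ID first
);
""")


def add_edge(a, b):
    """Store the undirected edge {a, b} with the lower ID as start_id."""
    lo, hi = min(a, b), max(a, b)
    conn.execute("insert into edge (start_id, end_id) values (?, ?)", (lo, hi))


def neighbours(n):
    """Return the IDs adjacent to n; both columns must be checked."""
    rows = conn.execute(
        "select end_id from edge where start_id = ? "
        "union select start_id from edge where end_id = ?",
        (n, n),
    )
    return sorted(r[0] for r in rows)


conn.executemany("insert into node (id) values (?)", [(1,), (2,), (3,)])
add_edge(2, 1)  # stored as (1, 2)
add_edge(3, 1)  # stored as (1, 3)
print(neighbours(1))
print(neighbours(2))
```

The neighbours query illustrates the "be careful writing queries" point: even with a canonical stored form, a node can appear in either column, so every lookup has to consider both.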

Secondly, the schema above has no way to represent a multigraph, because the primary key on (start_id, end_id) allows at most one row per pair of nodes. You can extend it easily enough to do so; if edges between a given pair of nodes are indistinguishable, the simplest thing would be to add a count to each edge row, saying how many edges there are between the referred-to nodes. If they are distinguishable, then you will need to add something to the edge table to allow them to be distinguished, and make it part (or all) of the primary key - an autogenerated edge ID might be the simplest thing.
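Both multigraph extensions can be sketched side by side. This is only one possible layout, again run through Python's sqlite3 module; the table name edge_counted, the edge_count and label columns, and the add_counted_edge helper are all hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table node (
  id integer primary key
);

-- Variant A: indistinguishable parallel edges, stored as a count.
create table edge_counted (
  start_id integer not null references node,
  end_id integer not null references node,
  edge_count integer not null default 1 check (edge_count > 0),
  primary key (start_id, end_id)
);

-- Variant B: distinguishable parallel edges, each with its own ID.
create table edge (
  id integer primary key,  -- assigned by SQLite when omitted
  start_id integer not null references node,
  end_id integer not null references node,
  label text               -- hypothetical per-edge attribute
);
""")
conn.executemany("insert into node (id) values (?)", [(1,), (2,)])


def add_counted_edge(a, b):
    """Add one more indistinguishable edge from a to b (variant A)."""
    cur = conn.execute(
        "update edge_counted set edge_count = edge_count + 1 "
        "where start_id = ? and end_id = ?",
        (a, b),
    )
    if cur.rowcount == 0:  # no existing row for this pair yet
        conn.execute(
            "insert into edge_counted (start_id, end_id) values (?, ?)", (a, b)
        )


add_counted_edge(1, 2)
add_counted_edge(1, 2)
conn.execute("insert into edge (start_id, end_id, label) values (1, 2, 'first')")
conn.execute("insert into edge (start_id, end_id, label) values (1, 2, 'second')")

counted = conn.execute(
    "select edge_count from edge_counted where start_id = 1 and end_id = 2"
).fetchone()[0]
distinct = conn.execute(
    "select count(*) from edge where start_id = 1 and end_id = 2"
).fetchone()[0]
print(counted, distinct)
```

In variant B the pair (start_id, end_id) is no longer unique, so the edge ID takes over as the primary key.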

However, even having sorted out the storage, you have the problem of working with the graph. If you want to do all of your processing on objects in memory, and the database is purely for storage, then no problem. But if you want to do queries on the graph in the database, then you'll have to figure out how to do them in SQL, which doesn't have any inbuilt support for graphs, and whose basic operations aren't easily adapted to work with graphs. It can be done, especially if you have a database with recursive SQL support (PostgreSQL, Firebird, some of the proprietary databases), but it takes some thought. If you want to do this, my suggestion would be to post further questions about the specific queries.
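To give a flavour of what a graph query in recursive SQL can look like, here is a small reachability sketch. It runs against SQLite through Python's sqlite3 module, since SQLite also accepts `with recursive`; the sample rows and the reachable helper are made up for the example, and other databases may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table node (
  id integer primary key
);

create table edge (
  start_id integer references node,
  end_id integer references node,
  primary key (start_id, end_id)
);

-- Made-up sample graph: a cycle 1 -> 2 -> 3 -> 1, plus 4 -> 5.
insert into node (id) values (1), (2), (3), (4), (5);
insert into edge (start_id, end_id) values (1, 2), (2, 3), (3, 1), (4, 5);
""")


def reachable(source):
    """Return every node reachable from source by following directed edges."""
    rows = conn.execute(
        """
        with recursive reach(id) as (
          select ?
          union  -- 'union' drops duplicates, so cycles terminate
          select edge.end_id
          from edge join reach on edge.start_id = reach.id
        )
        select id from reach order by id
        """,
        (source,),
    )
    return [r[0] for r in rows]


print(reachable(1))
print(reachable(4))
```

Using union rather than union all matters here: with union all the cycle through nodes 1, 2 and 3 would keep producing rows indefinitely.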

余厌 2024-10-16 17:53:46

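This is an acceptable approach. You need to consider how you will manipulate the information. You will most likely need a language separate from the database to perform the graph-related computations this kind of data implies. Skiena's The Algorithm Design Manual has an extensive section on graph data structures and their manipulation.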

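Without considering what kinds of queries you might run, start with two tables, vertices and edges. Vertices are simple: an identifier and a name. Edges are more complex, given the multigraph. An edge should be uniquely identified by the combination of its two vertices (i.e. foreign keys) and some additional information. The additional information depends on the problem you are solving: for flights, for instance, the departure and arrival times and the airline. You will also need to decide whether the edges are directed (i.e. one-way) and keep track of that information as well.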
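A minimal sketch of that two-table layout for the flight example might look like the following, using Python's sqlite3 module to run the SQL. Every column name and every row of data is invented for illustration, and the choice of which columns make up the key is an assumption; flights are treated as directed, from origin to destination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite only enforces foreign keys when asked
conn.executescript("""
create table vertex (
  id integer primary key,
  name text not null
);

create table edge (
  origin_id integer not null references vertex,
  destination_id integer not null references vertex,
  airline text not null,
  departs_at text not null,
  arrives_at text not null,
  -- two vertices plus additional information identify one edge
  primary key (origin_id, destination_id, airline, departs_at)
);
""")

conn.executemany(
    "insert into vertex (id, name) values (?, ?)",
    [(1, "Airport A"), (2, "Airport B")],
)

# Invented sample flights: three parallel edges between the same two vertices.
flights = [
    (1, 2, "Airline X", "2024-10-16T09:00", "2024-10-16T12:00"),
    (1, 2, "Airline Y", "2024-10-16T09:00", "2024-10-16T12:10"),
    (1, 2, "Airline X", "2024-10-16T15:00", "2024-10-16T18:00"),
]
conn.executemany("insert into edge values (?, ?, ?, ?, ?)", flights)

parallel = conn.execute(
    "select count(*) from edge where origin_id = 1 and destination_id = 2"
).fetchone()[0]
print(parallel)
```

Compared with an autogenerated edge ID, a composite key like this rejects accidental duplicates of the same flight, at the cost of a wider key.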

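Depending on the computation, you may end up with a problem that is better solved with some sort of artificial intelligence or machine learning algorithm, such as finding optimal flights. The book Programming Collective Intelligence has some useful algorithms for this purpose. But where the data is kept doesn't change the algorithm itself.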