Does it make sense to map a graph data structure to a relational database?

Posted 2024-10-09 17:53:46 · 117 words · 9 views · 0 comments

Specifically a Multigraph.

Some colleague suggested this and I'm completely baffled.

Any insights on this?

Comments (4)

我纯我任性 2024-10-16 17:53:46

It's pretty straightforward to store a graph in a database: you have a table for nodes, and a table for edges, which acts as a many-to-many relationship table between the nodes table and itself. Like this:

create table node (
  id integer primary key
);

create table edge (
  start_id integer references node,
  end_id integer references node,
  primary key (start_id, end_id)
);

However, there are a couple of sticky points about storing a graph this way.

Firstly, the edges in this scheme are naturally directed - the start and end are distinct. If your edges are undirected, then you will either have to be careful in writing queries, or store two entries in the table for each edge, one in each direction (and then be careful writing queries!). If you store a single edge, I would suggest normalising the stored form - perhaps always treat the node with the lower ID as the start (and add a check constraint to the table to enforce this). You could have a genuinely unordered representation by not having the edges refer to the nodes directly, but rather having a join table between them, but that doesn't seem like a great idea to me.
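
As a rough sketch of the "store each undirected edge once, lower ID first" idea, here is one way it could look using Python's built-in sqlite3 module. The helper names add_undirected_edge and neighbours are made up for illustration, and the check constraint as written also forbids self-loops, which may or may not suit your graph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
create table node (
  id integer primary key
);
create table edge (
  start_id integer not null references node,
  end_id   integer not null references node,
  primary key (start_id, end_id),
  check (start_id < end_id)  -- canonical form: lower ID first (also rejects self-loops)
);
""")


def add_undirected_edge(conn, a, b):
    """Store the undirected edge {a, b} once, with the lower ID as start_id."""
    low, high = sorted((a, b))
    conn.execute("insert into edge (start_id, end_id) values (?, ?)", (low, high))


def neighbours(conn, node_id):
    """Return the neighbours of node_id; the node may sit in either column."""
    rows = conn.execute(
        """
        select end_id from edge where start_id = ?
        union
        select start_id from edge where end_id = ?
        order by 1
        """,
        (node_id, node_id),
    )
    return [row[0] for row in rows]


conn.executemany("insert into node (id) values (?)", [(1,), (2,), (3,)])
add_undirected_edge(conn, 3, 1)  # stored as (1, 3)
add_undirected_edge(conn, 2, 3)
print(neighbours(conn, 3))
```

Even with the canonical form, every "who is connected to X" query still has to look at both columns, which is the "be careful writing queries" part.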

Secondly, the schema above has no way to represent a multigraph. You can extend it easily enough to do so; if edges between a given pair of nodes are indistinguishable, the simplest thing would be to add a count to each edge row, saying how many edges there are between the referred-to nodes. If they are distinguishable, then you will need to add something to the edge table to allow them to be distinguished - an autogenerated edge ID might be the simplest thing, and it would replace (start_id, end_id) as the primary key.
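
A minimal sketch of the edge-ID variant, again using Python's sqlite3 module. The id column, the edge_endpoints index and the multiplicity helper are illustrative names, not anything prescribed above. In SQLite an "integer primary key" column is filled in automatically when omitted from the insert; other databases would use a sequence or identity column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table node (
  id integer primary key
);
create table edge (
  id       integer primary key,  -- autogenerated by SQLite when omitted on insert
  start_id integer not null references node,
  end_id   integer not null references node
);
-- (start_id, end_id) is no longer unique, so index it instead of making it the key
create index edge_endpoints on edge (start_id, end_id);
""")

conn.executemany("insert into node (id) values (?)", [(1,), (2,)])
# Two parallel edges from node 1 to node 2, plus one edge back from 2 to 1.
conn.executemany(
    "insert into edge (start_id, end_id) values (?, ?)",
    [(1, 2), (1, 2), (2, 1)],
)


def multiplicity(conn, start_id, end_id):
    """Count the parallel edges running from start_id to end_id."""
    (count,) = conn.execute(
        "select count(*) from edge where start_id = ? and end_id = ?",
        (start_id, end_id),
    ).fetchone()
    return count


print(multiplicity(conn, 1, 2))
```

Once each edge has its own ID, per-edge attributes (a weight, a label) can simply become further columns on the same row.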

However, even having sorted out the storage, you have the problem of working with the graph. If you want to do all of your processing on objects in memory, and the database is purely for storage, then no problem. But if you want to do queries on the graph in the database, then you'll have to figure out how to do them in SQL, which doesn't have any inbuilt support for graphs, and whose basic operations aren't easily adapted to work with graphs. It can be done, especially if you have a database with recursive SQL support (PostgreSQL, Firebird, some of the proprietary databases), but it takes some thought. If you want to do this, my suggestion would be to post further questions about the specific queries.
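
To give a flavour of what a recursive query looks like, here is a small sketch that finds every node reachable from a starting node over directed edges. It runs against SQLite through Python's sqlite3 module, since SQLite also supports "with recursive"; the sample graph and the reachable_from helper are invented for the example, and the syntax may need adjusting for other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table node (
  id integer primary key
);
create table edge (
  start_id integer references node,
  end_id integer references node,
  primary key (start_id, end_id)
);
-- Sample graph: a cycle 1 -> 2 -> 3 -> 1, a spur 3 -> 4, and an isolated node 5.
insert into node (id) values (1), (2), (3), (4), (5);
insert into edge (start_id, end_id) values (1, 2), (2, 3), (3, 1), (3, 4);
""")


def reachable_from(conn, source):
    """Return all node IDs reachable from source, including source itself."""
    rows = conn.execute(
        """
        with recursive reachable(id) as (
          select ?
          union  -- union (not union all) drops repeats, so cycles terminate
          select edge.end_id
          from edge
          join reachable on edge.start_id = reachable.id
        )
        select id from reachable order by id
        """,
        (source,),
    )
    return [row[0] for row in rows]


print(reachable_from(conn, 1))
```

Reachability is about the easiest case. Shortest paths or anything weighted take more care, for example carrying a depth or cost column through the recursion.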

余厌 2024-10-16 17:53:46

It's an acceptable approach. You need to consider how that information will be manipulated. More than likely you'll need a language separate from your database to do the kinds of graph-related computations this type of data implies. Skiena's Algorithm Design Manual has an extensive section on graph data structures and their manipulation.

Without considering what types of queries you might execute, start with two tables: vertices and edges. Vertices are simple: an identifier and a name. Edges are more complex, given the multigraph. Edges should be uniquely identified by a combination of two vertices (i.e. foreign keys) and some additional information. The additional information depends on the problem you're solving. For instance, for flight information it would be the departure and arrival times and the airline. Furthermore, you'll need to decide whether the edge is directed (i.e. one way) or not, and keep track of that information as well.
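
One possible reading of the flight example, sketched with Python's sqlite3 module. All table names, column names and sample rows here are placeholders made up for illustration. The point is only that the two vertex keys plus some of the additional information (here airline and departure time) form the edge's key, so parallel edges between the same pair of vertices can coexist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
create table vertex (
  id   integer primary key,
  name text not null
);
create table edge (
  from_id    integer not null references vertex,
  to_id      integer not null references vertex,
  airline    text not null,
  departs_at text not null,  -- ISO 8601 text, so string order matches time order
  arrives_at text not null,
  primary key (from_id, to_id, airline, departs_at)
);
""")

# Placeholder data: two airports and three flights between the same pair.
conn.executemany("insert into vertex (id, name) values (?, ?)", [(1, "A"), (2, "B")])
conn.executemany(
    "insert into edge values (?, ?, ?, ?, ?)",
    [
        (1, 2, "AIR_X", "2024-01-01T09:00", "2024-01-01T12:00"),
        (1, 2, "AIR_Y", "2024-01-01T09:00", "2024-01-01T12:10"),
        (1, 2, "AIR_Y", "2024-01-01T15:00", "2024-01-01T18:10"),
    ],
)


def flights_between(conn, from_name, to_name):
    """List (airline, departs_at) for every directed edge between two named vertices."""
    rows = conn.execute(
        """
        select e.airline, e.departs_at
        from edge e
        join vertex f on f.id = e.from_id
        join vertex t on t.id = e.to_id
        where f.name = ? and t.name = ?
        order by e.departs_at, e.airline
        """,
        (from_name, to_name),
    )
    return rows.fetchall()


print(flights_between(conn, "A", "B"))
```

Here the edges are directed (a flight from A to B says nothing about B to A), so the key needs no canonical ordering of the two vertices.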

Depending on the computation you may end up with a problem that's better solved with some sort of artificial intelligence / machine learning algorithm. For instance, optimal flights. The book Programming Collective Intelligence has some useful algorithms for this purpose. But where the data is kept doesn't change the algorithm itself.

林空鹿饮溪 2024-10-16 17:53:46

Well, the information has to be stored somewhere, and a relational database isn't a bad idea.

It would just be a many-to-many relationship: a table of nodes and a table of edges/connections.

暖树树初阳… 2024-10-16 17:53:46

Consider how Facebook might implement the social graph in their database. They might have a table for people and another table for friendships. The friendships table has at least two columns, each being a foreign key to the table of people.

Since friendship is symmetric (on Facebook) they might ensure that the ID for the first foreign key is always less than the ID for the second foreign key. Twitter has a directed graph for its social network, so it wouldn't use a canonical representation like that.
