How do you model real-world relationships in a graph database (such as Neo4j)?
I have a general question about modeling in a graph database that I just can't seem to wrap my head around.
How do you model this type of relationship: "Newton invented Calculus"?
In a simple graph, you could model it like this:
Newton (node) -> invented (relationship) -> Calculus (node)
...so you'd have a bunch of "invented" graph relationships as you added more people and inventions.
The problem is, you start needing to add a bunch of properties to the relationship:
- invention_date
- influential_concepts
- influential_people
- books_inventor_wrote
...and you'll want to start creating relationships between those properties and other nodes, such as:
- influential_people: relationship to person nodes
- books_inventor_wrote: relationship to book nodes
So now it seems like the "real-world relationship" ("invented") should actually be a node in the graph, and the graph should look like this:
Newton (node) -> (relationship) -> Invention of Calculus (node) -> (relationship) -> Calculus (node)
And to complicate things further, other people also participated in the invention of Calculus, so the graph now becomes something like:
Newton (node) ->
    (relationship) ->
    Newton's Calculus Invention (node) ->
    (relationship) ->
    Invention of Calculus (node) ->
    (relationship) ->
    Calculus (node)

Leibniz (node) ->
    (relationship) ->
    Leibniz's Calculus Invention (node) ->
    (relationship) ->
    Invention of Calculus (node) ->
    (relationship) ->
    Calculus (node)
I ask because it seems like you shouldn't set properties on the actual graph database "relationship" objects, since at some point you may want to treat them as nodes in the graph.
Is this correct?
I have been studying the Freebase Metaweb Architecture, and they seem to be treating everything as a node. For example, Freebase has the idea of a Mediator/CVT, where you can create a "Performance" node that links an "Actor" node to a "Film" node, like here: http://www.freebase.com/edit/topic/en/the_last_samurai. Not quite sure if this is the same issue though.
What are some guiding principles you use to figure out if the "real-world relationship" should actually be a graph node rather than a graph relationship?
If there are any good books on this topic I would love to know. Thanks!
Comments (1)
Some of these things, such as invention_date, can be stored as properties on the edges, since in most graph databases edges can have properties in the same way that vertices can. For example, you could do something like this (code follows TinkerPop's Blueprints):

Now, let's expand it a little bit and add in Leibniz:
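As a rough illustration of edges carrying their own properties, here is a minimal sketch in plain Python. It is not the Blueprints API: the `Vertex`, `Edge`, and `Graph` classes, the `DISCOVERED` label, and the `invention_date` values are all assumptions made up for the example, and the dates are approximate placeholders only.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    out_v: Vertex  # vertex the edge leaves from
    label: str
    in_v: Vertex   # vertex the edge points to
    props: dict = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory property graph (illustration only)."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, name, **props):
        v = Vertex(name, props)
        self.vertices[name] = v
        return v

    def add_edge(self, out_v, label, in_v, **props):
        e = Edge(out_v, label, in_v, props)
        self.edges.append(e)
        return e


g = Graph()
newton = g.add_vertex("Newton", type="person")
leibniz = g.add_vertex("Leibniz", type="person")
calculus = g.add_vertex("Calculus", type="concept")

# The date lives on the edge itself, so no intermediate node is needed.
# The years are approximate placeholders, not authoritative data.
g.add_edge(newton, "DISCOVERED", calculus, invention_date=1666)
g.add_edge(leibniz, "DISCOVERED", calculus, invention_date=1675)

# Read the edge properties back: who discovered Calculus, and when?
discoverers = {
    e.out_v.name: e.props["invention_date"]
    for e in g.edges
    if e.label == "DISCOVERED" and e.in_v is calculus
}
print(discoverers)
```

Each person gets their own edge to the same Calculus vertex, so per-person facts such as the date stay on that person's edge.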
Adding in the books:
To find all of the books that Newton wrote on things he discovered, we can construct a graph traversal. We start with Newton, follow the out links from him to things he discovered, then traverse links in reverse to get books on that subject, and again go in reverse on a link to get the author. If the author is Newton, then go back to the book and return the result. This query is written in Gremlin, a Groovy-based domain-specific language for graph traversals:
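The traversal described above can be sketched in plain Python as a stand-in for the Gremlin query. The edge labels (`DISCOVERED`, `SUBJECT`, `AUTHOR_OF`), the edge directions, and the helper names `out` and `in_` are assumptions for illustration, not Gremlin syntax.

```python
# Edges as (out_vertex, label, in_vertex) triples. Labels are hypothetical.
EDGES = [
    ("Newton", "DISCOVERED", "Calculus"),
    ("Leibniz", "DISCOVERED", "Calculus"),
    ("Method of Fluxions", "SUBJECT", "Calculus"),
    ("Nova Methodus pro Maximis et Minimis", "SUBJECT", "Calculus"),
    ("Opticks", "SUBJECT", "Optics"),
    ("Newton", "AUTHOR_OF", "Method of Fluxions"),
    ("Newton", "AUTHOR_OF", "Opticks"),
    ("Leibniz", "AUTHOR_OF", "Nova Methodus pro Maximis et Minimis"),
]


def out(vertex, label):
    """Follow edges with `label` forward, starting at `vertex`."""
    return [i for (o, l, i) in EDGES if o == vertex and l == label]


def in_(vertex, label):
    """Follow edges with `label` in reverse, ending at `vertex`."""
    return [o for (o, l, i) in EDGES if i == vertex and l == label]


def books_on_own_discoveries(person):
    """Books the person wrote about things that same person discovered."""
    results = []
    for subject in out(person, "DISCOVERED"):      # person -> discovery
        for book in in_(subject, "SUBJECT"):       # discovery <- book
            if person in in_(book, "AUTHOR_OF"):   # book <- author
                results.append(book)
    return results


print(books_on_own_discoveries("Newton"))   # ['Method of Fluxions']
print(books_on_own_discoveries("Leibniz"))  # ['Nova Methodus pro Maximis et Minimis']
```

Opticks is filtered out because there is no `DISCOVERED` edge from Newton to its subject, and Leibniz's book is filtered out by the author check.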
Thus, I hope I've shown how a clever traversal can be used to avoid creating intermediate nodes to represent edges. In a small database the extra nodes won't matter much, but in a large database you're going to suffer large performance hits from them.
Yes, it is sad that you can't associate edges with other edges in a graph, but that's a limitation of the data structures of these databases. Sometimes it makes sense to make everything a node; for example, in Mediator/CVT a performance has a bit more concreteness to it. Individuals may wish to address only Tom Cruise's performance in "The Last Samurai" in a review. However, for most graph databases I've found that applying some graph traversals gets me what I want out of the database.
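For the case where a mediator node does make sense, here is a minimal sketch of the "Performance" idea in plain Python. The node ids (`performance:1`, `review:1`), the edge labels, and the helper functions are hypothetical, and the review is a placeholder rather than real data. The point is only that once the performance is a node, something else (a review) can point at it, which an edge could not support.

```python
# Edges as (out_vertex, label, in_vertex) triples. Ids and labels are hypothetical.
EDGES = {
    ("Tom Cruise", "PERFORMED", "performance:1"),
    ("performance:1", "IN_FILM", "The Last Samurai"),
    ("Tom Cruise", "PERFORMED", "performance:2"),
    ("performance:2", "IN_FILM", "Top Gun"),
    # A review attaches to the performance node, not to the actor or the film.
    ("review:1", "REVIEWS", "performance:1"),
}


def performances(actor, film):
    """Mediator nodes that link `actor` to `film`."""
    return sorted(
        m
        for (o, label, m) in EDGES
        if o == actor and label == "PERFORMED" and (m, "IN_FILM", film) in EDGES
    )


def reviews_of(actor, film):
    """Reviews that target the actor's performance in that specific film."""
    return sorted(
        r
        for m in performances(actor, film)
        for (r, label, target) in EDGES
        if label == "REVIEWS" and target == m
    )


print(performances("Tom Cruise", "The Last Samurai"))
print(reviews_of("Tom Cruise", "The Last Samurai"))
print(reviews_of("Tom Cruise", "Top Gun"))
```

The trade-off is one extra hop per actor-to-film lookup, so it seems worth paying only when other things need to refer to the relationship itself.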