How do graph databases scale horizontally (if at all)?
With key-value, document, and column-family databases, I understand that you can scale out with combinations of replication and sharding of the keyspace. But common graph operations such as shortest path don't really seem to gain any benefit from replication, and I can't see how you would shard a graph database without finding independent subgraphs (very difficult).
Are there graph databases that try to tackle this problem? What is the current research in this area?
Comments (5)
Replication can be useful for any kind of database - it's just creating multiple copies of data so you can serve more queries than a single server can handle.
Sharding is a little more complex, but actually not too different from key/value or document stores, because internally edges have to be represented as simple lists.
While it is true that finding independent subgraphs is impossible in most cases, it isn't actually necessary. As long as the node processing the query is able to get data from other nodes, having the data available locally is just a performance optimisation.
Once you have that set up, you have a lot of options for optimising performance based on the type of graph you are working with - for example in a social graph you might use location to select the node for a user because you know that most connections are local.
I'm not aware of any existing graph databases that have sharding built in, probably because the problem is much harder to solve for the general case and the small size of edge data means you need a really large graph to exceed the capacity of a single server.
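As a rough illustration of the point above (edges stored as simple per-vertex lists, and a query node fetching non-local lists from other nodes), here is a minimal in-memory sketch. The `ShardedGraph` class, its placement rule, and the `remote_reads` counter are all hypothetical and exist only for this example; a real system would replace the counter with network calls.

```python
from collections import deque


class ShardedGraph:
    """Toy model: each shard is a dict mapping a vertex to its neighbour list."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]
        self.remote_reads = 0  # lookups that left the coordinating shard

    def shard_of(self, vertex):
        # Hypothetical placement rule: character codes summed, modulo shard count.
        return sum(map(ord, str(vertex))) % len(self.shards)

    def add_edge(self, u, v):
        # Undirected edge: each endpoint's list lives on that endpoint's shard.
        self.shards[self.shard_of(u)].setdefault(u, []).append(v)
        self.shards[self.shard_of(v)].setdefault(v, []).append(u)

    def neighbours(self, vertex, local_shard):
        owner = self.shard_of(vertex)
        if owner != local_shard:
            self.remote_reads += 1  # stands in for a network round trip
        return self.shards[owner].get(vertex, [])

    def shortest_path(self, src, dst):
        """Breadth-first search coordinated by the shard that owns src."""
        local = self.shard_of(src)
        parent = {src: None}
        queue = deque([src])
        while queue:
            current = queue.popleft()
            if current == dst:
                path = []
                while current is not None:
                    path.append(current)
                    current = parent[current]
                return path[::-1]
            for nxt in self.neighbours(current, local):
                if nxt not in parent:
                    parent[nxt] = current
                    queue.append(nxt)
        return None  # dst is not reachable from src


graph = ShardedGraph(num_shards=3)
for u, v in [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]:
    graph.add_edge(u, v)

print(graph.shortest_path("a", "d"), "remote reads:", graph.remote_reads)
```

The query still returns the right answer without any independent subgraphs; the number of remote reads is what a better placement strategy would try to reduce.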
Neo4j supports sharding and is trying to tackle the problems that sharding poses. Please take a look at http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
GoldenOrb was a concept that aimed to create a horizontally scalable Graph Database. It was released as open source, but the project appears to be dead now (link to GitHub is offline). It was based on Hadoop.
Even so, such a model cannot yet be considered a fully capable graph database, because the amount of information that needs to be shared among nodes is too large and complex for certain graph database use cases. Advances in computing and in tiered caching and layered architectures could eventually allow it to be considered a fully scalable, de facto graph database.
So overall, the answer as of this date is "no", or at least not fully.
The original website hosting the project is http://goldenorbos.org
Check out http://thinkaurelius.com/
Titan uses Cassandra, HBase, or BerkeleyDB as its backing store, so it inherits that store's scalability characteristics.
ArangoDB is a multi-model graph database that scales horizontally like a document store, including for graphs. It follows the hybrid-index approach to graphs.
With the SmartGraph feature, one can shard a graph dataset by a user-defined sharding key (e.g. region, customer, category, or any other property), so that vertices and their edges are distributed to the same machine. The query engine then knows where the data needed for a given query resides, sends the request to the machines that hold it, and executes the query locally. For many scale-out use cases, this can be a suitable solution. See https://www.arangodb.com/why-arangodb/arangodb-enterprise/arangodb-enterprise-smart-graphs/
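To make the idea of a sharding key concrete, here is a small sketch that places vertices by a shared key and counts the edges that end up crossing shards. It is not ArangoDB's API or implementation; `place_by_key`, `cut_edges`, and the sample data are hypothetical names and values chosen for illustration only.

```python
def place_by_key(vertex_keys, num_shards):
    """Assign shards so that vertices with the same key land on the same shard.

    Distinct keys are handed out round-robin in first-seen order (an assumption
    made to keep the example deterministic).
    """
    key_to_shard = {}
    placement = {}
    for vertex, key in vertex_keys.items():
        if key not in key_to_shard:
            key_to_shard[key] = len(key_to_shard) % num_shards
        placement[vertex] = key_to_shard[key]
    return placement


def cut_edges(edges, placement):
    """Return the edges whose endpoints live on different shards."""
    return [(u, v) for u, v in edges if placement[u] != placement[v]]


# Hypothetical social graph: each user carries a region attribute.
regions = {"alice": "eu", "bob": "eu", "carol": "us", "dave": "us"}
edges = [("alice", "bob"), ("carol", "dave"), ("bob", "carol")]

by_region = place_by_key(regions, num_shards=2)
by_vertex = place_by_key({name: name for name in regions}, num_shards=2)

print("cut edges, sharded by region:", cut_edges(edges, by_region))
print("cut edges, sharded per vertex:", cut_edges(edges, by_vertex))
```

In this toy data, sharding by region keeps two of the three edges local, while placing each vertex independently cuts all three. Traversals that stay within one key value can then run on a single machine.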