Can you partition a graph database? If so, how?
I know that databases in general can scale horizontally using master/slave replication. This is a great strategy when the number of concurrent reads is growing.
As the number of concurrent writes or just the amount of data starts to grow, though, master/slave replication doesn't get you anything, so you need to partition your data instead.
This works great for key-value scenarios. A classic example to me is TinyURL/bit.ly; reading/writing the data for short URL foo can be totally independent of reading/writing data for short URL bar.
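To make the key-value case concrete, here is a minimal sketch of hash partitioning, assuming an in-memory list of dicts stands in for separate database servers. The names `NUM_SHARDS`, `shard_for`, `put`, and `get` are hypothetical and not taken from any real URL-shortener implementation.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration

# Each dict stands in for an independent database server.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index with a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a cryptographic digest is used to keep the mapping stable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(short_code: str, long_url: str) -> None:
    """Write touches exactly one shard, chosen by the key alone."""
    shards[shard_for(short_code)][short_code] = long_url


def get(short_code: str):
    """Read touches exactly one shard and never consults the others."""
    return shards[shard_for(short_code)].get(short_code)


put("foo", "https://example.com/some/long/path")
put("bar", "https://example.org/another/long/path")
print(get("foo"), get("bar"))
```

Every operation resolves to a single shard from the key alone, which is exactly the independence that a graph traversal does not have.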
But what are you supposed to do if you're in a graph scenario? More concretely, is it possible to partition a graph database like Neo4j at all? If so, how?
I can't wrap my head around how you could possibly break up a graph without defeating the purpose of using a graph database (efficient traversals).
Comments (1)
You rarely traverse an entire graph structure.
Further, graph structures are rarely densely connected across all of their nodes.
With a little care, you can locate clusters of well-connected nodes that are separated from other clusters by only a small number of connections.
http://en.wikipedia.org/wiki/Cluster_analysis
If you partition based on clustering, then traversal within the cluster may be faster, but traversal to another cluster will be slower.
The overall benefit of partitioning depends on the ratio of in-cluster traversals to between-cluster traversals.
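As a rough illustration of that ratio, the sketch below builds a toy graph of two triangles joined by one bridge edge and counts how many edges cross partitions under two assignments. The graph, the node names, and the helpers `edge_cut` and `count_hops` are all made up for this example; it does not use Neo4j or any real graph database API.

```python
# Toy graph: two tightly connected triangles joined by a single bridge edge.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),   # cluster A
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),   # cluster B
    ("a3", "b1"),                               # bridge between clusters
]

# Partitioning that follows the clusters.
clustered = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}

# Partitioning that ignores the structure (alternating assignment).
alternating = {"a1": 0, "a2": 1, "a3": 0, "b1": 1, "b2": 0, "b3": 1}


def edge_cut(edges, partition_of):
    """Return (edges crossing partitions, total edges)."""
    cut = sum(1 for u, v in edges if partition_of[u] != partition_of[v])
    return cut, len(edges)


def count_hops(path, partition_of):
    """Return (local hops, remote hops) for a traversal given as a node list."""
    local = remote = 0
    for u, v in zip(path, path[1:]):
        if partition_of[u] == partition_of[v]:
            local += 1
        else:
            remote += 1
    return local, remote


print(edge_cut(edges, clustered))      # (1, 7)
print(edge_cut(edges, alternating))    # (5, 7)
print(count_hops(["a1", "a3", "b1", "b2"], clustered))  # (2, 1)
```

With the cluster-aware assignment only one of seven edges is remote, so most traversal hops stay on one machine; with the alternating assignment five of seven hops would cross the network.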