What is the best way to store graphs in persistent storage?
I am wondering what the best ways to store graphs in persistent storage are, for later analysis, search, clustering, etc.
I see Neo4j as one option, and I am curious whether other graph databases are available. Does anyone have insight into how large social networks store their graph-based data (or how other sites store graph-like models, e.g. RDF)?
What about options like Cassandra, or MySQL?
Comments (4)
Graph Databases:
Source: http://nosql.mypopescu.com/post/498705278/quick-review-of-existing-graph-databases
Graph Libraries:
From their page - "It provides simple ways to manage very large graphs, exploiting modern compression techniques."
What they use is "memory-mapped I/O, disk-based linear-hashing".
Disclaimer: I am speaking from the graph analysis standpoint.
There are several file formats for storing graph data: GraphML, GXL and several others. But storage is usually not the problem. Working with graphs without fully loading them into RAM is the tricky part.
The RDF model is too generic for serious graph analysis. If you don't mind your analysis being slow and programming the algorithms yourself, go with one of the existing graph databases (see Wikipedia on this).
For real analysis, load all the data into RAM using an existing graph analysis library such as SNAP, or see this question: https://stackoverflow.com/questions/51574/good-java-graph-algorithm-library
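To make the "load it all into RAM, then analyze" idea concrete, here is a minimal sketch using only the Python standard library. It is not how SNAP or any other library does it. The sample document, the function names, and the choice to treat every edge as undirected are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by GraphML documents.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical sample; a real graph would be read from a file on disk.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="a"/>
    <node id="b"/>
    <node id="c"/>
    <node id="d"/>
    <edge source="a" target="b"/>
    <edge source="b" target="c"/>
  </graph>
</graphml>
"""


def load_graphml(text):
    """Parse GraphML text into an in-memory adjacency map.

    Assumption: edges are treated as undirected, whatever edgedefault says.
    """
    root = ET.fromstring(text)
    adjacency = defaultdict(set)
    for node in root.iterfind(".//g:node", GRAPHML_NS):
        adjacency[node.get("id")]  # touch the key so isolated nodes are kept
    for edge in root.iterfind(".//g:edge", GRAPHML_NS):
        source, target = edge.get("source"), edge.get("target")
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def connected_components(adjacency):
    """Return the connected components as a list of node sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


graph = load_graphml(SAMPLE)
components = connected_components(graph)
```

Once the graph is a plain dictionary in memory, analyses like this are a few lines each. The hard part described above (graphs that do not fit in RAM) is exactly what this sketch does not address.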
There is no absolutely correct answer here; there is a large variety of options, and the right choice depends heavily on your needs. With large-scale retrievals/traversals (e.g. social networks and similar back-ends) you're quickly going to run into the random I/O bottleneck; I believe storing your graph in RAM is currently the only practical course of action. Less latency-sensitive applications have quite a wide variety of options, including Neo4j (open source with a commercial flavor) and AllegroGraph (commercial with a limited free edition).
At Delver we ended up implementing our own denormalized data model (essentially an adjacency list to represent the graph) in RAM on top of GigaSpaces (some info can be found in this presentation), with custom map-reduce code for queries and data analysis. If you go this route, Cassandra seems to be a viable open source platform to build on.
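As a rough illustration of what a denormalized adjacency list held in RAM can look like, here is a small Python sketch. It is not Delver's implementation and has nothing to do with the GigaSpaces API. The class name, the method names, and the decision to store every edge twice (once per direction) are assumptions for the example.

```python
from collections import defaultdict, deque


class AdjacencyGraph:
    """Directed graph kept entirely in memory as adjacency lists.

    Denormalized: each edge is stored twice (outgoing and incoming),
    so both directions can be read without scanning all edges.
    """

    def __init__(self):
        self.out_edges = defaultdict(set)
        self.in_edges = defaultdict(set)

    def add_edge(self, src, dst):
        self.out_edges[src].add(dst)
        self.in_edges[dst].add(src)

    def within_hops(self, start, max_hops):
        """Breadth-first search: map each node reachable from start
        in at most max_hops steps to its hop count."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in self.out_edges.get(node, ()):
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        del distance[start]
        return distance


graph = AdjacencyGraph()
graph.add_edge("alice", "bob")
graph.add_edge("bob", "carol")
graph.add_edge("carol", "dave")
reachable = graph.within_hops("alice", 2)
```

Every step of the traversal is a dictionary lookup rather than a disk seek, which is the point of keeping the structure in RAM when random I/O is the bottleneck.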
You could look at InfiniteGraph, which will be released in beta very soon (http://www.infinitegraph.com/)
If this is for commercial use then you'll see it's targeted towards sites that will have larger graphs. The social networking sites built custom solutions, which worked for them at the time. But their in-house solutions are more limiting than using something like InfiniteGraph. Products like Cassandra or MySQL weren't designed for this many-to-many problem set. Can you do it? Sure, but it's a lot of hand-written coding, and not scalable.
Let us know if you have a real project; we could help you figure out your graph requirements.
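For a sense of what the hand-written relational approach involves, here is a hedged sketch of an edge table with a two-hop query. SQLite (from the Python standard library) stands in for MySQL so that the example is self-contained. The table name, column names, and sample data are all hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edges ("
    " src TEXT NOT NULL,"
    " dst TEXT NOT NULL,"
    " PRIMARY KEY (src, dst))"
)
# Second index so edges can also be looked up by their target.
conn.execute("CREATE INDEX edges_by_dst ON edges (dst)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("alice", "dave")],
)


def two_hop_neighbors(conn, start):
    """Nodes exactly two hops from start, excluding direct neighbors."""
    rows = conn.execute(
        """
        SELECT DISTINCT second.dst
        FROM edges AS first
        JOIN edges AS second ON second.src = first.dst
        WHERE first.src = ?
          AND second.dst <> ?
          AND second.dst NOT IN (SELECT dst FROM edges WHERE src = ?)
        ORDER BY second.dst
        """,
        (start, start, start),
    )
    return [row[0] for row in rows]


friends_of_friends = two_hop_neighbors(conn, "alice")
```

Each additional hop needs another self-join (or a recursive query), which is the kind of hand-written work the answer above is referring to.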
Thanks,
Warren