What is the best way to store a graph in persistent storage?

Posted on 2024-09-04 18:12:49

I am wondering what the best ways are to store graphs in persistent storage for later analysis, search, clustering, etc.

I see Neo4j as one option, and I am curious whether other graph databases are available. Does anyone have any insight into how larger social networks store their graph-based data (or how other sites store graph-like models, e.g. RDF)?

What about options like Cassandra or MySQL?

Comments (4)

可遇━不可求 2024-09-11 18:12:49

Graph Databases:

  1. HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
  2. InfoGrid: an Internet Graph Database with many additional software components that make it easy to develop REST-ful web applications on a graph foundation.
  3. vertexdb: a high performance graph database server that supports automatic garbage collection.

Source: http://nosql.mypopescu.com/post/498705278/quick-review-of-existing-graph-databases

Graph Libraries:

  1. WebGraph is a framework to study the web graph.
    From their page - "It provides simple ways to manage very large graphs, exploiting modern compression techniques."
  2. Dex is a high performance library to manage very large graphs or networks.
  3. The blog post "On Building a Stupidly Fast Graph Database" gives some guidelines on building a graph database; the technique
    they use is "memory-mapped I/O, disk-based linear-hashing".
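
To make the "memory-mapped I/O" part of that quote more concrete, here is a minimal Python sketch. It is not the blog post's implementation and it does not do linear hashing: it assumes a simple compressed-sparse-row file layout (node count, then offsets, then neighbor ids) that I made up for illustration. `write_csr` and `MappedGraph` are hypothetical names.

```python
import mmap
import os
import struct
import tempfile


def write_csr(path, adjacency):
    """Write {node: [neighbors]} (nodes numbered 0..n-1) as: n, offsets[n+1], targets[]."""
    n = len(adjacency)
    offsets = [0]
    targets = []
    for node in range(n):
        targets.extend(adjacency[node])
        offsets.append(len(targets))
    with open(path, "wb") as f:
        f.write(struct.pack("<I", n))
        f.write(struct.pack(f"<{n + 1}I", *offsets))
        f.write(struct.pack(f"<{len(targets)}I", *targets))


class MappedGraph:
    """Read-only view of a file written by write_csr, accessed through mmap."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded by the OS on demand.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (self.n,) = struct.unpack_from("<I", self._map, 0)
        self._offsets_at = 4
        self._targets_at = 4 + 4 * (self.n + 1)

    def neighbors(self, node):
        # Two adjacent offsets delimit this node's slice of the targets array.
        start, end = struct.unpack_from("<2I", self._map, self._offsets_at + 4 * node)
        count = end - start
        return list(struct.unpack_from(f"<{count}I", self._map, self._targets_at + 4 * start))

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "graph.csr")
    write_csr(path, {0: [1, 2], 1: [2], 2: [], 3: [0]})
    graph = MappedGraph(path)
    try:
        return [graph.neighbors(node) for node in range(graph.n)]
    finally:
        graph.close()
```

Only the pages that a lookup touches need to be resident, which is the property that makes this approach attractive for graphs larger than RAM. A real store would also need updates, which this fixed layout does not support.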

倾城月光淡如水﹏ 2024-09-11 18:12:49

Disclaimer: I am speaking from the graph analysis standpoint.

There are several file formats for storing graph data: GraphML, GXL and several others. But storage usually is not a problem. Working with the graphs without fully loading them into RAM is the tricky part.

The RDF model is too generic for serious graph analysis. If you don't mind your analysis being slow and programming the algorithms yourself, go with one of the existing graph databases (see Wikipedia on this).

For real analysis, load all the data into RAM using an existing graph analysis library such as SNAP, or see this question: https://stackoverflow.com/questions/51574/good-java-graph-algorithm-library
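
As a rough sketch of the "store in a file format, then load everything into RAM" workflow, the following Python reads a tiny GraphML document with the standard library and builds an in-memory adjacency map. The sample graph and the `load_graphml` helper are made up for illustration; the parser deliberately ignores GraphML features such as per-edge `directed` attributes, `<data>` keys, nested graphs and hyperedges, so a dedicated library is the better choice for real data.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical sample input: a three-node undirected path a - b - c.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="a"/>
    <node id="b"/>
    <node id="c"/>
    <edge source="a" target="b"/>
    <edge source="b" target="c"/>
  </graph>
</graphml>
"""


def load_graphml(text):
    """Return {node_id: set(neighbor_ids)} for the first <graph> element."""
    root = ET.fromstring(text)
    graph = root.find("g:graph", GRAPHML_NS)
    directed = graph.get("edgedefault") == "directed"
    adjacency = {node.get("id"): set() for node in graph.findall("g:node", GRAPHML_NS)}
    for edge in graph.findall("g:edge", GRAPHML_NS):
        source, target = edge.get("source"), edge.get("target")
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
        if not directed:
            # Undirected edges are stored in both directions.
            adjacency[target].add(source)
    return adjacency


def degrees(adjacency):
    """Number of stored neighbors per node (out-degree for directed graphs)."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}
```

Once the adjacency map is in memory, analysis code such as `degrees(load_graphml(SAMPLE))` runs without touching the disk again.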

我一向站在原地 2024-09-11 18:12:49

There is no absolutely correct answer here; there is a wide variety of options, and the right choice depends heavily on your needs. With large-scale retrievals/traversals (e.g. social networks and similar back-ends) you're quickly going to run into the random I/O bottleneck; I believe storing your graph in RAM is currently the only practical course of action. Less latency-sensitive applications have quite a wide variety of options, including Neo4j (open source with a commercial flavor) and AllegroGraph (commercial with a limited free edition).

At Delver we ended up implementing our own denormalized data model (essentially an adjacency list to represent the graph) in RAM on top of GigaSpaces (some info can be found in this presentation), with custom map-reduce code for queries and data analysis. If you go this route, Cassandra seems to be a viable open source platform to build on.
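
To show what "an adjacency list in RAM plus map-reduce style queries" can look like, here is a small single-process Python sketch. It is not Delver's code and does not involve GigaSpaces or Cassandra; `AdjacencyGraph`, `within_hops` and `in_degrees` are hypothetical names chosen for this example.

```python
from collections import Counter, deque
from functools import reduce


class AdjacencyGraph:
    """Directed graph held in memory as {node: set(successors)}."""

    def __init__(self):
        self.out = {}

    def add_edge(self, src, dst):
        self.out.setdefault(src, set()).add(dst)
        self.out.setdefault(dst, set())

    def within_hops(self, start, max_hops):
        """Breadth-first traversal: {node: hop distance} for nodes within max_hops."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue
            for nxt in self.out.get(node, ()):
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        return seen


def in_degrees(graph):
    """Map each adjacency row to partial counts, then reduce by summing them."""
    mapped = (Counter(successors) for successors in graph.out.values())
    return reduce(lambda left, right: left + right, mapped, Counter())


def demo():
    graph = AdjacencyGraph()
    for src, dst in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]:
        graph.add_edge(src, dst)
    return graph.within_hops("a", 2), in_degrees(graph)
```

Because each row carries everything needed for one traversal step, the rows can be partitioned across machines by node id, and the map and reduce steps above can then run per partition. That distribution is left out here.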

浪推晚风 2024-09-11 18:12:49

You could look at InfiniteGraph, which will be released for beta very soon (http://www.infinitegraph.com/)

If this is for commercial use, then you'll see it's targeted towards sites that will have larger graphs. The social networking sites built custom solutions, which worked for them at the time. But their in-house solutions are more limiting than using something like InfiniteGraph. Products like Cassandra or MySQL weren't designed for this many-to-many problem set. Can you do it? Sure, but it's a lot of hand-written coding, and not scalable.
Let us know if you have a real project; we could help you figure out your graph requirements.
Thanks,
Warren
[email protected]
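
For readers wondering what the relational route mentioned in the question and in this answer looks like, here is a minimal sketch of an edge table with a hop-limited traversal. It assumes SQLite as a stand-in for MySQL purely because it ships with Python; the table name `edge` and the helpers `build_db` and `reachable` are made up for illustration.

```python
import sqlite3


def build_db(edges):
    """Create an in-memory database with one row per directed edge."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edge ("
        " src INTEGER NOT NULL,"
        " dst INTEGER NOT NULL,"
        " PRIMARY KEY (src, dst))"
    )
    # Second index so that reverse lookups (who points at X?) are also indexed.
    conn.execute("CREATE INDEX edge_dst ON edge (dst)")
    conn.executemany("INSERT INTO edge (src, dst) VALUES (?, ?)", edges)
    return conn


def reachable(conn, start, max_hops):
    """Return {node: fewest hops} for nodes reachable from start within max_hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, hops) AS (
            SELECT ?, 0
            UNION
            SELECT edge.dst, walk.hops + 1
            FROM walk JOIN edge ON edge.src = walk.node
            WHERE walk.hops < ?
        )
        SELECT node, MIN(hops) FROM walk GROUP BY node ORDER BY node
        """,
        (start, max_hops),
    ).fetchall()
    return dict(rows)


def demo():
    conn = build_db([(1, 2), (2, 3), (3, 1), (3, 4)])
    try:
        return reachable(conn, 1, 3)
    finally:
        conn.close()
```

Every extra hop is another join against the edge table, which is one way to see why deep traversals get expensive in a relational store as the graph grows.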
