用小内存在大图上进行广度优先搜索

发布于 2024-08-21 14:33:06 字数 495 浏览 5 评论 0原文

我目前有一个大约有1000万个节点3500万个边的图。目前,完整的图表已在程序启动时加载到内存中。这需要几分钟(毕竟是 Java)并且需要大约半 GB 的 RAM。目前,它运行在具有双核处理器和 4 GB RAM 的机器上。

当使用广度优先搜索来搜索图时,内存使用量会上升到 1 GB 的峰值,并且平均需要 10 秒。

我想在几台计算机上部署该程序。除了图形搜索之外的功能确实占用很少的资源。我的目标系统非常小,只有 512 MB RAM。

关于如何实现一种方法(可能使用数据库)来搜索该图而不消耗太多内存有什么建议吗?该程序在访问硬件设备时大部分时间都是空闲的,因此对于上述图表,寻路最多可能需要大约 5 分钟...

感谢您向我提出的任何想法。

更新:

刚刚找到 neo4j。有人知道它是否适合这种巨大的图表?

I currently have a graph that has about 10 million nodes and 35 million edges. For now the complete graph is loaded into memory at program start. This takes a couple of minutes (it is Java after all) and needs about half a gigabyte of RAM. For now it runs on a machine with a dual core processor and 4 gigabytes of RAM.

When the graph is searched using a breadth-first-search the memory usage rises to a peak of one gigabyte and it takes ten seconds on average.

I would like to deploy the program on a couple of computers. The functionality apart from the graph search does take very little resources. My target system is very miniature and has only 512 megabytes of RAM.

Any suggestions on how to implement a method (probably using a database) to search that graph without consuming too much memory? The program is idle most of the time as it is accessing a hardware device, so the path-finding could take about 5 minutes max for the mentioned graph...

Thanks for any thoughts thrown in my direction.

UPDATE:

Just found neo4j. Anybody knows if it would be suitable for this kind of humongous graph?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

你如我软肋 2024-08-28 14:33:07

您的问题有点模糊,但总的来说,一个好的策略主要遵循广度优先语义,同时使用与深度优先搜索相同的内存量 迭代深化。这个想法是,您首先进行仅限于 1 级的深度优先搜索;如果无法找到解决方案,请从头开始并将其限制为 2 个级别;如果失败,请尝试 3 个级别,依此类推。

乍一看这似乎有点多余,但由于您正在进行深度优先搜索,因此您在内存中保留的节点要少得多,并且总是比直接的广度优先搜索少搜索一个级别。由于一个级别中的节点数量呈指数级增长,因此在较大的图表上,节省最后一个额外级别很可能会因为冗余地尝试所有前面的层而得到回报。

Your question is a little vague, but in general, a good strategy that mostly follows breadth first semantics while using the same amount of memory as depth-first search is Iterative Deepening. The idea is that you do a depth-first search limited to 1 level at first; if that fails to find a solution, start from scratch and limit it to 2 levels; if that fails, try 3 levels, and so on.

This may seem a bit redundant at first, but since you're doing a depth-first search, you keep much fewer nodes in memory, and always search one less level than a straightforward breadth-first search. Since the amount of nodes in a level grows exponentially, on larger graphs, it's very likely that saving that one last extra level pays off for trying all preceding layers redundantly.

浪漫人生路 2024-08-28 14:33:07

我想说,当您拥有像这样大小合适的图表时,Neo4j 绝对是一个好方法。它不仅具有内置的 BFS 算法,您还可以将数据保存在磁盘上,从而减少启动时间。

在 highscalability.com 上查看:NEO4J - 一个令人兴奋的图形数据库

我使用过 Neo4j,他们的文档非常好,他们提供了一些很好的入门示例,实际上只需要几分钟就可以开始。

查看他们的 - 10 分钟入门指南

I would say that Neo4j is definitely a good way to go when you have a decent sized graph such as this. Not only does it have built-in BFS algorithms you will also persist you data on disk, thus reducing your start-up time.

Check this out on highscalability.com: NEO4J - A GRAPH DATABASE THAT KICKS BUTTOX

I've used Neo4j and their documentation is very good, and they provide some nice getting started examples, which really do take just a few minutes to get going.

Check out their - Getting started in 10 minutes guide

随风而去 2024-08-28 14:33:07

Neo4j 将数据以图形形式存储在数据库中,它会被持久化,您可以使用图形遍历 Api(BFS、DBS、A* Dijkstra ...)或使用 Cypher 查询语言进行访问。

Neo4j stores data in database as graph, it becomes persisted and you can access using the Graph Traversal Api (BFS , DBS , A* Dijkstra ... ), or Using Cypher query language.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文