如何有效地实现具有大量大完整子图的图？

发布于 2024-11-27 21:27:23 字数 1003 浏览 2 评论 0 原文

我目前正在处理图形实现的性能问题。

使用的技术

它是用 C++ 编程的。目前，该图是通过 BGL 实现的。

关于图

托管图是动态且无向的。它有两种结构：大量的完整子图和少量的单边。唯一需要的信息是顶点的直接邻域。

问题陈述

一开始，完整的子图很小（大约10个顶点）并且数量很多（大约13k）。 BGL 的邻接表实现是完美的。但现在，它被要求管理 5000 个顶点的少数子图。这意味着 5000x5000 边。那么现在在时间和空间上的表现都很差。

被拒绝的解决方案

我的第一个想法是使用 BGL 提供的邻接矩阵实现。但它不允许动态图。为了解决这个约束，有两种解决方案：为动态图提供邻接矩阵的新实现或在静态图中使用可用顶点池。经过反思，我认为这不是一个好主意：空间复杂度仍然是VxV/2。

最终解决方案和问题

所以，这里是我的最终解决方案：不使用 BGL，实现顶点袋来表示完整的子图（不需要边）并直接连接少数单边的顶点。通过这样做，最大子图的空间复杂度下降到它的顶点数，大约 5000。

您认为最后一个解决方案是好的吗？
如果没有，我可以使用哪种实现？为什么？

更新 1

有关该图的更多信息：该图有约 100k 个顶点，约 3 个顶点的约 13k 个完整子图，以及大小范围 [10-5000] 的约 100 个完整子图。每条边都有捆绑的属性。

更新 2

感谢 Salim Jouilli，我最近了解到节点包是一种坦率的方法超图，其中超边包含在节点的子集中。

更新3

我已经完成了解决方案的实施。我有效地提高了内存消耗和运行时间：从 6GB 到 24MB，从 50 分钟到 2 分钟 30。

原文

I'm currently dealing with a performance issue of graph implementation.

Used technologies

It is programmed in C++. For the moment, the graph is implemented thanks to the BGL.

About the graph

The managed graph is dynamic and undirected. It has two kinds of structures: a lot of complete subgraphs and few of single edges. The only needed information is the direct neighborhood of a vertex.

Problem statement

At the beginning, the complete subgraphs were small (about 10 vertices) and numerous (about 13k). An adjacency list implementation, the BGL's one, was perfect. But now, it is asked to manage few subgraph of 5000 vertices. That means 5000x5000 edges. The performance in time and space are then very poor now.

Rejected solutions

My first thought was to use the adjacency matrix implementation provided by BGL. But it doesn't allow dynamic graph. To resolve this constraint, two solutions: provide a new implementation of adjacency matrix for dynamic graph or use a pool of available vertices in a static graph. After reflection, I think it's not a good idea: the space complexity is still VxV/2.

Final Solution and question

So, here my final solution: don't use the BGL, implement bags of vertices to represent complete subgraphs (no need of edges) and directly connect vertices for the few single edges. By doing so, the space complexity of the biggest subgraph falls to its number of vertices, about 5000.

Do you think this last solution is the good one?
If not, which implementation could I use? And why?

Update 1

More information about the graph:
The graph has ~100k vertices, ~13k complete subgraphs of about 3 vertices, and ~100 complete subgraphs of size range [10-5000]. And each edge has bundled properties.

Update 2

I've recently learned thanks to Salim Jouilli that the bag of nodes is a candid approach of hypergraph where a hyperedge consists in a subset of nodes.