I want to understand how to store a graph that holds a huge amount of data. I am designing an application built around a graph of a huge railway route network, where the vertices are railway station names. I implemented it in C++ using an adjacency list, but it consumes a lot of memory, and sometimes I even get an out-of-memory error. How are such huge graphs stored so that graph algorithms can still be run on them?
The graph is defined as
std::map<std::string, std::set<std::string> > railway_graph;
Alternatively, how do Google/Facebook store their graph data structures?
Comments (5)
Using an adjacency matrix representation instead of an adjacency list can reduce memory allocation for dense graphs.
Because you didn't mention the size of the system or the kinds of algorithms you are trying to run, it is hard to judge whether your algorithm should be checked for inappropriate memory consumption, or whether you actually need to use files as intermediate "memory" throughout your program to make the calculation possible.
Your choice of data structure requires a lot of superfluous memory, dynamically allocated on the heap. std::map and std::set allocate a piece of memory for every single entry (plus its own overhead). std::string also allocates a piece of memory for the string, unless the implementation's small-string optimization applies. This is convenient and totally OK in many cases, but not for large data structures.
In the end you have a map whose individually allocated nodes point to sets, whose individually allocated nodes hold strings, which in turn point to the actual string buffers.
Your actual problem is the overhead that dynamic allocation incurs. On many platforms, each heap allocation costs roughly 16 extra bytes just for heap management (though the numbers vary...).
I suggest that you redefine your graph in the following way:
Or, alternatively, the following data structures may be easier for your use case. They are similar to your example, but much more compact in memory:
EDIT: Added and used NodeIdList
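A minimal sketch of what such a compact layout might look like, assuming each station is given a 32-bit integer id and its name is stored only once. Only the name NodeIdList comes from the answer; its definition here, along with NodeId, Station, RailwayGraph, intern and connect, is an illustrative assumption rather than the answer's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed definitions: NodeIdList is named in the answer, its type is a guess.
using NodeId = std::uint32_t;
using NodeIdList = std::vector<NodeId>;

struct Station {
    std::string name;       // stored once per station
    NodeIdList neighbours;  // 4 bytes per edge endpoint instead of a string copy
};

class RailwayGraph {
public:
    // Returns the id for `name`, creating the station on first use.
    NodeId intern(const std::string& name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        NodeId id = static_cast<NodeId>(stations_.size());
        stations_.push_back(Station{name, {}});
        index_.emplace(name, id);
        return id;
    }

    // Adds an undirected edge between two stations (duplicates are not filtered).
    void connect(const std::string& a, const std::string& b) {
        NodeId ia = intern(a);
        NodeId ib = intern(b);
        stations_[ia].neighbours.push_back(ib);
        stations_[ib].neighbours.push_back(ia);
    }

    const Station& station(NodeId id) const {
        assert(id < stations_.size());
        return stations_[id];
    }

    std::size_t size() const { return stations_.size(); }

private:
    std::vector<Station> stations_;                  // indexed by NodeId
    std::unordered_map<std::string, NodeId> index_;  // name -> id, for lookups by name
};
```

Compared with the original std::map of std::set<std::string>, every edge endpoint now costs four bytes instead of a separately allocated string. This sketch does not deduplicate repeated edges the way std::set did, and the name-to-id index keeps a second copy of each name; it can be dropped once the graph is built if lookups by name are no longer needed.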
...If this still consumes too much memory, then you should think about keeping the data on disk and loading it on demand.
If your node names are constant, you should also think about some kind of string table, which is a more compact in-memory representation of string data. But this is rather low-level stuff.
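As a rough illustration of the string-table idea, the sketch below keeps all names back to back in one character buffer and records where each one starts, so there is one growing buffer instead of one allocation per name. The class StringTable and its methods add and get are hypothetical names chosen for this example, and 32-bit offsets are an assumption that limits the table to 4 GiB of text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical StringTable: all names live contiguously in a single buffer.
class StringTable {
public:
    // Appends `s` and returns its index in the table.
    std::uint32_t add(const std::string& s) {
        offsets_.push_back(static_cast<std::uint32_t>(chars_.size()));
        chars_.insert(chars_.end(), s.begin(), s.end());
        return static_cast<std::uint32_t>(offsets_.size() - 1);
    }

    // Copies entry `i` back out as a std::string.
    std::string get(std::uint32_t i) const {
        assert(i < offsets_.size());
        const std::size_t begin = offsets_[i];
        const std::size_t end =
            (i + 1 < offsets_.size()) ? offsets_[i + 1] : chars_.size();
        return std::string(chars_.begin() + begin, chars_.begin() + end);
    }

    std::size_t size() const { return offsets_.size(); }

private:
    std::vector<char> chars_;             // concatenated characters, no terminators
    std::vector<std::uint32_t> offsets_;  // start position of each string in chars_
};
```

The per-name overhead here is a single 4-byte offset. The index returned by add can double as the station id, so the graph itself never has to hold a std::string.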
Try to use better data structures first!
Using map and string like that adds a lot of redundant memory use. If you store the names in one vector and build the adjacency list from plain integer indices, it should be a lot more compact.
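One way to push the "names in one vector, integer indices for adjacency" idea further is a compressed sparse row (CSR) layout, where all neighbour lists are concatenated into a single array. This is a variation on the suggestion above, not something the comment itself spells out; CsrGraph and build_csr are hypothetical names, and the sketch assumes the edges are already given as pairs of station indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<std::string> names;      // names[i] is the name of station i
    std::vector<std::uint32_t> offsets;  // neighbours of i: targets[offsets[i] .. offsets[i+1])
    std::vector<std::uint32_t> targets;  // all neighbour ids, concatenated
};

// Builds an undirected graph from station names and (from, to) index pairs.
CsrGraph build_csr(std::vector<std::string> names,
                   const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    const std::size_t n = names.size();
    CsrGraph g;
    g.names = std::move(names);
    g.offsets.assign(n + 1, 0);

    // Pass 1: count the degree of every vertex.
    for (const auto& e : edges) {
        assert(e.first < n && e.second < n);
        ++g.offsets[e.first + 1];
        ++g.offsets[e.second + 1];
    }
    // A prefix sum turns the counts into start offsets.
    for (std::size_t i = 0; i < n; ++i) g.offsets[i + 1] += g.offsets[i];

    // Pass 2: write each endpoint into the next free slot of its row.
    g.targets.resize(g.offsets[n]);
    std::vector<std::uint32_t> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) {
        g.targets[next[e.first]++] = e.second;
        g.targets[next[e.second]++] = e.first;
    }
    return g;
}
```

The whole adjacency structure is then two flat arrays, so the number of heap allocations no longer grows with the number of stations or edges. The trade-off is that adding an edge later means rebuilding the arrays, which suits a network that changes rarely.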
You can store the data related to each station in a database and fetch it by id as and when required. Graph density is defined as D = 2|E|/(|V|(|V|-1)). You have to design your data structure depending on D. If you have a dense graph, you can use a matrix representation, which requires only about |V|*|V| bits.
For a sparse graph, an edge list representation is a good fit.
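To make the density formula and the |V|*|V|-bit estimate concrete, here is a small sketch. The function density and the class BitMatrix are hypothetical names for this example, and the matrix assumes an undirected graph stored as a full square of bits packed into 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Density of a simple undirected graph: D = 2|E| / (|V| (|V| - 1)).
double density(std::size_t vertices, std::size_t edges) {
    if (vertices < 2) return 0.0;  // no edge is possible
    return 2.0 * static_cast<double>(edges) /
           (static_cast<double>(vertices) * static_cast<double>(vertices - 1));
}

// Hypothetical BitMatrix: one bit per ordered vertex pair, |V|*|V| bits in total.
class BitMatrix {
public:
    explicit BitMatrix(std::size_t n)
        : n_(n), bits_((n * n + 63) / 64, std::uint64_t{0}) {}

    // Records an undirected edge by setting both (a, b) and (b, a).
    void add_edge(std::size_t a, std::size_t b) {
        set(a, b);
        set(b, a);
    }

    bool has_edge(std::size_t a, std::size_t b) const {
        assert(a < n_ && b < n_);
        const std::size_t i = a * n_ + b;
        return ((bits_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Memory used by the bit storage itself.
    std::size_t bytes() const { return bits_.size() * sizeof(std::uint64_t); }

private:
    void set(std::size_t a, std::size_t b) {
        assert(a < n_ && b < n_);
        const std::size_t i = a * n_ + b;
        bits_[i / 64] |= std::uint64_t{1} << (i % 64);
    }

    std::size_t n_;
    std::vector<std::uint64_t> bits_;
};
```

For a hypothetical network of 8,000 stations the matrix takes 64,000,000 bits, about 8 MB, no matter how many edges exist. That is attractive when D is close to 1 and wasteful when D is close to 0, which is where an edge or adjacency list is the better choice.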
Take a look at
What I would propose is to take your map and convert it to a map in Redis, which can then be persisted to the local file system. Lookups are really fast, so this should not hurt performance much.