Huge graph storage problem

Posted on 2024-12-22 18:27:24

I want to understand how to store a graph that holds a huge amount of data. I am designing an application built around the graph of a huge railway route network, where the vertices are railway station names. I implemented it as an adjacency list in C++, but I have found that it consumes a very large amount of memory, and sometimes I also get out-of-memory errors. How are graphs this large stored so that graph algorithms can still be run on them?

The graph is defined as:

std::map<std::string, std::set<std::string> > railway_graph;

Alternatively, how do Google or Facebook store their graph data structures?

Comments (5)

指尖上得阳光 2024-12-29 18:27:24

Using an adjacency matrix representation instead of an adjacency list can reduce memory allocation for dense graphs.

Because you didn't mention the size of the system or the kinds of algorithms you are trying to run, it is hard to judge whether your algorithm should be checked for excessive memory consumption, or whether you actually need to use files as intermediate "memory" throughout your program to make the calculation possible.

秋心╮凉 2024-12-29 18:27:24

Your choice of data structure requires a lot of superfluous memory, dynamically allocated on the heap. std::map and std::set allocate a piece of memory for each single entry (plus their own overhead). std::string also allocates a piece of memory for the string.

This is convenient and perfectly fine in many cases, but not for large data structures.

In the end you have a map that contains pointers to sets (each allocated one by one), which contain pointers to strings (each allocated one by one), which in turn contain pointers to the actual string buffers.

Your actual problem is the overhead that dynamic allocation incurs. On many platforms, each heap allocation costs roughly 16 extra bytes just for heap management (though the numbers vary...).

I suggest that you redefine your graph in the following way:

#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A list of node names; a node's index in this vector (a size_t) is its ID
// in the data structures below.
// - Alternatively, you may use a std::map<int,std::string> here, to simplify the
//   "index" to "name" lookup...
typedef std::size_t NodeId;
typedef std::vector<std::string> NodeList;

// An edge
typedef std::pair<NodeId,NodeId> Edge;
// or alternatively (use one definition of Edge, not both):
// struct Edge {
//     NodeId from, to;
// };

// A plain list of edges
typedef std::vector<Edge> EdgeList;

Alternatively, the following data structures may be easier for your use case. They are similar to your example, but much more compact in memory:

#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A list of node names; a node's index in this vector (a size_t) is its ID
// in the data structures below.
typedef std::size_t NodeId;
typedef std::vector<std::string> NodeList;

typedef std::vector<NodeId> NodeIdList;

// A map from one node to its adjacent nodes
typedef std::map< NodeId, NodeIdList > Graph;

EDIT: Added and used NodeIdList ...
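As a rough sketch of how the original name-keyed map could be converted into the ID-based Graph above: the helper names intern and compact are hypothetical (they do not come from the answer), and the code assumes a C++11 compiler. The name-to-ID index is only needed while building and can be discarded afterwards.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::size_t NodeId;
typedef std::vector<std::string> NodeList;
typedef std::vector<NodeId> NodeIdList;
typedef std::map<NodeId, NodeIdList> Graph;

// Hypothetical helper: returns the ID of `name`, appending it to `names`
// the first time it is seen. `index` maps each name to its position in `names`.
NodeId intern(const std::string& name, NodeList& names,
              std::unordered_map<std::string, NodeId>& index) {
    auto it = index.find(name);
    if (it != index.end()) return it->second;
    NodeId id = names.size();
    names.push_back(name);
    index.emplace(name, id);
    return id;
}

// Hypothetical helper: converts the name-keyed graph into the ID-keyed Graph,
// filling `names` so that names[id] recovers the station name.
Graph compact(const std::map<std::string, std::set<std::string> >& railway_graph,
              NodeList& names) {
    std::unordered_map<std::string, NodeId> index;  // build-time lookup only
    Graph g;
    for (const auto& entry : railway_graph) {
        NodeId from = intern(entry.first, names, index);
        NodeIdList& out = g[from];
        out.reserve(entry.second.size());
        for (const auto& neighbour : entry.second)
            out.push_back(intern(neighbour, names, index));
    }
    return g;
}
```

Each station name is then stored once in names, and every edge costs one size_t instead of a separately allocated string.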

If this still consumes too much memory, then you should think about keeping data on disk and loading it on demand.

If your node names are constant, then you should also think about some kind of string-table, a more compact representation of string data in memory. But this is rather low-level stuff.
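One possible shape for such a string table, as a minimal sketch: the class name StringTable and its methods are assumptions made for illustration. All names live back to back in a single buffer, and a vector of offsets marks where each one starts, so there is no per-name heap allocation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical StringTable: every name is appended to one contiguous buffer.
// offsets_[i] is where name i begins; one trailing offset marks the end.
class StringTable {
public:
    StringTable() : offsets_(1, 0) {}

    // Appends a name and returns its ID (its position in insertion order).
    std::size_t add(const std::string& name) {
        buffer_ += name;
        offsets_.push_back(buffer_.size());
        return offsets_.size() - 2;
    }

    // Returns a copy of the name stored under `id`; `id` must be < size().
    std::string get(std::size_t id) const {
        return buffer_.substr(offsets_[id], offsets_[id + 1] - offsets_[id]);
    }

    // Number of names stored so far.
    std::size_t size() const { return offsets_.size() - 1; }

private:
    std::string buffer_;                // all names back to back
    std::vector<std::size_t> offsets_;  // name i spans offsets_[i]..offsets_[i+1]
};
```

The ID returned by add can serve as the node index in the graph structures, provided names are added in the same order as nodes.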

Try to use better data structures first!

背叛残局 2024-12-29 18:27:24

Using map and string like that adds a lot of redundant memory use. If you store the names in one vector and build the adjacency list from integer indices only, it should be a lot more compact.

std::vector<std::string> name;
std::vector<std::vector<size_t> > adj_list;
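To illustrate that graph algorithms work directly on this index-based layout, here is a minimal breadth-first search sketch. The function name hops is hypothetical, and it assumes every index stored in adj_list is smaller than adj_list.size().

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Returns the number of edges on a shortest path from `src` to `dst`,
// or -1 if `dst` cannot be reached. Both must be valid indices into adj_list.
long hops(const std::vector<std::vector<std::size_t> >& adj_list,
          std::size_t src, std::size_t dst) {
    std::vector<long> dist(adj_list.size(), -1);  // -1 marks "not visited yet"
    std::queue<std::size_t> pending;
    dist[src] = 0;
    pending.push(src);
    while (!pending.empty()) {
        std::size_t u = pending.front();
        pending.pop();
        if (u == dst) return dist[u];
        for (std::size_t i = 0; i < adj_list[u].size(); ++i) {
            std::size_t v = adj_list[u][i];
            if (dist[v] == -1) {  // first time we reach v
                dist[v] = dist[u] + 1;
                pending.push(v);
            }
        }
    }
    return -1;
}
```

The search touches only integers and contiguous vectors; station names are needed only when printing a result, via name[index].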

风苍溪 2024-12-29 18:27:24

#include <string>

struct Data {};  // placeholder for the per-station payload

struct Node
{
    std::string id;
    Data data; // fetch data by ID when required from some database
};

You can store data related to each station in a database and fetch it by id as and when required.

For an undirected graph, density is defined as D = 2|E|/(|V|(|V|-1)). You have to choose the data structure depending on D.

If you have a dense graph, you can use a matrix representation. You will require only about |V|*|V| bits.

For a sparse graph, an edge list representation is good.
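The density formula and the bit-matrix estimate can be sketched as follows. The names density and BitMatrix are hypothetical, and the sketch relies on std::vector<bool> being bit-packed, which is the usual behaviour of standard library implementations.

```cpp
#include <cstddef>
#include <vector>

// Density of a simple undirected graph: D = 2|E| / (|V| (|V| - 1)).
// Returns 0 for graphs with fewer than two vertices.
double density(std::size_t vertices, std::size_t edges) {
    if (vertices < 2) return 0.0;
    return 2.0 * static_cast<double>(edges) /
           (static_cast<double>(vertices) * static_cast<double>(vertices - 1));
}

// Hypothetical BitMatrix: a |V| x |V| adjacency matrix with one bit per cell.
class BitMatrix {
public:
    explicit BitMatrix(std::size_t n) : n_(n), bits_(n * n, false) {}

    // Records an undirected edge between a and b.
    void connect(std::size_t a, std::size_t b) {
        bits_[a * n_ + b] = true;
        bits_[b * n_ + a] = true;
    }

    bool adjacent(std::size_t a, std::size_t b) const { return bits_[a * n_ + b]; }

    // Estimated payload size: |V|*|V| bits rounded up to whole bytes.
    std::size_t bytes() const { return (n_ * n_ + 7) / 8; }

private:
    std::size_t n_;
    std::vector<bool> bits_;
};
```

For 10,000 stations the matrix needs 10^8 bits, about 12.5 MB, regardless of the number of edges. If D turns out to be small, which seems likely for a railway network where each station has only a few neighbours, an edge list or adjacency list will probably be smaller.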

醉态萌生 2024-12-29 18:27:24

Take a look at

http://redis.io/

What I would propose is to take your map and convert it to a map in Redis, which can then be persisted to the local file system. Lookups are really fast, so this should not hurt performance much.
