How to load/save a C++ class instance (using STL containers) to disk
I have a C++ class representing a hierarchically organised data tree which is very large (~Gb, basically as large as I can get away with in memory). It uses an STL list to store information at each node plus iterators to other nodes. Each node has only one parent, but 0-10 children.
Abstracted, it looks something like:
#include <cstddef>
#include <list>
#include <map>

constexpr std::size_t X = 4; // placeholder: the real array size is not given

struct node;
typedef std::list<node>::iterator node_list_iterator;

struct node {
public:
    node_list_iterator parent; // iterator to a single parent node
    double node_data_array[X];
    std::map<int, node_list_iterator> children; // iterators to child nodes
};

class strategy {
private:
    std::list<node> tree; // hierarchically linked list of nodes
    struct some_other_data;
public:
    void build(); // build the tree
    void save();  // save the tree to disk
    void load();  // load the tree from disk
    void use();   // use the tree
};
I would like to implement load() and save() to disk, and they should be fairly fast. However, the obvious problems are:
I don't know the size in advance;
The data contains iterators, which are volatile;
My ignorance of C++ is prodigious.
Could anyone suggest a pure C++ solution please?
Comments (7)
It seems like you could save the data in the following syntax:
That is to say, when serialized the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.
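A minimal sketch of that recursive write follows. It assumes the question's node layout, a placeholder X = 4, and a hypothetical binary preorder layout: each node's int key, its X doubles, its child count, and then its children. The names write_raw and write_node are illustrative and do not come from the answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <map>
#include <ostream>

constexpr std::size_t X = 4;  // placeholder: the question does not give X

struct node;
using node_list_iterator = std::list<node>::iterator;

struct node {
    node_list_iterator parent;                   // iterator to the parent node
    double node_data_array[X];
    std::map<int, node_list_iterator> children;  // iterators to child nodes
};

// Writes a trivially copyable value (or array) as raw bytes, so the file uses
// the native endianness and type sizes of the machine that wrote it.
template <class T>
void write_raw(std::ostream& out, const T& value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Hypothetical preorder format, one entry per node:
//   int32  key under which the parent stores this node (-1 for the root)
//   X doubles of node data
//   uint32 number of children, then each child subtree, recursively.
// No iterators are written: the nesting alone encodes the tree shape.
void write_node(std::ostream& out, const node& n, std::int32_t key) {
    write_raw(out, key);
    write_raw(out, n.node_data_array);
    write_raw(out, static_cast<std::uint32_t>(n.children.size()));
    for (const auto& [child_key, child] : n.children)
        write_node(out, *child, child_key);
}
```

If the root is the first element of the list, saving might look like `std::ofstream f("tree.bin", std::ios::binary); write_node(f, tree.front(), -1);`. The recursion depth equals the tree depth, so a very deep tree might call for an explicit stack instead.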
Reading isn't that much harder.
std::list<node> iterators are stable. Once you've inserted the root node, its iterator will not change, not even when inserting its children. Hence, when you're reading each node you can already set the parent iterator. This of course leaves you with the child iterators, but those are trivial: each node (other than the root) is a child of its parent. So, after you've read all nodes, you fix up the child iterators. Start with the second node, the first child (the first node was the root), and iterate to the last child. Then, for each child C, get its parent and add C to its parent's children collection. This means that you have to set the int child IDs aside while reading, but you can do that in a simple std::vector parallel to the std::list<node>. Once you've patched all child IDs into the respective parents, you can discard the vector.
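The reading procedure described in this answer can be sketched as follows. The sketch assumes a hypothetical binary preorder layout (int32 key, X doubles, uint32 child count, then the children) and a placeholder X = 4. read_node sets each parent iterator as it goes and sets the int keys aside in a std::vector that runs parallel to the list. load_tree then patches the child maps and discards the vector. Both function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <list>
#include <map>
#include <vector>

constexpr std::size_t X = 4;  // placeholder: the question does not give X

struct node;
using node_list_iterator = std::list<node>::iterator;

struct node {
    node_list_iterator parent;
    double node_data_array[X];
    std::map<int, node_list_iterator> children;
};

template <class T>
bool read_raw(std::istream& in, T& value) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&value), sizeof value));
}

// Reads one node and its subtree. The parent iterator is set immediately,
// because std::list iterators stay valid as more nodes are appended. The
// node's key is set aside in `keys`, whose i-th entry belongs to the i-th node.
bool read_node(std::istream& in, std::list<node>& tree,
               node_list_iterator parent, std::vector<int>& keys) {
    std::int32_t key;
    std::uint32_t child_count;
    auto it = tree.insert(tree.end(), node{});
    it->parent = parent;
    if (!read_raw(in, key) || !read_raw(in, it->node_data_array) ||
        !read_raw(in, child_count))
        return false;
    keys.push_back(key);
    for (std::uint32_t i = 0; i < child_count; ++i)
        if (!read_node(in, tree, it, keys)) return false;
    return true;
}

// Reads the whole tree, then fixes up the child maps in a second pass.
bool load_tree(std::istream& in, std::list<node>& tree) {
    tree.clear();
    std::vector<int> keys;  // child IDs, parallel to `tree`
    if (!read_node(in, tree, tree.end(), keys)) return false;
    // The first node is the root. Every later node C is added to the
    // children map of C's parent under C's key.
    auto key = std::next(keys.begin());
    for (auto c = std::next(tree.begin()); c != tree.end(); ++c, ++key)
        c->parent->children[*key] = c;
    return true;
}
```

Here the root's parent is set to tree.end(), which stays a valid sentinel for a std::list while elements are inserted.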
You can use the Boost.Serialization library. This would save the entire state of your container, even the iterators.
Boost.Serialization is one solution. Alternatively, IMHO, you can use SQLite plus the Visitor pattern to load and save these nodes, but it won't be as easy as it sounds.
Boost Serialization has already been suggested, and it's certainly a reasonable possibility.
A great deal depends on how you're going to use the data -- the fact that you're using a multiway tree in memory doesn't mean you necessarily have to store it as a multiway tree on disk. Since you're (apparently) already pushing the limits of what you can store in memory, the obvious question is whether you're just interested in serializing the data so you can re-constitute the same tree when needed, or whether you want something like a database so you can load parts of the information into memory as needed, and update records as needed.
If you want the latter, some of your choices will also depend on how static the structure is. For example, if a particular node has N children, is that number constant or is it subject to change? If it's subject to change, is there a limit on the maximum number of children?
If you do want to be able to traverse the structure on disk, one obvious possibility would be as you write it out, substitute the file offset of the appropriate data in place of the iterator you're using in memory.
Alternatively, since it looks like (at least most of) the data in an individual node has a fixed size, you might create a database-like structure of fixed-size records, and in each record store the record numbers of the parent and children.
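Below is a rough sketch of the fixed-size record idea. It makes several assumptions that the question does not state: X = 4 as a placeholder, at most 10 children per node (taken from the question's "0-10 children"), and a root whose parent iterator equals tree.end(). Each record stores record numbers instead of iterators. Because every record has the same size, record i sits at file offset i * sizeof(record), which also makes the file-offset idea mentioned above cheap to use. The names record, save_records, load_records and read_record_at are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <list>
#include <map>
#include <ostream>
#include <unordered_map>
#include <vector>

constexpr std::size_t X = 4;             // placeholder: the question does not give X
constexpr std::size_t MAX_CHILDREN = 10; // "0-10 children", per the question

struct node;
using node_list_iterator = std::list<node>::iterator;

struct node {
    node_list_iterator parent;  // assumption: the root's parent is tree.end()
    double node_data_array[X];
    std::map<int, node_list_iterator> children;
};

// Fixed-size on-disk record: links are record numbers instead of iterators,
// so record i starts at byte offset i * sizeof(record).
struct record {
    std::int64_t parent;                     // record number, -1 for the root
    double data[X];
    std::int32_t child_count;
    std::int32_t child_keys[MAX_CHILDREN];   // keys of node::children
    std::int64_t child_index[MAX_CHILDREN];  // record numbers of the children
};

bool save_records(const std::list<node>& tree, std::ostream& out) {
    // A node's record number is its position in the list.
    std::unordered_map<const node*, std::int64_t> rec_no;
    std::int64_t i = 0;
    for (const node& n : tree) rec_no[&n] = i++;
    for (const node& n : tree) {
        if (n.children.size() > MAX_CHILDREN) return false;
        record r{};
        r.parent = (n.parent == tree.cend()) ? -1 : rec_no.at(&*n.parent);
        std::copy(std::begin(n.node_data_array), std::end(n.node_data_array), r.data);
        for (const auto& [key, child] : n.children) {
            r.child_keys[r.child_count] = key;
            r.child_index[r.child_count] = rec_no.at(&*child);
            ++r.child_count;
        }
        out.write(reinterpret_cast<const char*>(&r), sizeof r);
    }
    return static_cast<bool>(out);
}

bool load_records(std::istream& in, std::list<node>& tree) {
    std::vector<record> recs;
    record r;
    while (in.read(reinterpret_cast<char*>(&r), sizeof r)) recs.push_back(r);
    if (in.gcount() != 0) return false;  // trailing partial record

    tree.clear();
    std::vector<node_list_iterator> its;  // record number -> list iterator
    for (const record& rec : recs) {
        auto it = tree.insert(tree.end(), node{});
        std::copy(std::begin(rec.data), std::end(rec.data), it->node_data_array);
        its.push_back(it);
    }
    // Patch links now that every node has a stable iterator.
    const auto count = static_cast<std::int64_t>(recs.size());
    for (std::size_t i = 0; i < recs.size(); ++i) {
        const record& rec = recs[i];
        if (rec.parent >= count || rec.child_count < 0 ||
            rec.child_count > static_cast<std::int32_t>(MAX_CHILDREN)) return false;
        its[i]->parent = rec.parent < 0 ? tree.end() : its[rec.parent];
        for (std::int32_t k = 0; k < rec.child_count; ++k) {
            const std::int64_t c = rec.child_index[k];
            if (c < 0 || c >= count) return false;
            its[i]->children[rec.child_keys[k]] = its[c];
        }
    }
    return true;
}

// Reads record i alone, e.g. to walk parent/child links on disk.
bool read_record_at(std::istream& in, std::int64_t i, record& r) {
    in.clear();  // an earlier read may have left eof/fail set
    in.seekg(static_cast<std::streamoff>(i * sizeof(record)));
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&r), sizeof r));
}
```

The records are raw bytes, so the file is only readable by builds that share the same type sizes, padding and endianness. If the number of children per node can grow without a fixed limit, the fixed child arrays would need to be replaced by something variable-length.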
Knowing the overall size in advance isn't particularly important (offhand, I can't think of any way I'd use the size even if it was known in advance).
Actually, I think your best option is to move the entire data structure into database tables. That way you get the benefit of people much smarter than you (or me) having dealt with the issues of serialization. It also frees you from having to worry about whether the structure fits in memory.
I've answered something like this on SO before, so I will summarize:
1. Use a database.
2. Substitute file offsets for links (pointers).
3. Store the data without the tree structure, in records, as a database would.
4. Use XML to create the tree structure, using node names instead of links.
5. This would be soooo much easier if you used a database like SQLite or MySQL.
When you spend too much time on the "serialization" and less on the primary purpose of your project, you need to use a database.
If you're doing it for persistence, there are several ready-made solutions on the web (e.g. search for "persist std::list"), or you can roll your own, using mmap to create a file-backed memory area.