How to load/save C++ class instances (using STL containers) to disk

Posted 2024-08-29 23:01:10

I have a C++ class representing a hierarchically organised data tree which is very large (~Gb, basically as large as I can get away with in memory). It uses an STL list to store information at each node plus iterators to other nodes. Each node has only one parent, but 0-10 children.
Abstracted, it looks something like:

struct node {
public:
    node_list_iterator parent;              // iterator to a single parent node
    double node_data_array[X];
    map<int,node_list_iterator> children;   // iterators to child nodes
};

class strategy {
private:
    list<node> tree;        // hierarchically linked list of nodes
    struct some_other_data;
public:
    void build();           // build the tree
    void save();            // save the tree to disk
    void load();            // load the tree from disk
    void use();             // use the tree
};

I would like to implement load() and save() to disk, and they should be fairly fast; however, the obvious problems are:

I don't know the size in advance;
The data contains iterators, which are volatile;
My ignorance of C++ is prodigious.

Could anyone suggest a pure C++ solution please?

忆梦 2024-09-05 23:01:10

It seems like you could save the data in the following syntax:

File = Meta-data Node
Node = Node-data ChildCount NodeList
NodeList = sequence (int, Node)

That is to say, when serialized the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.
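
A minimal C++17 sketch of that recursive writer, following the grammar above. Several details are assumptions, not part of the answer: X = 4, the root is tree.front(), the meta-data is just the node count, and values are written as raw bytes in host byte order, so the file is not portable across architectures. The names write_pod, write_node and save_tree are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <map>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for testing in memory

constexpr std::size_t X = 4;  // hypothetical: the question leaves X unspecified

struct node;
using node_list = std::list<node>;
using node_list_iterator = node_list::iterator;  // relies on std::list allowing an incomplete element type

struct node {
    node_list_iterator parent{};  // not written: it is implied by the nesting
    double node_data_array[X] = {};
    std::map<int, node_list_iterator> children;
};

// Writes the raw bytes of a trivially copyable value (host byte order).
template <typename T>
void write_pod(std::ostream& out, const T& value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Node = Node-data ChildCount NodeList; NodeList = sequence (int, Node).
void write_node(std::ostream& out, const node& n) {
    out.write(reinterpret_cast<const char*>(n.node_data_array), sizeof n.node_data_array);
    write_pod(out, static_cast<std::uint32_t>(n.children.size()));
    for (const auto& [key, child] : n.children) {
        write_pod(out, static_cast<std::int32_t>(key));
        write_node(out, *child);  // recurse into the child's subtree
    }
}

// File = Meta-data Node. Here the meta-data is only the node count,
// and the root is assumed to be tree.front().
void save_tree(std::ostream& out, const node_list& tree) {
    write_pod(out, static_cast<std::uint64_t>(tree.size()));
    if (!tree.empty()) write_node(out, tree.front());
}
```

In a real program `out` would be a std::ofstream opened with std::ios::binary. The recursion depth equals the tree depth; a very deep tree would need an explicit stack instead.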

Reading isn't much harder. std::list<node> iterators are stable: once you've inserted the root node, its iterator will not change, not even when you insert its children. Hence, as you read each node you can already set its parent iterator. That leaves the child iterators, but those are trivial: each node is a child of its parent. So, after you've read all nodes, fix up the child iterators. Start with the second node (the first node was the root) and iterate through to the last node. For each node C, get its parent and add C to that parent's children collection. This means you have to set the int child IDs aside while reading, but you can do that in a simple std::vector parallel to the std::list<node>. Once you've patched all child IDs into the respective parents, you can discard the vector.
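
The read side described above can be sketched like this. It assumes a hypothetical byte layout that matches the grammar: a uint64 node count, then each node as X doubles, a uint32 child count and, per child, an int32 key followed by that child's node. X = 4, the root's parent being tree.end(), and the names read_pod, read_node and load_tree are all illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <list>
#include <map>
#include <sstream>  // std::istringstream, handy for testing in memory
#include <vector>

constexpr std::size_t X = 4;  // hypothetical: the question leaves X unspecified

struct node;
using node_list = std::list<node>;
using node_list_iterator = node_list::iterator;  // relies on std::list allowing an incomplete element type

struct node {
    node_list_iterator parent{};  // the root's parent is tree.end()
    double node_data_array[X] = {};
    std::map<int, node_list_iterator> children;
};

// Reads the raw bytes of a trivially copyable value (host byte order).
template <typename T>
T read_pod(std::istream& in) {
    T value{};
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}

// Reads one Node and its subtree in preorder, appending to `tree`. The parent
// iterator can be set immediately because std::list iterators stay valid when
// more nodes are appended. The node's key is parked in `keys` for later.
void read_node(std::istream& in, node_list& tree, std::vector<std::int32_t>& keys,
               node_list_iterator parent, std::int32_t key) {
    tree.emplace_back();
    node_list_iterator self = std::prev(tree.end());
    self->parent = parent;
    keys.push_back(key);  // keys[i] belongs to the i-th node of `tree`
    in.read(reinterpret_cast<char*>(self->node_data_array), sizeof self->node_data_array);
    const auto child_count = read_pod<std::uint32_t>(in);
    for (std::uint32_t i = 0; i < child_count && in; ++i) {
        const auto child_key = read_pod<std::int32_t>(in);
        read_node(in, tree, keys, self, child_key);
    }
}

// Returns false if the stream ends early or the node count does not match.
bool load_tree(std::istream& in, node_list& tree) {
    tree.clear();
    const auto count = read_pod<std::uint64_t>(in);
    if (!in) return false;
    if (count == 0) return true;
    std::vector<std::int32_t> keys;            // parallel to `tree`
    read_node(in, tree, keys, tree.end(), 0);  // root: its key is unused
    // Fix-up pass: skip the root, then hang every node C under C->parent.
    auto key = std::next(keys.begin());
    for (auto c = std::next(tree.begin()); c != tree.end(); ++c, ++key)
        c->parent->children[*key] = c;
    return static_cast<bool>(in) && tree.size() == count;
}  // `keys` is discarded here
```

Keeping the keys in a separate vector means the nodes themselves need no extra field that is only meaningful during loading.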

我一直都在从未离去 2024-09-05 23:01:10

You could use the boost.serialization library. It would save the entire state of your container, even the iterators.

人间☆小暴躁 2024-09-05 23:01:10

boost.serialization is one solution; alternatively, IMHO, you could use SQLite plus the Visitor pattern to load and save these nodes, but it won't be as easy as it sounds.

小红帽 2024-09-05 23:01:10

Boost Serialization has already been suggested, and it's certainly a reasonable possibility.

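A great deal depends on how you're going to use the data: the fact that you're using a multiway tree in memory doesn't mean you necessarily have to store it as a multiway tree on disk. Since you're (apparently) already pushing the limits of what you can store in memory, the obvious question is whether you're just interested in serializing the data so you can reconstitute the same tree when needed, or whether you want something more like a database, so you can load parts of the information into memory as needed and update records as needed.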

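If you want the latter, some of your choices will also depend on how static the structure is. For example, if a particular node has N children, is that number constant or is it subject to change? If it's subject to change, is there a limit on the maximum number of children?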

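If you do want to be able to traverse the structure on disk, one obvious possibility is, as you write out the data, to substitute the file offset of the appropriate data in place of each iterator you're using in memory.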
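
One way to picture the file-offset idea: a first pass over the list works out the offset each node's record will have, and a second pass writes every record with the child iterators replaced by those offsets, so a single node can later be fetched with seekg without loading the rest of the tree. The record layout, X = 4, the root being tree.front(), and the names save_with_offsets, disk_node and read_at are hypothetical choices for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <list>
#include <map>
#include <ostream>
#include <sstream>  // in-memory streams, handy for testing
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t X = 4;  // hypothetical: the question leaves X unspecified

struct node;
using node_list = std::list<node>;
using node_list_iterator = node_list::iterator;

struct node {
    node_list_iterator parent{};
    double node_data_array[X] = {};
    std::map<int, node_list_iterator> children;
};

template <typename T>
void write_pod(std::ostream& out, const T& v) { out.write(reinterpret_cast<const char*>(&v), sizeof v); }

template <typename T>
T read_pod(std::istream& in) { T v{}; in.read(reinterpret_cast<char*>(&v), sizeof v); return v; }

// Hypothetical layout (host byte order):
//   header: uint64 offset of the root record
//   record: double[X], uint32 child count, then per child: int32 key, uint64 offset
std::uint64_t record_size(const node& n) {
    return sizeof n.node_data_array + sizeof(std::uint32_t) +
           n.children.size() * (sizeof(std::int32_t) + sizeof(std::uint64_t));
}

// Pass 1 decides where every record will go; pass 2 writes the records with
// child iterators replaced by those offsets. The root is assumed to be tree.front().
void save_with_offsets(std::ostream& out, const node_list& tree) {
    std::unordered_map<const node*, std::uint64_t> offset_of;
    std::uint64_t pos = sizeof(std::uint64_t);  // records start after the header
    for (const node& n : tree) { offset_of[&n] = pos; pos += record_size(n); }
    write_pod(out, tree.empty() ? std::uint64_t{0} : offset_of.at(&tree.front()));
    for (const node& n : tree) {
        out.write(reinterpret_cast<const char*>(n.node_data_array), sizeof n.node_data_array);
        write_pod(out, static_cast<std::uint32_t>(n.children.size()));
        for (const auto& [key, child] : n.children) {
            write_pod(out, static_cast<std::int32_t>(key));
            write_pod(out, offset_of.at(&*child));
        }
    }
}

// One record as read back from disk; children are (key, offset) pairs.
struct disk_node {
    double data[X];
    std::vector<std::pair<std::int32_t, std::uint64_t>> children;
};

// Seeks straight to one record, so the tree can be walked without loading it all.
disk_node read_at(std::istream& in, std::uint64_t offset) {
    disk_node d{};
    in.seekg(static_cast<std::streamoff>(offset));
    in.read(reinterpret_cast<char*>(d.data), sizeof d.data);
    const auto count = read_pod<std::uint32_t>(in);
    for (std::uint32_t i = 0; i < count && in; ++i) {
        const auto key = read_pod<std::int32_t>(in);
        const auto off = read_pod<std::uint64_t>(in);
        d.children.emplace_back(key, off);
    }
    return d;
}
```

To walk the tree on disk, read the header, call read_at on the root offset, then follow whichever child offsets you need; only the visited records are ever loaded.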

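Alternatively, since it looks like the data in an individual node has a fixed size (at least mostly), you could create a database-like structure of fixed-size records, and in each record store the record numbers of the parent and children.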
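
A sketch of the fixed-size-record variant, assuming X = 4 and at most 10 children per node (the bound given in the question). Each node becomes one disk_record whose parent and child links are record numbers, here simply the node's position in the list, so record n starts at byte n * sizeof(disk_record). disk_record, save_records and read_record are hypothetical names, and the raw struct dump is only readable by a build with the same layout and byte order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <list>
#include <map>
#include <ostream>
#include <sstream>  // in-memory streams, handy for testing
#include <stdexcept>
#include <type_traits>
#include <unordered_map>

constexpr std::size_t X = 4;              // hypothetical: the question leaves X unspecified
constexpr std::size_t max_children = 10;  // the question says each node has 0-10 children

struct node;
using node_list = std::list<node>;
using node_list_iterator = node_list::iterator;

struct node {
    node_list_iterator parent{};
    double node_data_array[X] = {};
    std::map<int, node_list_iterator> children;
};

// Every node becomes one record of the same size; links are record numbers.
struct disk_record {
    double data[X];
    std::int32_t parent;                      // -1 for the root
    std::uint32_t child_count;
    std::int32_t child_key[max_children];     // the int keys of node::children
    std::int32_t child_record[max_children];  // record numbers of the children
};
static_assert(std::is_trivially_copyable_v<disk_record>);

// Record number = position in the list; the root is assumed to be tree.front().
void save_records(std::ostream& out, const node_list& tree) {
    std::unordered_map<const node*, std::int32_t> record_of;
    std::int32_t index = 0;
    for (const node& n : tree) record_of[&n] = index++;
    for (const node& n : tree) {
        if (n.children.size() > max_children) throw std::length_error("too many children");
        disk_record r{};
        std::copy(std::begin(n.node_data_array), std::end(n.node_data_array), r.data);
        r.parent = (&n == &tree.front()) ? -1 : record_of.at(&*n.parent);
        r.child_count = static_cast<std::uint32_t>(n.children.size());
        std::size_t k = 0;
        for (const auto& [key, child] : n.children) {
            r.child_key[k] = key;
            r.child_record[k] = record_of.at(&*child);
            ++k;
        }
        out.write(reinterpret_cast<const char*>(&r), sizeof r);
    }
}

// Record n starts at byte n * sizeof(disk_record), so it can be read on its own.
disk_record read_record(std::istream& in, std::int32_t n) {
    disk_record r{};
    in.seekg(static_cast<std::streamoff>(n) * static_cast<std::streamoff>(sizeof(disk_record)));
    in.read(reinterpret_cast<char*>(&r), sizeof r);
    return r;
}
```

Because every record has the same size, a record can also be updated in place with seekp followed by write, which suits the "update records as needed" case described earlier.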

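Knowing the overall size in advance isn't particularly important (offhand, I can't think of any way I'd use the size even if it were known in advance).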

明媚殇 2024-09-05 23:01:10

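Actually, I think your best bet is to move the entire data structure into database tables. That way you benefit from people much smarter than you (or me) having already dealt with the serialization issues. It also spares you from worrying about whether the structure fits in memory.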

故笙诉离歌 2024-09-05 23:01:10

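I've answered a similar question before, so I'll summarize:
1. Use a database.
2. Replace links (pointers) with file offsets.
3. Store the data without the tree structure, in records, as a database would.
4. Use XML to describe the tree structure, using node names instead of links.
5. All of this is much easier if you use a database such as SQLite or MySQL.

When you're spending too much time on "serialization" and too little on the main purpose of your project, it's time to use a database.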