Persisting a trie to a file - C

Posted 2024-08-27 09:46:42

I have a trie which I am using to do some string processing. I have a simple compiler which generates the trie from some data. Once generated, my trie won't change at run time.

I am looking for an approach where I can persist the trie in a file and load it efficiently. I have looked at SQLite to understand how they persist B-trees, but their file format looks a bit advanced and I may not need all of that.

It'd be helpful if someone can provide some ideas to persist and read the trie. I am programming using C.

Comments (4)

不再见 2024-09-03 09:46:42


I did some research and found the following little gems online:

  1. trie.h
  2. trie.c

A working trie with serialization and deserialization. It was originally written for use in Python (there's a corresponding triemodule.c for tying it to Python), but it's pure C; you could mine it for ideas or use it as you wish.

Update:

It appears the links are no longer working. I'll keep the originals up, but here are the links from the Wayback Machine:

  1. trie.h
  2. trie.c
多像笑话 2024-09-03 09:46:42


Assuming your entire data structure fits in memory, a recursive serialization approach is simplest. SQLite works with data structures that won't fit in memory, so it is probably overkill to try copying their methods.

Here is example pseudocode for reading/writing a node. It works by recursively reading/writing the child nodes. It has nothing trie-specific, and should work for other tree data structures as well.

void writeNode(Node *node)
    write node data to file
    write node.numOfChildren to file
    for each child:
        writeNode(child)

Node *readNode()
    Node *node = allocateNewNode()
    read node data from file
    read node.numOfChildren from file
    for (i=0; i<node.numOfChildren; i++)
        Node *child = readNode()
        node.addChild(child)
    return node
是你 2024-09-03 09:46:42


If all of your nodes are the same size then you can just enumerate your nodes (root = 0) and write each of them to a file at their index. While writing them you will have to change their references to other nodes to those nodes' indexes, though. You will probably also need a NULL value. You could use -1 or you could use (root = 1) and (NULL = 0).

You will probably also be able to compress these nodes somewhat by changing their pointer fields to be smaller types.

If your nodes are different sizes then it's more complicated.

烙印 2024-09-03 09:46:42

Once you have generated the trie in memory, writing it to a file is not complex. With mmap, it is as if the trie were regenerated in a contiguous memory block; alternatively, you can write each node to a file with a recursive function.
