Space-efficient trie

Posted on 2024-11-16 10:48:53

I'm trying to implement a space efficient trie in C. This is my struct:

struct node {
    char val; //character stored in node
    int key; //key value if this character is an end of word
    struct node* children[256];
};

When I add a node, its index is the unsigned char cast of the character. For example, if I want to add "c", then

children[(unsigned char)'c']

is the pointer to the newly added node. However, this implementation requires me to declare a node* array of 256 elements. What I want to do is:

struct node** children;

and then when adding a node, just malloc space for the node and have

children[(unsigned char)'c']

point to the new node. The issue is that if I don't malloc space for children first, then I obviously can't reference any index or else that's a big error.

So my question is: how do I implement a trie such that it only stores the non-null pointers to its children?

溺孤伤于心 2024-11-23 10:48:53

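You could try a de la Briandais trie, where each node has only one child pointer and also a pointer to its "sibling". All siblings are then effectively stored as a linked list instead of being pointed to directly by the parent.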
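
To make the first-child/next-sibling layout concrete, here is a minimal sketch in C. The names `dlb_node`, `dlb_new`, `dlb_step`, `dlb_insert` and `dlb_find` are made up for this illustration, and the convention that `key == -1` means "no word ends here" is an assumption, not something from the question. Freeing the trie is left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout for a de la Briandais trie. */
struct dlb_node {
    char val;                 /* character stored in this node */
    int key;                  /* key if a word ends here, otherwise -1 (assumed convention) */
    struct dlb_node *child;   /* first node of the next level */
    struct dlb_node *sibling; /* next alternative on the same level */
};

static struct dlb_node *dlb_new(char val) {
    struct dlb_node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->val = val;
        n->key = -1;
        n->child = NULL;
        n->sibling = NULL;
    }
    return n;
}

/* Walk the sibling list starting at *head and return the node holding c.
   When create is non-zero, a missing node is allocated and prepended. */
static struct dlb_node *dlb_step(struct dlb_node **head, char c, int create) {
    struct dlb_node *n;
    for (n = *head; n != NULL; n = n->sibling)
        if (n->val == c)
            return n;
    if (!create)
        return NULL;
    n = dlb_new(c);
    if (n != NULL) {
        n->sibling = *head;
        *head = n;
    }
    return n;
}

/* Insert word with the given key; returns 0 on success, -1 if malloc fails. */
int dlb_insert(struct dlb_node *root, const char *word, int key) {
    struct dlb_node *cur = root;
    assert(root != NULL && word != NULL);
    for (; *word != '\0'; word++) {
        cur = dlb_step(&cur->child, *word, 1);
        if (cur == NULL)
            return -1;
    }
    cur->key = key;
    return 0;
}

/* Return the key stored for word, or -1 if word is not in the trie. */
int dlb_find(struct dlb_node *root, const char *word) {
    struct dlb_node *cur = root;
    assert(root != NULL && word != NULL);
    for (; *word != '\0' && cur != NULL; word++)
        cur = dlb_step(&cur->child, *word, 0);
    return cur != NULL ? cur->key : -1;
}
```

Each node costs two pointers regardless of alphabet size, at the price of a linear scan over the siblings on every level.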

拥抱没勇气 2024-11-23 10:48:53

You can't really have it both ways and be both space efficient and have O(1) lookup in the children nodes.

When you only allocate space for the entries that are actually added, and not for the null pointers, you can no longer do

children[(unsigned char)'c']

because you can no longer index directly into the array.

One alternative is to simply do a linear search through the children and store an additional count of how many entries the children array has, i.e.

children[(unsigned char)'c'] = ...;

has to become

for(i = 0; i < len; i++) {
  if(children[i]->val == 'c') //compare the stored character, not the pointer
     break;
} 
if(i == len) {
  //...reallocate and add space for one item in children
}
children[i] = ...;

If your tree ends up with a lot of non-empty entries at one level, you might insert the children in sorted order and do a binary search. Or you might store the children as a linked list instead of an array.
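
As a rough sketch of the sorted-array variant with binary search, something like the following could work. The struct `cnode` and its functions are hypothetical names for this example, and `key == -1` as the "no word ends here" marker is an assumption. The array is grown by exactly one slot per new child, which keeps memory tight but calls `realloc` on every insertion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compact node: children is an exactly-sized, sorted array. */
struct cnode {
    char val;                /* character stored in this node */
    int key;                 /* -1 when no word ends here (assumed convention) */
    size_t len;              /* number of children */
    struct cnode **children; /* sorted by (unsigned char)val */
};

struct cnode *cnode_new(char val) {
    struct cnode *n = malloc(sizeof *n);
    if (n != NULL) {
        n->val = val;
        n->key = -1;
        n->len = 0;
        n->children = NULL;
    }
    return n;
}

/* Binary search: index of the first child whose character is >= c. */
static size_t cnode_lower_bound(const struct cnode *n, unsigned char c) {
    size_t lo = 0, hi = n->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if ((unsigned char)n->children[mid]->val < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Return the child for c, or NULL if there is none. */
struct cnode *cnode_find(const struct cnode *n, char c) {
    size_t i;
    assert(n != NULL);
    i = cnode_lower_bound(n, (unsigned char)c);
    if (i < n->len && n->children[i]->val == c)
        return n->children[i];
    return NULL;
}

/* Return the child for c, creating it at its sorted position if needed.
   Returns NULL if an allocation fails. */
struct cnode *cnode_get_or_add(struct cnode *n, char c) {
    size_t i;
    struct cnode *child, **grown;
    assert(n != NULL);
    i = cnode_lower_bound(n, (unsigned char)c);
    if (i < n->len && n->children[i]->val == c)
        return n->children[i];
    child = cnode_new(c);
    if (child == NULL)
        return NULL;
    grown = realloc(n->children, (n->len + 1) * sizeof *grown);
    if (grown == NULL) {
        free(child);
        return NULL;
    }
    /* shift the tail right by one slot to open a gap at index i */
    memmove(&grown[i + 1], &grown[i], (n->len - i) * sizeof *grown);
    grown[i] = child;
    n->children = grown;
    n->len++;
    return child;
}
```

Lookup is O(log k) for k children, and insertion is O(k) because of the shift, which is usually acceptable when k is at most 256.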

你是暖光i 2024-11-23 10:48:53

If you only need to search English keywords, I think you can shrink the children array from 256 entries to just 26, which is enough to cover the 26 letters a-z.
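
A minimal sketch of the 26-slot variant might look like this. `ALPHABET`, `lnode`, `letter_index` and `lnode_child` are names invented for the example, and the `c - 'a'` mapping assumes a character set in which the lowercase letters are contiguous (true for ASCII, not guaranteed by the C standard).

```c
#include <assert.h>
#include <stddef.h>

#define ALPHABET 26 /* lowercase a-z only (assumption for this sketch) */

/* Hypothetical node that keeps direct indexing but with 26 slots instead of 256. */
struct lnode {
    int key;                          /* key value if a word ends here */
    struct lnode *children[ALPHABET]; /* NULL where there is no child */
};

/* Map 'a'..'z' to 0..25; return -1 for any other character. */
int letter_index(char c) {
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}

/* Return the child for c, or NULL if c is out of range or has no child. */
struct lnode *lnode_child(const struct lnode *n, char c) {
    int i;
    assert(n != NULL);
    i = letter_index(c);
    return i < 0 ? NULL : n->children[i];
}
```

Lookup stays O(1), and each node holds 26 pointers instead of 256, but every node still pays for all 26 slots whether or not they are used.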

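In addition, you could use a linked list to keep the number of stored children small, so that iterating over them is more efficient.

I haven't gone through these libraries, but I think the trie implementations would be helpful.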

尴尬癌患者 2024-11-23 10:48:53

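You can be both space efficient and keep constant lookup time by making each node's children a hash table of nodes. Especially when Unicode characters are involved and the character set your dictionary can contain is not limited to 52 or so, this becomes more of a requirement than a nicety. That way you keep the advantages of a trie while being efficient in both time and space.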
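
One way this could look in C is a small open-addressing table per node, sketched below. All names (`hnode`, `hnode_probe`, `hnode_grow`, `hnode_find`, `hnode_get_or_add`) are hypothetical, the multiplier 31 is an arbitrary choice, and the 1/2 load-factor limit is an assumption made to keep probing short. Lookup is constant time on average, not in the worst case.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node whose children live in an open-addressing hash table. */
struct hnode {
    unsigned char val;    /* character stored in this node */
    int key;              /* -1 when no word ends here (assumed convention) */
    size_t cap;           /* number of slots: 0 or a power of two */
    size_t len;           /* number of occupied slots */
    struct hnode **slots; /* NULL marks an empty slot */
};

struct hnode *hnode_new(unsigned char val) {
    struct hnode *n = malloc(sizeof *n);
    if (n != NULL) {
        n->val = val;
        n->key = -1;
        n->cap = 0;
        n->len = 0;
        n->slots = NULL;
    }
    return n;
}

/* Linear probing: return the slot holding c, or the first empty slot on its
   probe path. The table is never full, so the loop always terminates. */
static struct hnode **hnode_probe(struct hnode **slots, size_t cap, unsigned char c) {
    size_t i = (c * 31u) & (cap - 1);
    while (slots[i] != NULL && slots[i]->val != c)
        i = (i + 1) & (cap - 1);
    return &slots[i];
}

/* Double the table (starting at 4 slots) and reinsert the existing children. */
static int hnode_grow(struct hnode *n) {
    size_t i, cap = n->cap ? n->cap * 2 : 4;
    struct hnode **slots = malloc(cap * sizeof *slots);
    if (slots == NULL)
        return -1;
    for (i = 0; i < cap; i++)
        slots[i] = NULL;
    for (i = 0; i < n->cap; i++)
        if (n->slots[i] != NULL)
            *hnode_probe(slots, cap, n->slots[i]->val) = n->slots[i];
    free(n->slots);
    n->slots = slots;
    n->cap = cap;
    return 0;
}

/* Return the child for c, or NULL if there is none. */
struct hnode *hnode_find(const struct hnode *n, unsigned char c) {
    assert(n != NULL);
    return n->cap ? *hnode_probe(n->slots, n->cap, c) : NULL;
}

/* Return the child for c, creating it if needed; NULL if an allocation fails. */
struct hnode *hnode_get_or_add(struct hnode *n, unsigned char c) {
    struct hnode **slot;
    assert(n != NULL);
    /* grow first so the load factor stays at or below 1/2 after a possible insert */
    if ((n->len + 1) * 2 > n->cap && hnode_grow(n) != 0)
        return NULL;
    slot = hnode_probe(n->slots, n->cap, c);
    if (*slot == NULL) {
        *slot = hnode_new(c);
        if (*slot != NULL)
            n->len++;
    }
    return *slot;
}
```

Note that with at most 256 possible byte values per level, a table held at half load uses up to twice as many slots as there are children, so the saving over a fixed 256-entry array only shows up for sparsely populated nodes.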

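I must also add that if the character set you are using is close to unbounded, a linked list of nodes may do just fine. If you like unmanageable nightmares, you can opt for a hybrid approach where the first few levels keep their children in hash tables while the lower levels keep them in linked lists. For a true bug farm, opt for a dynamic one where each linked list is converted to a hash table once it passes a threshold. You could easily amortize the cost.

The possibilities are endless!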
