如何在 B+ 中实现字符串键树?
许多b+树示例都是使用整数键实现的,但我见过一些同时使用整数键和字符串键的其他示例,我学习了b+树基础,但我不明白字符串键是如何工作的?
Many b+ tree examples are implemented using integer key, but i had seen some other examples using both integer key and string key, i learned the b+ tree basis, but i don't understand how string key works?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我还使用多级 B 树。假设 test 是一个字符串,可以将其视为 [t,e,s,t] 数组。现在想想一棵树中的树。每个节点在某一位置只能容纳一个字符。您还需要考虑某个键/值数组实现,例如不断增长的数组、树或其他内容的链接列表。它还可以使节点大小动态化(字母数量有限)。
如果所有键都适合叶子,则将其存储在叶子中。如果叶子变大,您可以添加新节点。
现在,由于每个节点都知道其字母和位置,因此您可以从叶子中的键中删除这些字符,并在搜索时或如果您知道叶子+叶子中的位置则重建它们。
如果现在创建树后,以某种格式编写树,则最终会进行字符串压缩,其中每个字母组合(前缀)仅存储一次,即使它由 1000 个字符串共享。
简单压缩通常会导致普通文本(任何语言!)的压缩率为 1:10,内存中的压缩率为 1:4。你也可以搜索任何给定的单词(这是你使用 B+Tree 的字典中的字符串。
这是你可以使用多级的一个极端。
数据库通常使用特定的前缀树(前 x 个字符并存储叶中休息并在叶中使用二分搜索)。此外,还有一些实现基于实际密度使用可变前缀长度,因此最终它是非常具体的实现,并且
如果树应该提供帮助的话 。找到确切的字符串。通常添加长度并使用每个字符的低位的哈希值可以实现这一点,例如,您可以生成长度为(8位)+ 4位* 6个字符= 32位的哈希值 - >它是您的哈希码。或者您可以使用第一个、最后一个和中间的字符,因为长度是最具选择性的字符之一,因此您在搜索字符串时不会发现很多冲突。
此解决方案非常适合查找特定字符串,但会破坏字符串的自然顺序。字符串,因此您没有机会回答范围查询等。但当你搜索特定的用户名/电子邮件或地址时,这些树会更优越(但问题是为什么不使用哈希图)。
I also use a multi level B-Tree. Having a string lets say test can be seen as an array of [t,e,s,t]. Now think about a tree of trees. Each node can only hold one character for a certain position. You also need to think about a certain key /value array implementation like a growing linked list of arrays, trees or whatever. It also can make the node size dynamic (limited amount of letters).
If all keys fit the leaf, you store it in the leaf. If the leaf gets to big, you can add new nodes.
And now since each node knows its letter and position, you can strip those characters from the keys in the leaf and reconstruct them as you search or if you know the leaf + the position in the leaf.
If you now, after you have created the tree, write the tree in a certain format, you end up having string compression where you store each letter combination (prefix) only once even if it is shared by 1000ends of strings.
Simple compression often results in a 1:10 compression for normal text (in any language!) and in memory in 1:4. And also you can search for any given word (which are the strings in your dictionary you used the B+Tree for.
This is one extrem where you can use multilevel.
Databases usually use a certain prefix tree (the first x characters and store the rest in the leafs and use binary search within the leaf). Also there are implementations that use variable prefix lengths based on the actual density. So in the end it is very implementation specific and a lot of options exist.
If the tree should aid in finding the exact string. Often adding the length and using hash of lower bits of each characters do the trick. For example you could generate a hash out of length(8bit) + 4bit * 6 characters = 32Bit -> its your hash code. Or you can use the first, last and middle characters along with it. Since the length is one of the most selective you wont find many collisions while search your string.
This solution is very good for finding a particular string but destroyes the natural order of the strings so giving you no chance of answering range queries and alike. But for times where you search for a particular username / email or address those tree would be supperior (but question is why not use a hashmap).
字符串键可以是指向字符串的指针(很可能)。
或者可以调整琴键的尺寸以适合大多数琴弦。 64 位可容纳 8 字节字符串,甚至 16 字节密钥也不算太荒谬。
选择密钥实际上取决于您计划如何使用它。
The string key can be a pointer to a string (very likely).
Or the key could be sized to fit most strings. 64 bits holds 8 byte strings and even 16 byte keys aren't too ridiculous.
Choosing a key really depends on how you plan to use it.