可变大小键的 B 树实现
我希望为“一次使用”索引实现一个 B 树(用 Java 实现),其中插入几百万个键,然后对每个键进行几次查询。键是 <= 40 字节的 ascii 字符串,关联的数据始终占用 6 字节。选择B树结构是因为我的内存预算不允许我将整个临时索引保留在内存中。
我的问题是关于选择分支因子和在磁盘上存储节点的实际细节。在我看来,有两种方法:
- 一个节点始终适合一个块。通过选择分支因子 k 来实现,即使对于最坏情况的密钥长度,密钥、数据和控制结构的存储要求也小于系统块大小。 k 可能很低,并且节点在大多数情况下会有很多空闲空间。
- 一个节点可以存储在多个块上。分支因子的选择与密钥大小无关。加载单个节点可能需要加载多个块。
那么问题是:
- 第二种方法通常用于可变长度密钥吗?或者我错过了一些完全不同的方法?
- 鉴于我的用例,您会推荐不同的整体解决方案吗?
最后我应该提到,我知道 jdbm3 项目,并且正在考虑使用它。在任何情况下都会尝试实现我自己的,既作为学习练习,又看看特定情况的优化是否可以产生更好的性能。
编辑:目前阅读有关 SB-Trees 的内容:
I'm looking to implement a B-tree (in Java) for a "one use" index where a few million keys are inserted, and queries are then made a handful of times for each key. The keys are <= 40 byte ascii strings, and the associated data always takes up 6 bytes. The B-tree structure has been chosen because my memory budget does not allow me to keep the entire temporary index in memory.
My issue is about the practical details in choosing a branching factor and storing nodes on disk. It seems to me that there are two approaches:
- One node always fit within one block. Achieved by choosing a branching factor k so that even for the worst case key-length the storage requirement for keys, data and control structures are <= the system block size. k is likely to be low, and nodes will in most cases have a lot of empty room.
- One node can be stored on multiple blocks. Branching factor is chosen independent of key size. Loading a single node may require that multiple blocks are loaded.
The questions are then:
- Is the second approach what is usually used for variable-length keys? or is there some completely different approach I have missed?
- Given my use case, would you recommend a different overall solution?
I should in closing mention that I'm aware of the jdbm3 project, and is considering using it. Will attempt to implement my own in any case, both as a learning exercise and to see if case specific optimization can yield better performance.
Edit: Reading about SB-Trees at the moment:
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我在这里缺少选项 C:
对于所有选项,您都有一些变体:B* 或 B+ 树(请参阅维基百科)。
I'm missing option C here:
And for all options you have some variants: B* or B+ Trees (see Wikipedia).
JDBM BTree 已经是自我平衡的。它还具有非常快的碎片整理功能,可以解决上述所有问题。
没有必要。 JDBM3 使用映射内存,因此它永远不会将整个块从磁盘读取到内存。它在块顶部创建“视图”,并且仅读取实际需要的部分数据。因此,它可能只读取 2x128 字节,而不是读取完整的 4KB 块。这取决于底层操作系统块大小。
我认为您忽略了一点:增加磁盘大小会降低性能,因为必须读取更多数据。并且单棵树可以共享两种方法(首先是新插入的节点,其次是碎片整理后的节点)。
无论如何,带有映射内存缓冲区的平面文件可能最适合您的问题。由于您的记录大小是固定的并且只有几百万条记录。
还可以看看leveldb。它有新的java端口,几乎击败了JDBM:
https://github.com/dain/leveldb
http://code.google.com/p/leveldb/
JDBM BTree is already self balancing. It also have defragmentation which is very fast and solves all problems described above.
Not necessary. JDBM3 uses mapped memory, so it never reads full block from disk to memory. It creates 'a view' on top of block and only read partial data as actually needed. So instead of reading full 4KB block, it may read just 2x128 bytes. This depends on underlying OS block size.
I think you missed point that increasing disk size decreases performance, as more data have to be read. And single tree can have share both approaches (newly inserted nodes first, second after defragmentation).
Anyway, flat-file with mapped memory buffer is probably best for your problem. Since you have fixed record size and just a few million records.
Also have look at leveldb. It has new java port which almost beats JDBM:
https://github.com/dain/leveldb
http://code.google.com/p/leveldb/
如果您使用一些嵌入式数据库,您可以避免这种麻烦。这些已经为您解决了这些问题以及更多问题。
您还可以写:“几百万个键”...“[最大] 40 字节 ascii 字符串”和“6 字节[相关数据]”。这算不上正确。一千兆 RAM 可以让您存储超过“几百万”的条目。
You could avoid this hassle if you use some embedded database. Those have solved these problems and some more for you already.
You also write: "a few million keys" ... "[max] 40 byte ascii strings" and "6 bytes [associated data]". This does not count up right. One gig of RAM would allow you more then "a few million" entries.