是否有可能编写一个真正通用的磁盘烘焙 B+Tree 实现?
几年前,我用 C++ 编写了一个通用的内存中 B+Tree 实现,我正在考虑将其持久保存在磁盘上(这就是最初设计 B+Tree 的原因)。 我的第一个想法是使用 mmap (我在 Linux 下)能够像普通内存一样操作文件,只需重写节点类的 new 运算符,以便它返回映射部分中的指针并创建一个智能指针,它可以将 RAM 地址转换为文件偏移量,以将我的节点与其他节点链接起来。 但我希望我的实现是通用的,因此用户可以在 B+ 树中存储 int、std::string 或他想要的任何自定义类。 这就是问题发生的地方:对于不包含指针的原始类型或聚合类型来说这一切都很好,但是一旦对象包含对堆分配对象的指针/引用,这种方法就不再有效。
所以我的问题是:有没有一些已知的方法可以克服这个困难?我对这个主题的个人搜索最终没有成功,但也许我错过了一些东西。
I wrote a generic in-memory B+Tree implementation in C++ few times ago, and I'm thinking about making it persistent on disk (which is why B+Tree have been designed for initially).
My first thought was to use mmap (I'm under Linux) to be able to manipulate the file as normal memory and just rewrite the new operator of my nodes classes so that it returns pointers in the mapped portion and create a smart pointer which can convert RAM adresses to file offset to link my nodes with others.
But I want my implementation to be generic, so the user can store an int, an std::string, or whatever custom class he wants in the B+tree.
That's where the problem occurs: for primitive types or aggregated types that do not contain pointers that's all good, but as soon as the object contains a pointer/reference to an heap allocated object, this approach no longer works.
So my question is: is there some known way to overcome this difficulty? My personnal searches on the topic end up unsuccessful, but maybe I missed something.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
据我所知,有三种(稍微)简单的方法可以解决这个问题。
方法 1:编写一个指向某些预分配内存的
std::streambuf
。这种方法允许您使用
operator<<
并使用任何已经存在的现有代码来获取您想要的字符串表示形式。operator<<
如何吐出内容。方法 2:编写自己的(多次重载)输出函数。
方法 3:编写
btree_traits
模板。btree_traits<>
默认值来使用operator<<
除非有人覆盖这些特征?As far as I know, there are three (somewhat) easy ways to solve this.
Approach 1: write a
std::streambuf
that points to some pre-allocated memory.This approach allows you to use
operator<<
and use whatever existing code already exists to get a string representation of what you want.operator<<
spits out content.Approach 2: write your own (many times overloaded) output function.
Approach 3: write a
btree_traits<>
template.btree_traits<>
detault to useoperator<<
unless someone overrides the traits?您无法编写真正通用且透明的版本,因为如果非平凡项中的指针是使用 malloc (或 new 和 new[])分配的,那么它已经在堆中。
非透明的解决方案可能是序列化类是一种选择,并且这可以相对容易地完成。在存储类之前,您必须调用序列化函数,在拉取它之前,您需要调用反序列化。 Boost 具有良好的序列化功能,您可以将其与 B+Tree 一起使用。
You cannot write a truly generic and transparent version since if the pointer in a non-trivial item was allocated with malloc (or new and new[]), then it's already in the heap.
A non-transparent sollution may be serializing the class is an option, and this can be done relatively easy. Before you store the class you'd have to call the serialization function and before pulling it you'd call the deserialize. Boost has good serialization features that you could make work with your B+Tree.
以通用方式处理指针和引用意味着您需要检查要存储的结构的类型及其字段。 C++ 是一种不以其反射性而闻名的语言。
但即使是一种具有强大反射能力的语言,这个问题的通用解决方案也很困难。您也许能够让它适用于 Python、Ruby 等高级语言中类型的子集。一个相关且更强大的范例是 持久编程语言。
您想要的功能通常是通过将写入数据块的责任委托给目标类型本身来实现的。它称为序列化。它只是意味着编写一个带有转储数据的方法和加载数据的方法的接口。任何想要保留在 B 树中的类只需实现此接口即可。
Handling pointers and references in a generic way means you will need to inspect the type of the structure you're trying to store, and its fields. C++ is a language not known for its reflectiveness.
But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.
The function you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means writing an interface with a method to dump data, and a method to load data. Any class that wants to be persisted in your B-tree then simply implements this interface.