是否有可能编写一个真正通用的磁盘烘焙 B+Tree 实现?

发布于 2024-10-04 10:34:49 字数 405 浏览 7 评论 0原文

几年前,我用 C++ 编写了一个通用的内存中 B+Tree 实现,我正在考虑将其持久保存在磁盘上(这就是最初设计 B+Tree 的原因)。 我的第一个想法是使用 mmap (我在 Linux 下)能够像普通内存一样操作文件,只需重写节点类的 new 运算符,以便它返回映射部分中的指针并创建一个智能指针,它可以将 RAM 地址转换为文件偏移量,以将我的节点与其他节点链接起来。 但我希望我的实现是通用的,因此用户可以在 B+ 树中存储 int、std::string 或他想要的任何自定义类。 这就是问题发生的地方:对于不包含指针的原始类型或聚合类型来说这一切都很好,但是一旦对象包含对堆分配对象的指针/引用,这种方法就不再有效。

所以我的问题是:有没有一些已知的方法可以克服这个困难?我对这个主题的个人搜索最终没有成功,但也许我错过了一些东西。

I wrote a generic in-memory B+Tree implementation in C++ few times ago, and I'm thinking about making it persistent on disk (which is why B+Tree have been designed for initially).
My first thought was to use mmap (I'm under Linux) to be able to manipulate the file as normal memory and just rewrite the new operator of my nodes classes so that it returns pointers in the mapped portion and create a smart pointer which can convert RAM adresses to file offset to link my nodes with others.
But I want my implementation to be generic, so the user can store an int, an std::string, or whatever custom class he wants in the B+tree.
That's where the problem occurs: for primitive types or aggregated types that do not contain pointers that's all good, but as soon as the object contains a pointer/reference to an heap allocated object, this approach no longer works.

So my question is: is there some known way to overcome this difficulty? My personnal searches on the topic end up unsuccessful, but maybe I missed something.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

微暖i 2024-10-11 10:34:49

据我所知,有三种(稍微)简单的方法可以解决这个问题。

方法 1:编写一个指向某些预分配内存的 std::streambuf

这种方法允许您使用operator<<并使用任何已经存在的现有代码来获取您想要的字符串表示形式。

  • 优点:重用大量现有代码。
  • 缺点:无法控制 operator<< 如何吐出内容。
  • 缺点:仅基于文本的表示。

方法 2:编写自己的(多次重载)输出函数。

  • 优点:可以提出二进制表示。
  • 优点:精确控制每种输出格式。
  • 缺点:重写如此多的输出函数...客户端为新类型编写重载是一件痛苦的事情,因为他们不应该编写属于您的库名称空间的函数...除非您求助于 Koenig(依赖参数)查找!

方法 3:编写 btree_traits 模板。

  • 优点:可以提出二进制表示。
  • 优点:精确控制每种输出格式。
  • 优点:对函数的输出和格式有更多控制,可能包含元数据等。
  • 缺点:仍然需要您/您的库的用户编写大量自定义重载。
  • 优点:拥有 btree_traits<> 默认值来使用 operator<< 除非有人覆盖这些特征?

As far as I know, there are three (somewhat) easy ways to solve this.

Approach 1: write a std::streambuf that points to some pre-allocated memory.

This approach allows you to use operator<< and use whatever existing code already exists to get a string representation of what you want.

  • Pro: re-use loads of existing code.
  • Con: no control over how operator<< spits out content.
  • Con: text-based representations only.

Approach 2: write your own (many times overloaded) output function.

  • Pro: can come up with binary representation.
  • Pro: exact control over every single output format.
  • Con: re-write so many output functions... writing overloads for new types by clients is a pain because they shouldn't write functions that fall in your library's namespace... unless you resort to Koenig (argument dependant) lookup!

Approach 3: write a btree_traits<> template.

  • Pro: can come up with binary representation.
  • Pro: exact control over every single output format.
  • Pro: more control on output and format that a function, may contain meta data and all.
  • Con: still requires you / your library's users to write lots of custom overloads.
  • Pro: have the btree_traits<> detault to use operator<< unless someone overrides the traits?
把时间冻结 2024-10-11 10:34:49

您无法编写真正通用且透明的版本,因为如果非平凡项中的指针是使用 malloc (或 new 和 new[])分配的,那么它已经在堆中。

非透明的解决方案可能是序列化类是一种选择,并且这可以相对容易地完成。在存储类之前,您必须调用序列化函数,在拉取它之前,您需要调用反序列化。 Boost 具有良好的序列化功能,您可以将其与 B+Tree 一起使用。

You cannot write a truly generic and transparent version since if the pointer in a non-trivial item was allocated with malloc (or new and new[]), then it's already in the heap.

A non-transparent sollution may be serializing the class is an option, and this can be done relatively easy. Before you store the class you'd have to call the serialization function and before pulling it you'd call the deserialize. Boost has good serialization features that you could make work with your B+Tree.

朮生 2024-10-11 10:34:49

以通用方式处理指针和引用意味着您需要检查要存储的结构的类型及其字段。 C++ 是一种不以其反射性而闻名的语言。

但即使是一种具有强大反射能力的语言,这个问题的通用解决方案也很困难。您也许能够让它适用于 Python、Ruby 等高级语言中类型的子集。一个相关且更强大的范例是 持久编程语言

您想要的功能通常是通过将写入数据块的责任委托给目标类型本身来实现的。它称为序列化。它只是意味着编写一个带有转储数据的方法和加载数据的方法的接口。任何想要保留在 B 树中的类只需实现此接口即可。

Handling pointers and references in a generic way means you will need to inspect the type of the structure you're trying to store, and its fields. C++ is a language not known for its reflectiveness.

But even in a language with powerful reflection, a generic solution to this problem is difficult. You might be able to get it to work for a subset of types in higher level languages like Python, Ruby, etc. A related and more powerful paradigm is the persistent programming language.

The function you want is usually implemented by delegating responsibility for writing the data block to the target type itself. It's called serialization. It simply means writing an interface with a method to dump data, and a method to load data. Any class that wants to be persisted in your B-tree then simply implements this interface.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文