C/C++:如何在B树中的文件中存储数据
在我看来,将数据作为文件存储在 B 树中的一种方法可以使用 C 语言使用带有结构序列(数组)的二进制文件来有效完成,每个结构代表一个节点。因此,我们可以使用类似于使用数组创建链表的方法来连接各个节点。但随之而来的问题是删除节点,因为在一个大文件中只删除中间的几个字节是不可能的。
删除的一种方法可能是跟踪“空”节点,直到达到阈值截止,然后创建另一个文件来丢弃空节点。但这很乏味。
从简单/效率的角度来看,是否有更好的方法来删除,甚至在文件中表示 B 树?
TIA, -斯维亚
It appears to me that one way of storing data in a B-tree as a file can be done efficiently with C using binary file with a sequence (array) of structs, with each struct representing a node. One can thus connect the individual nodes with approach that will be similar to creating linked lists using arrays. But then the problem that props up would be deletion of a node, as erasing only a few bytes in the middle in a huge file is not possible.
One way of deleting could be to keep track of 'empty' nodes until a threshold cutoff is reached and then make another file that will discard the empty nodes. But this is tedious.
Is there a better approach from the simplicity/efficiency point of view for deleting, or even representing a B-tree in a file?
TIA,
-Sviiya
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
要在文件中实现 B 树,可以使用文件偏移量而不是指针。此外,您还可以实现“文件内存管理器”,以便您可以重新使用文件中已删除的项目。
为了完全恢复 B-Tree 文件中已删除的块,您必须在新文件中重新创建 B-Tree。另请记住,大多数操作系统没有截断文件的方法。截断文件的一种可移植方法是写入一个新文件并销毁旧文件。
另一个建议是将文件分区为B-Tree分区和数据(项)分区。 B 树分区将包含页面。叶页将包含数据项的偏移量。数据分区将是文件中包含数据项的部分。您最终可能会为每个分区创建多个分区,并且这些分区可能会交错。
我花了很多时间研究基于文件的 B 树,直到我放弃并决定让数据库程序(或服务器)为我处理数据。
For implementing B-Trees in a file, you can use the file offset instead of pointers. Also, you can implement a "file memory manager", so that you can re-use deleted items in the file.
In order to fully recover the deleted blocks in a B-Tree file, you will have to recreate the B-Tree in a new file. Also remember the most OSes have no methods for truncating files. A portable method for truncating a file is to write a new file and destroy the old.
Another suggestion is to partition the file into B-Tree partition and data (item) partition. A B-Tree partition will contain the pages. The leaf pages will contain offsets to the data items. The data partition will be a section in the file containing data items. You may end up creating more than one of each partition and the partitions may be interleaved.
I spent much time playing with a file based B-Tree, until I gave up and decided to let a database program (or server) handle the data for me.
我做了一个非常快速的搜索并挖出了这个: http://people.csail.mit.edu /jaffer/WB C 源代码:http://cvs. savannah.gnu.org/viewvc/wb/wb/c/ - 它似乎提供基于磁盘的 B 树样式数据库 - 尽管查看“delete.c”,它似乎暗示如果您删除一个节点下的所有内容都将被取出 - 如果这是正确的行为,那么听起来可能会有帮助?
另外 - B 树经常在文件系统中使用 - 你能不看看一些文件系统代码吗?
我自己的倾向是文件系统 - 如果你有一个固定大小的 B 树,每当你“删除”一个节点而不是尝试删除引用时,只需将值设置为在代码中没有任何意义的值。然后,运行一个清理线程,检查是否有人打开该文件进行读取,以及是否一切正常都会阻止该文件并进行清理。
I did a very quick search and dug up this: http://people.csail.mit.edu/jaffer/WB C source: http://cvs.savannah.gnu.org/viewvc/wb/wb/c/ - it seems to offer disk-based B-tree style databases - although taking a look at "delete.c" it seemed to imply if you delete a node everything down from it would be taken out - if that's the correct behaviour then it sounds like something that might help?
Also - B-trees are often used in filesystems - could you not take a look at some filesystem code?
My own inclination is that of a file-system - if you have a B-tree of fixed-size, whenever you "delete" a node rather than attempting to remove the reference, just set the value to whatever means nothing in your code. Then, have a clean-up thread running that checks if anyone has the file open for reading and if all's quiet blocks the file and tidies up.
您也可以使用 Berkley DB。它与 C 程序配合良好,并实现了 B+ 树。
You can use Berkley DB as well. It works well with C programs and implements B+ tree.