使用 B 树索引器进行磁盘访问
iv'e 实现了一个 B+tree ,我的叶节点指向行(记录)位置的开头 在 CSV 文件中,
我的问题是:
我的树被设计为除了树顺序值,即(每个树节点中的指针数量),
据我了解,顺序值应该通过能够读取整个树来优化磁盘访问在一次磁盘访问操作中将磁盘块放入内存。
我不明白这是如何发挥作用的,假设我知道磁盘的块大小 我根据一些计算给订单一个适当的值
,例如: (order * sizeof( Record ) ) < block_size
访问数据:
正如我所说,指针保存文件路径和到行(记录)开头的偏移量
StreamReader reader ;
reader.BaseStream.Position = leaf.Pointers[i].offset ; // leaf is a leaf node in the tree
string record = reader.ReadLine();
1)ReadLine()操作相当于一次磁盘访问吗? 如果是这样,我访问数据的方式将是相同的(磁盘访问明智而不是搜索明智),不会受到树节点的顺序(大小)的影响。
2)如何更改磁盘访问方法以根据磁盘块大小进行优化?
iv'e implemented a B+tree , my leaf nodes point to start of line(record) positions
in a CSV file ,
my question is :
my tree is designed to except a tree - ORDER value i.e. ( The number of Pointers in each tree Node )
to my understanding the order value is suppose to optimize disk access by being able to read an entire disk block into memory in one disk access operation .
what i don't understand is how this comes into play , lets say i know the disk's block size
i give the Order an appropriate value according to some calculation
for example :
(order * sizeof( Record ) ) < block_size
Accessing the Data :
The pointer as i said holds a file path and an offset to the beginning of the line(Record)
StreamReader reader ;
reader.BaseStream.Position = leaf.Pointers[i].offset ; // leaf is a leaf node in the tree
string record = reader.ReadLine();
1) is the ReadLine() operation equivalent to one disk Access ?
if so the way in witch i access my data would be the same (Disk Access wise not search wise ) ,would not be affected by the ORDER (size) of my tree nodes .
2) how would one change the disk access method to be optimized according to disk block size ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
是的,你的ReadLine()是磁盘访问;然而,这并不真正适用于你。您本质上是在“行外”存储所有数据。典型的 BTree 或 B+Tree 将直接将数据存储在树结构中,而不是相关文件中。您对 BTree 机制本身的细节了解甚少,因此我无法告诉您顺序/大小会产生什么影响。您是存储 BTree 结构还是只是构建内存索引?如果它被存储那么你如何保存/加载图中的节点?
此回应的第一部分可能会对此有所启发。基本上,您需要不再使用相关的 CSV 文件,并将数据存储在树本身中。
Yes, your ReadLine() is a disk access; however, this doesn't really apply to you. You are essentially storing all data 'out of row'. A typical BTree or B+Tree would store the data directly inside tree structure not in a related file. You are a little light on details about the BTree mechanics itself so I can't tell you what the order/size would impact. Are you storing the BTree structure or just building an in-memory index? If it is stored then how are you saving/loading a node in the graph?
The first part of this response might shed some light on it. Basically you would need to move away from using the related CSV file and store the data in the tree itself.