Changing inode behavior
I am trying to modify the ext3 file system. Basically, I want to ensure that the inode for a file is saved in the same block as (or a block adjacent to) the file whose metadata it stores. Hopefully this will improve disk access performance.
I grabbed the kernel source, compiled it, read a bunch about inodes, and looked at the inode.c file in the fs subdirectory. However, I am just not sure how to ensure that any new file being created, and the inode for that file, are saved in the same or adjacent blocks. Any help or pointers to further reading would be appreciated. Thanks!
Comments (2)
Interesting idea.
I'm not deeply familiar with ext3, but I can give you some general pointers.
Currently ext3 stores inodes in predetermined places. Each block group has its own inode table, an array of inodes. So when you have an inode number (e.g., as the result of looking up a filename in a directory), you can find the corresponding inode on disk by using the inode number first to select the correct block group and then to index into that block group's inode table.
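To make that lookup concrete, here is a minimal sketch of the arithmetic in Python (the kernel itself does this in C). The function name, the `InodeLocation` type, and the `inode_table_start` list are made up for illustration; on a real volume the per-group inode table location comes from the group descriptors, and the other parameters come from the superblock.

```python
from typing import NamedTuple


class InodeLocation(NamedTuple):
    group: int    # block group that owns the inode
    index: int    # position inside that group's inode table
    block: int    # absolute block number holding the inode
    offset: int   # byte offset of the inode within that block


def locate_inode(ino, inodes_per_group, inode_size, block_size, inode_table_start):
    """Map an inode number to its on-disk location, ext2/ext3 style.

    inode_table_start is a hypothetical list: entry g is the first block
    of block group g's inode table.
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    # Inode numbers are 1-based, so subtract 1 before dividing.
    group, index = divmod(ino - 1, inodes_per_group)
    byte_offset = index * inode_size
    block = inode_table_start[group] + byte_offset // block_size
    return InodeLocation(group, index, block, byte_offset % block_size)
```

For example, with 8192 inodes per group, 128-byte inodes, 4096-byte blocks, and group 0's inode table starting at block 100, `locate_inode(34, 8192, 128, 4096, [100, 33000])` lands in group 0 at index 33, which is block 101 at byte offset 128. Nothing in this calculation depends on where the file's data blocks are, which is exactly the property the question wants to change.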
If you want to put the inodes next to the corresponding file data, you'll need a new scheme for finding an inode on disk. If you're willing to dedicate a block for each inode, then one possible scheme would be to allocate a new block every time you need an inode and then use the block number as the inode number. This might have the benefit that for small files you could store the data in that same block.
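As a rough illustration of that scheme, the sketch below assumes 4096-byte blocks and a 128-byte inode header at the start of each inode block. Both constants and all function names are hypothetical, chosen only to show the trade-off.

```python
BLOCK_SIZE = 4096   # assumed block size in bytes
INODE_SIZE = 128    # assumed size of the inode header at the start of the block


def inode_block(ino):
    """With inode number == block number, finding the inode needs no table."""
    return ino


def inline_capacity(block_size=BLOCK_SIZE, inode_size=INODE_SIZE):
    """Bytes left in the inode's own block for file data."""
    return block_size - inode_size


def fits_inline(file_size, block_size=BLOCK_SIZE, inode_size=INODE_SIZE):
    """True if the whole file can live in the same block as its inode."""
    return 0 <= file_size <= inline_capacity(block_size, inode_size)


def wasted_bytes(file_size, block_size=BLOCK_SIZE, inode_size=INODE_SIZE):
    """Unused space in the inode block (0 if the file needs extra blocks)."""
    if fits_inline(file_size, block_size, inode_size):
        return inline_capacity(block_size, inode_size) - file_size
    return 0
```

Under these assumptions a file of up to 3968 bytes needs a single block for both metadata and data, but an empty file still occupies a full 4096-byte block instead of a 128-byte slot in an inode table.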
To make something like this happen, creating a new file (i.e., allocating an inode) would have to work very differently than in the current ext3 file system. Instead of using a bitmap to find an unused, pre-allocated and pre-initialized inode, you would have to allocate an empty block and initialize it yourself. So, you'll probably want to look at how the file system allocates blocks when it's writing to a file, then mimic that for allocating an inode.
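A toy model of that allocation path might look like the following. It is not how ext3's block allocator is written; the class, its method names, and the `goal` hint are illustrative assumptions, and a Python list stands in for the on-disk block bitmap.

```python
class BlockInodeAllocator:
    """Toy allocator: an inode is a freshly allocated, zeroed block."""

    def __init__(self, total_blocks, block_size=4096):
        self.block_size = block_size
        self.in_use = [False] * total_blocks  # stands in for the block bitmap
        self.in_use[0] = True                 # reserve block 0 so 0 is never an inode number
        self.blocks = {}                      # block number -> contents

    def alloc_block(self, goal=0):
        """Return the first free block at or after goal, wrapping around."""
        n = len(self.in_use)
        for step in range(n):
            blk = (goal + step) % n
            if not self.in_use[blk]:
                self.in_use[blk] = True
                return blk
        raise OSError("no free blocks")

    def alloc_inode(self, goal=0):
        """Allocate a block, zero it, and use its number as the inode number."""
        blk = self.alloc_block(goal)
        self.blocks[blk] = bytearray(self.block_size)  # initialize the new inode
        return blk
```

Passing the new inode's number plus one as the `goal` for the file's first data block is one simple way to keep data adjacent to the inode in this model, for example `ino = alloc.alloc_inode()` followed by `alloc.alloc_block(goal=ino + 1)`.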
An alternative scheme would be to store the inode inside the directory. In that case you save an I/O not because the inode is next to its data, but because when you look up the filename you also read the inode. This was done back in the 90s as an experiment in BSD's FFS file system, and was written up in an excellent USENIX paper. Those ideas never made it into FFS, or into any other mainstream file system that I'm aware of, so it might be interesting to see how they work in ext3.
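The I/O saving can be seen in a toy model that counts block reads. Everything here is a simplification made up for illustration: the `Disk` class, the dictionary-based "blocks", and the absence of any caching are all assumptions, not a description of FFS or ext3 internals.

```python
class Disk:
    """Toy disk that counts block reads."""

    def __init__(self, blocks):
        self.blocks = blocks   # block number -> block contents
        self.reads = 0

    def read(self, blk):
        self.reads += 1
        return self.blocks[blk]


def lookup_classic(disk, dir_block, inode_table_block, name):
    """Directory entry stores an inode number; the inode is read separately."""
    ino = disk.read(dir_block)[name]
    return disk.read(inode_table_block)[ino]


def lookup_embedded(disk, dir_block, name):
    """Directory entry stores the inode itself, so one read is enough."""
    return disk.read(dir_block)[name]
```

In this model a cold lookup costs two reads with a separate inode table and one read with embedded inodes. One complication to keep in mind is hard links: when several directory entries name the same file, an inode embedded in a single entry no longer has an obvious home.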
Regardless of whether you pursue one of these schemes or come up with something of your own, you'll also have to modify mke2fs to initialize the file system on disk in a way that your new file system variant will understand.
Good luck! It sounds like a fun project.
Kudos for getting into file system design!
First, a bit of engineering advice before you get too deep into hacking: make a copy of the ext3 tree and rename the file system to something else. I've found that when introducing experimental changes into a file system, you really don't want it to be used for your main system. Your system should still boot even if you introduce a bug that randomly loses files (it will eventually happen). You'll also need to branch the ext3 userspace tools to work with your new system.
Second, go get a copy of Understanding the Linux Kernel, 3rd ed., by Bovet and Cesati. It presents an organized view of the kernel subsystems, and I've found its explanations to be worthwhile. It was written for an older kernel (2.6.x for some x < 15; I forget exactly), but it's still accurate in many places. Read through its descriptions of file systems. I believe it covers ext3.
Third, about your actual project, you aren't proposing a simple modification to ext3. That file system has a pretty straightforward way of mapping an inode number to a disk block. You'll need to find a new way of doing this mapping. I would not anticipate any changes to the rest of ext3. Solving this challenge may be one of the key design points of your architecture. Note that keeping around a big array of inode -> disk block maps doesn't solve your problem: it's probably no better than existing ext3.