Historical perspective on Linux file systems

Posted on 2024-07-21 03:10:53

Jonathan Leffler's comment on the question "How to find the sizes of some specified files?" is thought-provoking. I will break it into parts for analysis.

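...files are stored on pages;
you normally end up with more space used than the calculation gives, because a 1-byte file (usually) occupies one page (of maybe 512 bytes).
The exact values vary; it was easier in the days of the 7th Edition Unix file system (though not trivial even then,
if you wanted to take account of the indirect blocks referenced by the inode as well as the raw data blocks).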

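Questions about the parts

What is the definition of a "page"?
Why the hedge "maybe" in the parenthetical "one page (of maybe 512 bytes)"?
Why was it easier to measure exact sizes in the "7th Edition Unix file system"?
What is the definition of an "indirect block"?
How can there be references from two things, "the inode" and "the raw data blocks"?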

Historical questions that arise

I. What is the historical context of what Leffler is saying?

II. Have the definitions changed over time?

音栖息无 2024-07-28 03:10:53

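I think he meant blocks rather than pages; a block is the smallest addressable unit on a file system.
Block sizes can vary.
Not sure why, but perhaps the file system interface exposed an API that allowed more exact measurement.
An indirect block is a block of pointers to other blocks; it is itself referenced by a pointer from the inode (or from another indirect block).
The inode occupies space (blocks) just as the raw data does. That is what the author meant.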
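On a modern system, one way to see the gap Leffler describes is to compare a file's logical size (`st_size`) with the space actually allocated to it. This is a minimal sketch assuming Linux `stat()` semantics, where `st_blocks` counts 512-byte units no matter what the file system's own block size is; the helper name `sizes` is illustrative. Some file systems store tiny files inline in the inode, so the allocated figure for a 1-byte file may be 0, 4096, or something else depending on where it runs.

```python
import os
import tempfile


def sizes(path):
    """Return (logical_bytes, allocated_bytes) for the file at path.

    Assumes Linux stat() semantics: st_blocks counts 512-byte units,
    independent of the file system's block size. On ext2-style file
    systems this count typically includes indirect blocks as well.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Create a 1-byte file and compare its two sizes.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x")
        name = f.name
    try:
        logical, allocated = sizes(name)
        print(f"logical={logical} allocated={allocated}")
    finally:
        os.remove(name)
```

Running it on a few different file systems (ext4, tmpfs, a network mount) shows that "how big is this file?" depends on which size you mean.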

寻梦旅人 2024-07-28 03:10:53

As usual for Wikipedia pages, Block (data storage) is informative despite being far too exuberant about linking all keywords.

In computing (specifically data transmission and data storage), a block is a sequence of bytes or bits, having a nominal length (a block size). Data thus structured is said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data. Blocked data is normally read a whole block at a time. Blocking is almost universally employed when storing data to 9-track magnetic tape, to rotating media such as floppy disks, hard disks, optical discs and to NAND flash memory.
Most file systems are based on a block device, which is a level of abstraction for the hardware responsible for storing and retrieving specified blocks of data, though the block size in file systems may be a multiple of the physical block size. In classical file systems, a single block may only contain a part of a single file. This leads to space inefficiency due to internal fragmentation, since file lengths are often not multiples of block size, and thus the last block of files will remain partially empty. This will create slack space, which averages half a block per file. Some newer file systems attempt to solve this through techniques called block suballocation and tail merging.
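The slack-space point is exactly why a 1-byte file occupies a whole block. A minimal sketch of the arithmetic, assuming a classical file system with no tail merging or inline data (the function names are illustrative):

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def allocated_bytes(size, block_size):
    """Bytes occupied when a file of `size` bytes is stored in whole blocks."""
    return ceil_div(size, block_size) * block_size


def slack_bytes(size, block_size):
    """Unused bytes left in the file's last block (internal fragmentation)."""
    return allocated_bytes(size, block_size) - size


# A 1-byte file on 512-byte blocks still consumes a full block.
print(allocated_bytes(1, 512), slack_bytes(1, 512))  # 512 511

# Averaged over every size from 1 to one full block, slack is (B - 1) / 2,
# i.e. roughly half a block per file.
B = 512
avg = sum(slack_bytes(n, B) for n in range(1, B + 1)) / B
print(avg)  # 255.5
```

The "half a block" average assumes file sizes are spread evenly over the last block; real workloads can skew it either way.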

There's also a reasonable overview of the classical Unix File System.

Traditionally, hard disk geometry (the layout of blocks on the disk itself) has been described as CHS (cylinder-head-sector).

Head: the magnetic reader/writer on each (side of a) platter; can move in and out to access different cylinders
Cylinder: the set of tracks at one head position, one per platter surface; each track passes under its head as the platter rotates
Sector: a constant-sized amount of data stored contiguously on a portion of a track; the smallest unit of data that the drive can deal with
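For reference, the usual mapping between a CHS triple and a linear block address (LBA) is LBA = (C × HPC + H) × SPT + (S − 1), where HPC is heads per cylinder and SPT is sectors per track; cylinders and heads count from 0 and sectors from 1. The sketch below implements that formula and its inverse; the 16-head, 63-sector geometry is a hypothetical example, not something the answer specifies.

```python
def chs_to_lba(c, h, s, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) address to a linear block address.

    Cylinders and heads count from 0; sectors count from 1.
    """
    if not (0 <= h < heads_per_cylinder and 1 <= s <= sectors_per_track):
        raise ValueError("head or sector out of range for this geometry")
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse of chs_to_lba for the same geometry."""
    c, rem = divmod(lba, heads_per_cylinder * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1


HPC, SPT = 16, 63  # hypothetical geometry for illustration
print(chs_to_lba(0, 0, 1, HPC, SPT))  # 0
print(chs_to_lba(1, 0, 1, HPC, SPT))  # 1008
print(lba_to_chs(1008, HPC, SPT))     # (1, 0, 1)
```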

CHS isn't used much these days, as

Hard disks no longer use a constant number of sectors per cylinder. More data is squeezed onto a platter by using a constant arc length per sector rather than a constant rotational angle, so there are more sectors on the outer cylinders than on the inner ones.
By the ATA specification, a drive may have no more than 2¹⁶ cylinders, 2⁴ heads, and 2⁸ sectors per track; with 512B sectors, this is a limit of 128GB. Through BIOS INT13, it is not possible to access anything beyond 7.88GB through CHS anyway.
For backwards-compatibility, larger drives still claim to have a CHS geometry (otherwise DOS wouldn't be able to boot), but getting to any of the higher data requires using LBA addressing.
CHS doesn't even make sense on RAID or non-rotational media.
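The capacity figures above are just products of the field ranges. This sketch reproduces the answer's arithmetic, taking each field's full width at face value and assuming 512-byte sectors; the 128GB and 7.88GB figures come out in binary units (2³⁰ bytes).

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed in the answer
GiB = 2**30


def chs_capacity(cylinders, heads, sectors_per_track, sector_size=SECTOR_SIZE):
    """Largest byte count addressable with the given CHS field ranges."""
    return cylinders * heads * sectors_per_track * sector_size


# ATA limits as stated in the answer: 2**16 cylinders, 2**4 heads, 2**8 sectors.
ata = chs_capacity(2**16, 2**4, 2**8)
# BIOS INT 13h: 10-bit cylinder (1024), 8-bit head (256), 6-bit sector numbered 1..63.
int13 = chs_capacity(1024, 256, 63)

print(ata / GiB)    # 128.0
print(int13 / GiB)  # 7.875
```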

Still, for historical reasons, this has affected block sizes: because sector sizes were almost always 512B, filesystem block sizes have always been multiples of 512B. (There is a movement afoot to introduce drives with 1kB and 4kB sector sizes, but compatibility looks rather painful.)

Generally speaking, smaller filesystem block sizes result in less wasted space when storing many small files (unless advanced techniques like tail merging are in use), while larger block sizes reduce external fragmentation and have lower overhead on large disks. The filesystem block size is usually a power of 2, is limited below by the block device's sector size, and is often limited above by the OS's page size.
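The trade-off can be made concrete with a hypothetical workload. This sketch counts allocated blocks as a rough stand-in for bookkeeping overhead and ignores metadata, tail merging and external fragmentation; the file mix and the name `footprint` are made up for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def footprint(file_sizes, block_size):
    """Return (blocks_to_track, wasted_bytes) for storing file_sizes."""
    blocks = sum(ceil_div(n, block_size) for n in file_sizes)
    wasted = blocks * block_size - sum(file_sizes)
    return blocks, wasted


# Hypothetical workload: 1000 files of 100 bytes plus one 10 MiB file.
files = [100] * 1000 + [10 * 2**20]

for bs in (512, 1024, 4096):
    blocks, wasted = footprint(files, bs)
    print(f"block size {bs:5d}: {blocks:6d} blocks, {wasted:8d} bytes wasted")
```

For this mix, going from 512-byte to 4kB blocks cuts the number of blocks to track roughly sixfold but multiplies the wasted space nearly tenfold.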

The page size varies by OS and platform (and, in the case of Linux, can vary by configuration as well). As with block sizes, smaller page sizes reduce internal fragmentation but require more administrative overhead. 4kB pages are common on 32-bit platforms.
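To see which values a particular machine actually uses, Python can report both the page size and the block size of the file system holding a path. A minimal sketch, assuming a POSIX system (`os.statvfs` is not available on Windows); the helper names are illustrative.

```python
import mmap
import os


def page_size():
    """Virtual-memory page size of the running system, in bytes."""
    return mmap.PAGESIZE


def fs_block_size(path="."):
    """Block size that statvfs() reports for the file system holding path."""
    return os.statvfs(path).f_bsize


if __name__ == "__main__":
    print("page size:", page_size())
    print("file system block size for '.':", fs_block_size("."))
```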

Now, on to indirect blocks. In the UFS design:

An inode describes a file.
The number of pointers to data blocks that an inode can hold is very limited (fewer than 16). The specific number appears to vary in derived implementations.
For small files, the pointers can point directly to the data blocks that make up the file.
For larger files, there must be indirect pointers, which point to a block that contains only more pointers to blocks. These may be direct pointers to data blocks belonging to the file or, if the file is very large, further indirect pointers (double or triple indirection).

Thus the amount of storage required for a file may be greater than just the blocks containing its data, when indirect pointers are in use.
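The extra storage from indirect blocks can be counted directly. This sketch assumes a 7th Edition-style inode with 10 direct pointers followed by one single-, one double- and one triple-indirect pointer, 512-byte blocks and 4-byte block numbers (so 128 pointers per indirect block); those parameters are assumptions for illustration, and UFS derivatives use different ones.

```python
# Assumed V7-style layout (illustrative): 10 direct pointers in the inode,
# then one single-, one double- and one triple-indirect pointer.
BLOCK_SIZE = 512  # bytes per block
PTR_SIZE = 4      # bytes per block number stored in an indirect block
N_DIRECT = 10


def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def blocks_for_file(size, block_size=BLOCK_SIZE, ptr_size=PTR_SIZE, n_direct=N_DIRECT):
    """Return (data_blocks, indirect_blocks) needed to store `size` bytes."""
    per_block = block_size // ptr_size      # pointers that fit in one indirect block
    data = ceil_div(size, block_size)       # even 1 byte needs a whole data block
    remaining = max(0, data - n_direct)     # data blocks beyond the direct pointers
    indirect = 0
    for depth in (1, 2, 3):                 # single, double, triple indirection
        if remaining == 0:
            break
        covered = min(remaining, per_block ** depth)
        # A depth-d tree over `covered` data blocks needs ceil(covered / per_block**k)
        # pointer blocks at each of its levels k = 1..d.
        indirect += sum(ceil_div(covered, per_block ** k) for k in range(1, depth + 1))
        remaining -= covered
    if remaining:
        raise ValueError("file too large for this inode layout")
    return data, indirect


for size in (1, 5120, 5121, 70657):
    data, ind = blocks_for_file(size)
    print(f"{size:>6} bytes -> {data} data + {ind} indirect = {(data + ind) * BLOCK_SIZE} bytes on disk")
```

Under these assumptions a 5120-byte file fits entirely in direct blocks, one more byte adds an indirect block, and crossing into double indirection at 70657 bytes costs three pointer blocks at once.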

Not all filesystems use this method for keeping track of the data blocks belonging to a file. FAT simply uses a single file allocation table, which is effectively a gigantic series of linked lists, and many modern filesystems use extents.
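For contrast, here is a toy model of the two alternatives: a FAT-style table where each entry names the next cluster of the file, and an extent list of (start, length) runs. Both structures are simplified illustrations (a Python dict and a list of tuples), not the on-disk FAT or extent formats; `END` is a hypothetical end-of-chain marker.

```python
END = -1  # hypothetical end-of-chain marker for this toy table


def fat_chain(fat, first_cluster):
    """Follow a FAT-style linked list: each entry names the next cluster."""
    chain = []
    cluster = first_cluster
    while cluster != END:
        if len(chain) >= len(fat):
            raise ValueError("cycle in allocation chain")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def expand_extents(extents):
    """Expand (start, length) extents into individual block numbers."""
    return [start + i for start, length in extents for i in range(length)]


# Toy allocation table: the file starts at cluster 2 and hops 2 -> 3 -> 7 -> 8.
fat = {2: 3, 3: 7, 7: 8, 8: END}
print(fat_chain(fat, 2))                 # [2, 3, 7, 8]

# The same four blocks described as two extents.
print(expand_extents([(2, 2), (7, 2)]))  # [2, 3, 7, 8]
```

The FAT walk needs one table lookup per cluster, while the extent list describes each contiguous run with a single pair, which is why extents stay compact for large, mostly contiguous files.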
