What is the relationship between file pointer width and the maximum file size?
I was curious about the maximum file size limits of some popular file systems on Linux, and I have seen some that go up to the terabyte scale.
My question is: if the file pointer is 32 bits wide, as on most Linux systems we meet today, doesn't that mean the maximum distance we can address is 2^32 - 1 bytes? Then how can we store a file larger than 4 GB?
Furthermore, even if we can store such a file, how can we locate a position beyond the 2^32 range?

To use files larger than 4 GB, you need "large file support" (LFS) on Linux. One of the changes LFS introduced is that file offsets are 64-bit numbers. This is independent of whether Linux itself is running in 32-bit or 64-bit mode (e.g. x86 vs. x86-64). See e.g. http://www.suse.de/~aj/linux_lfs.html
LFS was introduced mostly in glibc 2.2 and kernel 2.4.0 (roughly in 2000-2001), so any recent Linux distribution will have it.
To use it on Linux, you can either call the special 64-bit functions (e.g. lseek64 instead of lseek, which glibc declares when _LARGEFILE64_SOURCE is defined), or put #define _FILE_OFFSET_BITS 64 before including any system header, in which case the regular functions use 64-bit offsets.
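A minimal sketch of the _FILE_OFFSET_BITS route, assuming glibc on Linux. The macro is defined before the first #include, so off_t and the plain lseek() are 64-bit even on a 32-bit build. The helper name seek_past_4gib is hypothetical; it only moves the offset of an existing descriptor and writes nothing.

```c
/* Must come before the first #include so glibc selects a 64-bit off_t. */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move fd's offset to 5 GiB, a value that does not
 * fit in 32 bits. Returns the new offset, or (off_t)-1 on error. */
off_t seek_past_4gib(int fd)
{
    const off_t target = (off_t)5 << 30;  /* 5 * 2^30 = 5368709120 bytes */
    return lseek(fd, target, SEEK_SET);
}
```

Called on a descriptor from fileno(tmpfile()), this should return 5368709120. Seeking past the end of a file is allowed; the file only grows if something is then written at that offset, and most Linux file systems store the gap as a sparse hole.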

In Linux, at least, it's trivial to write programs that work with larger files explicitly (i.e., not just using a streaming approach as suggested by kohlehydrat).
See this page, for instance. The trick usually comes down to having a magic #define before including any of the system headers, which "turns on" large file support. This typically doubles the size of the file offset type to 64 bits, which is quite a lot.
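One way to make sure the magic #define actually took effect is a compile-time check. This is a sketch assuming glibc and a C11 compiler (for _Static_assert); off_t_bits is a hypothetical helper added only for illustration.

```c
#define _FILE_OFFSET_BITS 64   /* the "magic" define: before any #include */

#include <limits.h>
#include <sys/types.h>

/* Breaks the build if off_t was not widened to 64 bits. */
_Static_assert(sizeof(off_t) * CHAR_BIT == 64,
               "large file support is not enabled: off_t is not 64-bit");

/* Hypothetical helper: width of off_t in bits, e.g. for logging. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

On a typical 32-bit x86 glibc build, deleting the #define should make the _Static_assert fire; on x86-64 off_t is already 64-bit, so the check passes either way.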

There is no relation whatsoever. The FILE * pointer from C stdio is an opaque handle that has no relation to the size of the on-disk file, and the memory it points to can be much bigger than the pointer itself. The function fseek(), which repositions where we read from and write to, already takes a long (64 bits on 64-bit Linux, but only 32 bits on 32-bit Linux), and fgetpos() and fsetpos() use an opaque fpos_t.
What can make working with large files difficult is off_t, the offset type used in various system calls. Fortunately, people realized this would be an issue and came up with "Large File Support" (LFS), an altered ABI with a wider offset type off_t. (Typically this is done by introducing a new API and #defining the old names to invoke the new one.)
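As a small illustration of the opaque fpos_t interface, here is a sketch of a "peek" that saves and restores the stream position without ever looking at the offset as a number. peek_char is a hypothetical name; the code uses only ISO C stdio calls.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character of f without consuming
 * it. The position is saved and restored through the opaque fpos_t, so
 * the caller never handles a numeric offset. Returns EOF at end of file
 * or on error. */
int peek_char(FILE *f)
{
    fpos_t pos;
    if (fgetpos(f, &pos) != 0)
        return EOF;
    int c = fgetc(f);
    if (fsetpos(f, &pos) != 0)   /* go back to the saved position */
        return EOF;
    return c;
}
```

With glibc, fpos_t wraps a 64-bit offset when _FILE_OFFSET_BITS is 64. POSIX also provides fseeko() and ftello(), which take and return off_t instead of long and therefore follow the same setting.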

You can use lseek64 to handle big files. Ext4 can handle files of up to 16 TiB (with the usual 4 KiB block size).
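The 16 TiB figure can be checked with a little arithmetic, assuming the common 4 KiB block size: ext4's extent format stores a file's logical block numbers as 32-bit values, so a single file can span at most 2^32 blocks. A 32-bit byte offset, by contrast, reaches only 2^32 bytes.

$$
2^{32}\ \text{blocks} \times 2^{12}\ \tfrac{\text{bytes}}{\text{block}} = 2^{44}\ \text{bytes} = 16\ \text{TiB},
\qquad
2^{32}\ \text{bytes} = 4\ \text{GiB}
$$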

Just call read(int fd, void *buf, size_t count); repeatedly. (So there's no need for a "pointer" into the file.)
From the file system design point of view, you basically have an index tree (inodes) that points to the pieces of data (blocks) forming the actual file. With this model you can theoretically have files of unlimited size.
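A sketch of the streaming approach: the kernel advances the file offset after every read(), so the program itself never computes or passes an offset and only needs a counter wide enough for the total. count_bytes is a hypothetical helper for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read fd from its current position to EOF and
 * return the number of bytes seen, or -1 on a read error. */
int64_t count_bytes(int fd)
{
    char buf[65536];
    int64_t total = 0;  /* 64-bit counter, so totals above 4 GiB fit */

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return total;            /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        total += n;
    }
}
```

One caveat: on a 32-bit build without large file support, open() itself refuses regular files larger than 2^31 - 1 bytes with EOVERFLOW, so even pure streaming still relies on LFS there.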

UNIX has actual physical limits to file size, determined by the number of bytes a 32-bit (signed) file pointer can index: 2^31 - 1 bytes, about 2.1 GB.
Consider closing the first file just before it reaches 0x7fffffff bytes in length, and opening an additional new file.
Some of the limits of the ext2 file system come from its on-disk data format and some from the operating system's kernel. Most of these factors are fixed once, when the file system is created. They depend on the block size and on the ratio of the number of blocks to the number of inodes. In Linux, the block size is limited by the architecture's page size.
There are also some userspace programs that can't handle files larger than 2 GB.
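The maximum file size is limited to

$$\min\left(\left(\left(\tfrac{b}{4}\right)^3 + \left(\tfrac{b}{4}\right)^2 + \tfrac{b}{4} + 12\right) \cdot b,\; 2^{32} \cdot 512\right)$$

where b is the block size in bytes. The first term comes from i_block (an array of EXT2_N_BLOCKS block pointers: 12 direct, one single-indirect, one double-indirect and one triple-indirect, where each indirect block holds b/4 four-byte pointers). The second term comes from i_blocks (a 32-bit integer), which counts the 512-byte sectors allocated to the file.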
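The formula can be evaluated directly. This sketch computes min(((b/4)^3 + (b/4)^2 + b/4 + 12) * b, 2^32 * 512), assuming, as in the ext2 on-disk format, that i_blocks counts 512-byte sectors. It ignores the fact that indirect blocks also count toward i_blocks, so the real kernel limit is slightly lower. ext2_max_file_size is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: approximate ext2 maximum file size in bytes
 * for block size b (1024, 2048, 4096, ...). */
uint64_t ext2_max_file_size(uint64_t b)
{
    uint64_t p = b / 4;  /* 4-byte block pointers per indirect block */
    /* triple + double + single indirect + 12 direct block pointers */
    uint64_t blocks = p * p * p + p * p + p + 12;
    uint64_t by_pointers = blocks * b;
    uint64_t by_i_blocks = (uint64_t)1 << 41;  /* 2^32 sectors * 512 bytes */
    return by_pointers < by_i_blocks ? by_pointers : by_i_blocks;
}
```

For 1 KiB blocks the pointer tree is the bottleneck (17247252480 bytes, about 16 GiB); for 4 KiB blocks the 32-bit i_blocks counter is, capping files at 2^41 bytes (2 TiB).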