What is the relationship between file pointer width and maximum file size?

Posted 2024-10-16 19:45:21

I'm just curious about the maximum file size limits of some popular file systems on Linux; I have seen some that go up to the TB scale.

My question is: if the file pointer is 32 bits wide, as on most Linux systems we meet today, doesn't that mean the maximum distance we can address is 2^32-1 bytes? Then how can we store a file larger than 4 GB?

Furthermore, even if we can store such a file, how can we locate a position beyond the 2^32 range?

Comments (6)

天邊彩虹 2024-10-23 19:45:21

To use files larger than 4 GB, you need "large file support" (LFS) on Linux. One of the changes LFS introduced was that file offsets are 64-bit numbers. This is independent of whether Linux itself is running in 32- or 64-bit mode (e.g. x86 vs. x86-64). See e.g. http://www.suse.de/~aj/linux_lfs.html

LFS was introduced mostly in glibc 2.2 and kernel 2.4.0 (roughly in 2000-2001), so any recent Linux distribution will have it.

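To use it on Linux, you can either call special functions (e.g. lseek64 instead of lseek, which are declared when _LARGEFILE64_SOURCE is defined), or put #define _FILE_OFFSET_BITS 64 before any #include (equivalently, compile with -D_FILE_OFFSET_BITS=64); the regular functions will then use 64-bit offsets.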
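A minimal sketch of the _FILE_OFFSET_BITS route, assuming glibc on Linux and a writable temporary directory: the define goes before every #include, after which plain lseek() accepts an offset past 4 GiB even on a 32-bit build. The helper name seek_past_4gib is made up for illustration.

```c
/* Feature macros must come before ANY #include; otherwise glibc may
 * already have fixed off_t at 32 bits on a 32-bit build. */
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#define _FILE_OFFSET_BITS 64      /* LFS: 64-bit off_t; lseek() maps to the 64-bit call */

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a temporary file and seek it to 5 GiB,
 * past the 2^32 boundary. Returns the resulting offset, or -1 on error. */
off_t seek_past_4gib(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    const off_t target = (off_t)5 << 30;            /* 5 GiB = 5368709120 */
    off_t pos = lseek(fileno(f), target, SEEK_SET);

    fclose(f);
    return pos;
}
```

Seeking past end-of-file is legal and allocates no disk space; the file only grows if you then write at that position.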

青衫儰鉨ミ守葔 2024-10-23 19:45:21

In Linux, at least, it's trivial to write programs to work with larger files explicitly (i.e., not just using a streaming approach as suggested by kohlehydrat).

See this page, for instance. The trick usually comes down to a magic #define placed before including any system headers, which turns on "large file support". This typically doubles the size of the file offset type to 64 bits, which is quite a lot.
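Because the define only works if it precedes the first system header, one possible safeguard (a sketch, assuming a C11 compiler and glibc) is a compile-time check that off_t really ended up 64 bits wide. The function name off_t_bits is hypothetical.

```c
/* The "magic #define": must precede every system header. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* Break the build instead of misbehaving at run time if, on a 32-bit
 * target, the define arrived too late and off_t stayed 32 bits. */
_Static_assert(sizeof(off_t) == 8,
               "off_t is not 64 bits: define _FILE_OFFSET_BITS 64 before any #include");

/* Hypothetical helper: width of the file offset type in bits. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```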

梦巷 2024-10-23 19:45:21

There is no relation whatsoever between the width of the file pointer and the maximum file size. The FILE * pointer from C stdio is an opaque handle that has no relation to the size of the on-disk file, and the memory it points to can be much bigger than the pointer itself. The function fseek(), which repositions where we read from and write to, already takes a long (though a long is still only 32 bits on 32-bit Linux), and fgetpos() and fsetpos() use an opaque fpos_t.
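To illustrate the stdio side, here is a small sketch, assuming a POSIX system with glibc: fgetpos()/fsetpos() save and restore a position without the program ever looking inside fpos_t, and POSIX fseeko()/ftello() are the off_t-based counterparts of fseek()/ftell() that sidestep the width of long. The helper name fpos_roundtrip is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64      /* make off_t (and fpos_t) 64-bit on 32-bit targets */

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: remember a position with fgetpos(), read past it,
 * then jump back with fsetpos(). Returns 1 if everything lines up. */
int fpos_roundtrip(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;

    fputs("hello, world", f);
    rewind(f);

    /* fseeko() takes an off_t rather than a long. Offset 7 is 'w'. */
    if (fseeko(f, 7, SEEK_SET) != 0) { fclose(f); return 0; }

    fpos_t mark;                              /* opaque: never inspected */
    if (fgetpos(f, &mark) != 0) { fclose(f); return 0; }

    int first = fgetc(f);                     /* 'w' */
    while (fgetc(f) != EOF) { }               /* read to the end */

    /* fsetpos() restores the position and clears the EOF indicator. */
    if (fsetpos(f, &mark) != 0) { fclose(f); return 0; }
    int again = fgetc(f);                     /* 'w' again */

    off_t where = ftello(f);                  /* 8, reported as an off_t */
    fclose(f);
    return first == 'w' && again == 'w' && where == 8;
}
```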

What can make working with large files difficult is off_t, which is used as the offset type in various system calls. Fortunately, people realized this would be an issue and came up with "Large File Support" (LFS), an altered ABI with a wider offset type off_t. (Typically this is done by introducing a new API and #define-ing the old names to invoke this new API.)
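As a sketch of off_t crossing the system-call boundary, assuming glibc on Linux: pread() takes its offset as an off_t, and with the LFS define a 32-bit build transparently calls the 64-bit variant, so an offset of 6 GiB arrives intact. The helper name pread_far is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() and fileno() */
#define _FILE_OFFSET_BITS 64      /* selects the LFS ABI: 64-bit off_t */

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from 6 GiB into an empty temporary file.
 * That is past end-of-file, so pread() reports EOF by returning 0
 * (no disk space is used). Returns pread()'s result, or -1 on error. */
ssize_t pread_far(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    char buf[16];
    const off_t far = (off_t)6 << 30;         /* 6 GiB, beyond 2^32 */
    ssize_t n = pread(fileno(f), buf, sizeof buf, far);

    fclose(f);
    return n;
}
```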

分分钟 2024-10-23 19:45:21

You can use lseek64 to handle big files. Ext4 can handle files of up to 16 TiB (with the usual 4 KiB block size).
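A minimal sketch of the explicit lseek64 route, assuming glibc: defining _LARGEFILE64_SOURCE exposes lseek64() and off64_t alongside the ordinary lseek()/off_t, so only the calls that need 64-bit offsets use them. The helper name lseek64_demo is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#define _LARGEFILE64_SOURCE       /* glibc: declares lseek64() and off64_t */

#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: seek a temporary file to 10 GiB with lseek64(),
 * then read the position back with SEEK_CUR. Returns it, or -1 on error. */
off64_t lseek64_demo(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);

    if (lseek64(fd, (off64_t)10 << 30, SEEK_SET) == (off64_t)-1) {
        fclose(f);
        return -1;
    }
    off64_t pos = lseek64(fd, 0, SEEK_CUR);   /* still 10 GiB */

    fclose(f);
    return pos;
}
```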

梦途 2024-10-23 19:45:21

Just call read(int fd, void *buf, size_t count); repeatedly.

(So there's no need for a 'pointer' into the file.)
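A sketch of that sequential approach, assuming POSIX: the loop keeps no offset of its own because each read() continues from the kernel's file position, and the running total is 64-bit so it cannot wrap at 4 GiB. One caveat: on a 32-bit build without LFS, open() refuses files larger than 2 GiB with EOVERFLOW, so the define is kept here. The function name count_bytes is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64      /* lets open() accept >2 GiB files on 32-bit builds */

#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read an open descriptor to EOF in 64 KiB chunks
 * and return the total number of bytes read, or -1 on error. */
int64_t count_bytes(int fd)
{
    char buf[64 * 1024];
    int64_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;         /* end of file */
        else if (errno != EINTR)
            return -1;            /* real error; EINTR is simply retried */
    }
}
```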

From the file system design point of view, you basically have an index tree (inodes) that points to the several pieces of data (blocks) forming the actual file. With this model you can, in theory, have files of unlimited size.
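To make the index tree concrete, here is a sketch assuming the classic ext2 inode layout: 12 direct block pointers, then one single-, one double- and one triple-indirect block, where each indirect block holds block_size / 4 32-bit block numbers. Every extra level multiplies the reach by block_size / 4; in practice the fixed number of levels and the pointer widths cap the size. The function name ext2_map_level is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: which part of an ext2-style inode maps logical
 * block n? Returns 0 = direct, 1 = single, 2 = double, 3 = triple
 * indirect, -1 = beyond what the tree can address. */
int ext2_map_level(uint64_t n, uint64_t block_size)
{
    const uint64_t ptrs = block_size / 4;     /* 32-bit pointers per block */

    if (n < 12)
        return 0;                             /* 12 direct pointers */
    n -= 12;
    if (n < ptrs)
        return 1;
    n -= ptrs;
    if (n < ptrs * ptrs)
        return 2;
    n -= ptrs * ptrs;
    if (n < ptrs * ptrs * ptrs)
        return 3;
    return -1;
}
```

With 1 KiB blocks (256 pointers per block), logical blocks 0 to 11 are direct, 12 to 267 go through the single-indirect block, and block 65804 is the first one that needs the triple-indirect block.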

懵少女 2024-10-23 19:45:21

On 32-bit UNIX systems without large file support, file size has a hard limit determined by the number of bytes a signed 32-bit file offset can index: 2^31 - 1 bytes, about 2.1 GB.

Consider closing the first file just before it reaches 0x7fffffff bytes in length, and opening an additional new file.
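For that split-file workaround, the program still needs a way to find a 64-bit logical position across the parts. A minimal sketch, assuming each part holds exactly 0x7fffffff bytes and parts are numbered from 0 (the naming scheme "data.000", "data.001", ... and the function locate_in_parts are hypothetical):

```c
#include <stdint.h>

/* Bytes per part: 2^31 - 1, so every in-part offset fits a signed 32-bit off_t. */
#define PART_MAX UINT64_C(0x7fffffff)

typedef struct {
    uint64_t part;    /* 0-based part number, e.g. "data.000", "data.001", ... */
    int32_t  offset;  /* offset within that part; always fits in 32 bits */
} part_pos;

/* Hypothetical helper: map a logical 64-bit offset to (part, offset in part). */
part_pos locate_in_parts(uint64_t logical)
{
    part_pos p;
    p.part   = logical / PART_MAX;
    p.offset = (int32_t)(logical % PART_MAX);
    return p;
}
```

For example, the logical offset 5 GiB (5368709120) lands in part 2 at offset 1073741826.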

Some limits of the ext2 file system come from the on-disk data format and some from the operating system's kernel. Most of these factors are fixed once, when the file system is built: they depend on the block size and on the ratio of the number of blocks to the number of inodes. In Linux the block size is limited by the architecture's page size.

There are also some userspace programs that can't handle files larger than 2 GB.

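The maximum file size is limited to $\min\left(\left((b/4)^3 + (b/4)^2 + b/4 + 12\right) \cdot b,\ (2^{32} - 1) \cdot 512\right)$ bytes, where $b$ is the block size, because of i_block (an array of EXT2_N_BLOCKS block pointers) and i_blocks (a 32-bit integer value) representing the number of 512-byte sectors in the file.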
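As a worked check of that bound, here is a sketch that evaluates it for a few block sizes. It assumes i_blocks counts 512-byte sectors, as the ext2 on-disk format defines it; the function name ext2_max_file_size is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: ext2 maximum file size in bytes for block size b,
 * as the smaller of two limits:
 *  - what the i_block pointer tree can map: 12 direct blocks plus
 *    b/4, (b/4)^2 and (b/4)^3 blocks through the single-, double- and
 *    triple-indirect pointers, times b bytes each;
 *  - what the 32-bit i_blocks counter can record: (2^32 - 1) sectors
 *    of 512 bytes. */
uint64_t ext2_max_file_size(uint64_t b)
{
    const uint64_t p = b / 4;                          /* pointers per block */
    const uint64_t by_tree = (p * p * p + p * p + p + 12) * b;
    const uint64_t by_i_blocks = ((UINT64_C(1) << 32) - 1) * 512;
    return by_tree < by_i_blocks ? by_tree : by_i_blocks;
}
```

For 1 KiB, 2 KiB and 4 KiB blocks this gives 17247252480, 275415851008 and 2199023255040 bytes, roughly 16 GiB, 256 GiB and 2 TiB; from 4 KiB blocks upward the i_blocks counter, not the pointer tree, is the binding limit.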
