How is disk space allocated for an edited file?

Posted on 2024-10-31 20:35:59

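Suppose I save a text file on an HDD (assume the disk is new and defragmented). The file is named A and its size is 10 MB.

I assume file A occupies some space on the disk as shown below, where x is unoccupied space/memory on the disk:

AAAAAAAAAAAAAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Now I create and save another file B of some size. So B will be saved as

AAAAAAAAAAAAABBBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx - since the disk is defragmented, I assume the storage will be contiguous.

What happens if I now edit file A and reduce its size to 2 MB? Can you tell me how the space will be allocated now?

Some options I can think of are
AAAAAAxxxxxxxxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx

or
AAxxxAAxxxAxAxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx

or a completely new location, freeing a larger chunk for other files:
xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBAAAAAAxxxxxxxxxxxxxxxxxxxxx

or any other way, based on whatever algorithm or data structure is used.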


时常饿 2024-11-07 20:35:59

A lot of this would depend upon what type of filesystem you are using (and also how the OS interacts with it). The behavior of an NTFS filesystem in Windows may be nothing like the behavior of an ext3 filesystem in Ubuntu for the same set of logical operations.

Generally speaking, however, most modern filesystems define a file as a series of pointers to blocks on the disk. There is a minimum block size that describes the smallest allocatable block (typically ranging from 512 bytes to 4 KBytes), so files that are less than this size or not some exact multiple of this size will have some amount of extra space allocated to them.
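To put numbers on the block rounding described above, here is a minimal sketch. The function names `allocated_bytes` and `slack_bytes` are made up for illustration, and the 4096-byte default is just one of the block sizes mentioned; a real filesystem may also store tiny files inline or use other tricks that this ignores.

```python
def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Space reserved for a file when the disk hands out whole blocks only."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = -(-file_size // block_size)  # ceiling division without floats
    return blocks * block_size


def slack_bytes(file_size: int, block_size: int = 4096) -> int:
    """Allocated bytes that hold no file data (the tail of the last block)."""
    return allocated_bytes(file_size, block_size) - file_size


if __name__ == "__main__":
    for size in (1, 2_000_000, 10 * 1024 * 1024):
        print(size, allocated_bytes(size), slack_bytes(size))
```

A file that is an exact multiple of the block size wastes nothing, while a 1-byte file still costs a full block.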

So what happens when you allocate a 10 MB file 'A'? The filesystem reserves 10 MB worth of blocks for the file contents, perhaps with a few extra blocks at the end to accommodate minor edits to the file or its metadata. Ideally these blocks will be contiguous, as in your example. When you edit 'A' and make it smaller, the filesystem will release some or all of the blocks allocated to 'A' and, if necessary, update its references to include any newly allocated blocks. Releasing all of them is the most likely outcome: in most cases editing 'A' involves writing out the entire contents of 'A' to disk again, so there is little reason for the filesystem to prefer keeping 'A' in the same physical location over writing the data to a new location somewhere else on the disk.
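From the application side, the two cases look roughly like this. This is only a sketch: `shrink_in_place` and `shrink_by_rewrite` are hypothetical helper names, and which path a given editor actually takes is an assumption here. The first asks the OS to cut the existing file short, so the filesystem can release the tail blocks; the second writes the whole content out again as a fresh file and swaps it in, which leaves the filesystem free to pick entirely new blocks.

```python
import os
import tempfile


def shrink_in_place(path: str, new_size: int) -> None:
    """Cut the existing file down to new_size bytes; blocks past the end can be freed."""
    os.truncate(path, new_size)


def shrink_by_rewrite(path: str, new_size: int) -> None:
    """Write the first new_size bytes to a fresh file, then swap it in for the original."""
    with open(path, "rb") as src:
        data = src.read(new_size)
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as dst:
        dst.write(data)
    os.replace(tmp_path, path)  # the old file's blocks are released afterwards
```

Both calls leave a file of the same length with the same content; where the bytes end up physically is the filesystem's decision and is not visible from this level.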

With that said, in the typical case and using a modern filesystem and OS, I would expect your example to produce the following final state on disk ('b' and 'a' represent extra bytes allocated to 'B' and 'A' that do not contain any meaningful data):

xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBbbAAAAAAaaxxxxxxxxxxxxxxxxxxxxxx
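The diagrams in this thread can be reproduced with a toy model. To be clear, this is not how any real filesystem is implemented: the disk is a list of cells, `first_fit`, `shrink_keep_prefix` and `shrink_relocate` are made-up names, one cell stands for 1 MB, and the padding bytes ('a' and 'b' above) are not modeled.

```python
FREE = "x"


def first_fit(disk, length, start=0):
    """Index of the first run of `length` free cells at or after `start`, or -1."""
    run = 0
    for i in range(start, len(disk)):
        run = run + 1 if disk[i] == FREE else 0
        if run == length:
            return i - length + 1
    return -1


def write_file(disk, name, length, start=0):
    """Place a file contiguously in the first free run that is large enough."""
    pos = first_fit(disk, length, start)
    if pos < 0:
        raise RuntimeError("no contiguous free run of that length")
    disk[pos:pos + length] = [name] * length


def shrink_keep_prefix(disk, name, new_length):
    """Keep the file's first new_length cells where they are and free the rest."""
    kept = 0
    for i, cell in enumerate(disk):
        if cell == name:
            if kept < new_length:
                kept += 1
            else:
                disk[i] = FREE


def shrink_relocate(disk, name, new_length):
    """Rewrite the file into free space after the last used cell, then free the old cells."""
    last_used = max(i for i, cell in enumerate(disk) if cell != FREE)
    new_pos = first_fit(disk, new_length, last_used + 1)
    if new_pos < 0:
        raise RuntimeError("no room for the new copy")
    for i, cell in enumerate(disk):
        if cell == name:
            disk[i] = FREE
    disk[new_pos:new_pos + new_length] = [name] * new_length


if __name__ == "__main__":
    disk = [FREE] * 24
    write_file(disk, "A", 10)
    write_file(disk, "B", 6)
    in_place, moved = list(disk), list(disk)
    shrink_keep_prefix(in_place, "A", 2)
    shrink_relocate(moved, "A", 2)
    print("".join(disk))      # AAAAAAAAAABBBBBBxxxxxxxx
    print("".join(in_place))  # AAxxxxxxxxBBBBBBxxxxxxxx
    print("".join(moved))     # xxxxxxxxxxBBBBBBAAxxxxxx
```

The second printed line corresponds to the asker's first option, the third to the relocated layout expected in this answer.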

But real-world results will of course vary by filesystem, OS, and potentially other factors. For instance, on an SSD data fragmentation becomes irrelevant, because any section of the disk can be accessed at very low latency and with no seek penalty. At the same time it becomes important to minimize write cycles so that the device doesn't wear out prematurely, so in that case the OS may favor leaving 'A' in place as much as possible, to minimize the number of sectors that need to be overwritten.

So the short answer is, "it depends".


以往的大感动 2024-11-07 20:35:59

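How the space is allocated depends entirely on the filesystem type (e.g. FAT32, NTFS, jfs, reiser, etc.) and the driver software. Your assumption that files will be stored contiguously is not necessarily true: depending on the hardware, storing them in a different pattern may perform better. For example, if you have a disk with 16 heads and a block size of 512 bytes, it might be most efficient to store 8 KB of data on 16 different cylinders.
OTOH, with recent hardware that has no rotating mechanical parts the story changes dramatically: concepts like "fragmentation" suddenly become meaningless, because the access time is the same for every block, regardless of the order in which the blocks are accessed.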
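The arithmetic behind the 8 KB example is simply 16 × 512 bytes = 8192 bytes, i.e. one block per unit. A minimal sketch of such a non-contiguous pattern follows; `spread_blocks` is a made-up name, and the round-robin rule is an assumption for illustration, not what any particular driver does.

```python
def spread_blocks(data_size: int, block_size: int = 512, units: int = 16) -> dict:
    """Deal consecutive blocks out round-robin; returns {unit: [block numbers]}."""
    if block_size <= 0 or units <= 0:
        raise ValueError("block_size and units must be positive")
    n_blocks = -(-data_size // block_size)  # whole blocks needed (ceiling division)
    layout = {unit: [] for unit in range(units)}
    for block in range(n_blocks):
        layout[block % units].append(block)
    return layout


if __name__ == "__main__":
    layout = spread_blocks(8 * 1024)
    print(all(len(blocks) == 1 for blocks in layout.values()))
```

With 8 KB of data, 512-byte blocks and 16 units, every unit receives exactly one block, which is the layout the example describes.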
