How is disk space allocated for an edited file?
Assume I save a text file named A, 10 MB in size, to an HDD (assume the disk is new and therefore not fragmented).
I presume file A occupies some space on the disk as shown below, where x is unoccupied space on the disk:
AAAAAAAAAAAAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Now I create and save another file, B, of some size. Since the disk is not fragmented, I assume the storage will be contiguous, so B will be saved as
AAAAAAAAAAAAABBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx
What happens if I now edit file A and reduce its size to 2 MB? How will the disk space be allocated then?
Some options I can think of are
AAAAAAxxxxxxxxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx
or
AAxxxAAxxxAxAxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx
or a totally new location, freeing up the bigger chunk for other files:
xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBAAAAAAxxxxxxxxxxxxxxxxxxxxxx
Or is it done some other way, based on some algorithm or data structure?
Comments (3)
A lot of this depends on what type of filesystem you are using (and also on how the OS interacts with it). For the same set of logical operations, an NTFS filesystem on Windows may behave nothing like an ext3 filesystem on Ubuntu.
Generally speaking, however, most modern filesystems define a file as a series of pointers to blocks on the disk. There is a minimum block size that describes the smallest allocatable unit (typically ranging from 512 bytes to 4 KB), so a file that is smaller than this size, or not an exact multiple of it, has some extra space allocated to it.
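To make the block-size point concrete, here is a minimal sketch of the rounding involved. The function name `allocated_size` and the default 4096-byte block size are assumptions made for illustration; real filesystems may also store very small files differently, so treat this as arithmetic only.

```python
def allocated_size(file_size: int, block_size: int = 4096) -> tuple[int, int, int]:
    """Return (blocks, bytes_allocated, slack_bytes) for a file of file_size bytes.

    Hypothetical helper: it only models rounding up to whole blocks.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Integer ceiling division: number of whole blocks needed to hold the file.
    blocks = -(-file_size // block_size)
    allocated = blocks * block_size
    # Slack is the space allocated to the file that holds no meaningful data.
    return blocks, allocated, allocated - file_size


if __name__ == "__main__":
    for size in (1, 5000, 10 * 1024 * 1024):
        print(size, allocated_size(size))
```

A 10 MB file happens to be an exact multiple of 4 KB, so it has no slack in this model, while a 1-byte file still occupies a whole block.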
So what happens when you allocate a 10 MB file 'A'? The filesystem reserves 10 MB worth of blocks for the file contents (perhaps with a few extra blocks at the end to accommodate minor edits to the file or its metadata). Ideally these blocks will be contiguous, as in your example. When you edit 'A' and make it smaller, the filesystem releases some or all of the blocks allocated to 'A' and, if necessary, updates its references to include any newly allocated blocks. Most likely it releases all of them: in most cases editing 'A' means writing the entire contents of 'A' out to disk again, so the filesystem has little reason to prefer keeping 'A' in the same physical location over writing the data somewhere else on the disk.
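The "write the whole file out again" behavior is something many applications do themselves when saving. The sketch below shows one common way to do it, assuming the editor writes a temporary file next to the original and then swaps it in; `save_by_rewrite` is a hypothetical name, and whether a given editor works this way is an assumption, not something the filesystem dictates.

```python
import os
import tempfile


def save_by_rewrite(path: str, data: bytes) -> None:
    """Save data to path by writing a new file and renaming it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the new contents to disk
        # Swap the new file in; the old file's blocks can then be released.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "A.txt")
        with open(target, "wb") as f:
            f.write(b"a" * 10_000)
        save_by_rewrite(target, b"a" * 2_000)
        print(os.path.getsize(target))
```

With this pattern the new, smaller contents are allocated while the old file still exists, so they cannot land in the old file's blocks.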
With that said, in the typical case and using a modern filesystem and OS, I would expect your example to produce the following final state on disk ('b' and 'a' represent extra bytes allocated to 'B' and 'A' that do not contain any meaningful data):
xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBbbAAAAAAaaxxxxxxxxxxxxxxxxxxxxxx
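The final state above can be imitated with a toy model. This is only a sketch under strong assumptions: one character per block, a first-fit search for a contiguous free run, no extra reserved blocks, and a rewrite that allocates the new copy before freeing the old one. `ToyDisk` and its methods are hypothetical names, not any real filesystem's allocator.

```python
class ToyDisk:
    """A disk modeled as a list of blocks; 'x' marks a free block."""

    FREE = "x"

    def __init__(self, n_blocks: int) -> None:
        self.blocks = [self.FREE] * n_blocks

    def layout(self) -> str:
        return "".join(self.blocks)

    def _find_free_run(self, length: int) -> int:
        """Return the start index of the first run of `length` free blocks."""
        run_start, run_len = 0, 0
        for i, owner in enumerate(self.blocks):
            if owner == self.FREE:
                if run_len == 0:
                    run_start = i
                run_len += 1
                if run_len == length:
                    return run_start
            else:
                run_len = 0
        raise OSError("no contiguous free run of the requested length")

    def write_new(self, name: str, length: int) -> None:
        """Allocate `length` contiguous blocks for a new file."""
        start = self._find_free_run(length)
        self.blocks[start:start + length] = [name] * length

    def rewrite(self, name: str, new_length: int) -> None:
        """Write a fresh copy of the file, then release the old blocks."""
        old = [i for i, owner in enumerate(self.blocks) if owner == name]
        # The search runs while the old copy is still allocated, so it skips those blocks.
        start = self._find_free_run(new_length)
        for i in old:
            self.blocks[i] = self.FREE
        self.blocks[start:start + new_length] = [name] * new_length


if __name__ == "__main__":
    disk = ToyDisk(30)
    disk.write_new("A", 10)   # think of one block as 1 MB
    disk.write_new("B", 8)
    print(disk.layout())
    disk.rewrite("A", 2)      # A shrinks from 10 blocks to 2
    print(disk.layout())
```

In this model the shrunken A ends up right after B and the original 10 blocks become one free run, which matches the shape of the diagram above. An allocator that truncates in place would instead leave A at the start of the disk.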
But real-world results will of course vary by filesystem, OS, and potentially other factors. For instance, on an SSD data fragmentation becomes irrelevant, because any section of the disk can be accessed at very low latency and with no seek penalty. At the same time it becomes important to minimize write cycles so that the device doesn't wear out prematurely, so the OS may favor leaving 'A' in place as much as possible in that case, to minimize the number of sectors that need to be overwritten.
So the short answer is, "it depends".
How allocation is done depends entirely on the filesystem type (e.g. FAT32, NTFS, jfs, reiser, etc.) and the driver software. Your assumption that the file will be stored contiguously is not necessarily true: depending on the hardware, it may be more performant to store it in a different pattern. For example, say you have a disk with 16 heads and a block size of 512 bytes; it could then be most efficient to split 8 KB of data into 16 blocks and store one under each head.
On the other hand, with recent hardware that has no rotating mechanical parts, the story changes dramatically. A concept like "fragmentation" suddenly loses most of its meaning, because the access time to each block is the same no matter in which order the blocks are read.
No, it's like this:
First you create file A (here capital 'A' stands for blocks actually used by A, 'a' for blocks reserved for A, and 'x' for free blocks):
AAAAAAAAAAAAAaaaaaaaxxxxxxxxxxxxxx
Then B is added:
AAAAAAAAAAAAAaaaaaaaBBBBbbbbbbbbbb
Then C is added, but there is no unreserved space left:
AAAAAAAAAAAAAaaaaaaaBBBBbbbbCCCccc
If A is truncated, this is what happens:
AAAAAaaaaaaaxxxxxxxxBBBBbbbbCCCccc
If B is now expanded, this happens:
AAAAAaaaaaaaBBBBxxxxBBBBBBBBCCCccc
You can see that the data for B is no longer stored in one place; this is called fragmentation. When you run a defragmentation tool, the data is placed close together again.
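The truncate and expand steps can be played through with a small simulation. It is a sketch under assumptions, not a real filesystem: it starts from the state after C was added, a truncated file keeps its reserved blocks directly behind its data, and an expanding file first uses its own reserve and then takes free blocks first-fit. The function names are hypothetical.

```python
FREE = "x"


def truncate(disk: list[str], name: str, new_used: int) -> None:
    """Shrink a contiguous file to new_used data blocks, keeping its reserve size."""
    region = [i for i, b in enumerate(disk) if b in (name, name.lower())]
    reserved = sum(1 for i in region if disk[i] == name.lower())
    for k, i in enumerate(region):
        if k < new_used:
            disk[i] = name            # data blocks
        elif k < new_used + reserved:
            disk[i] = name.lower()    # reserve slides up behind the data
        else:
            disk[i] = FREE            # the tail of the old region is released


def extend(disk: list[str], name: str, extra: int) -> None:
    """Grow a file by `extra` blocks: own reserve first, then first free blocks."""
    for wanted in (name.lower(), FREE):
        for i, b in enumerate(disk):
            if extra == 0:
                return
            if b == wanted:
                disk[i] = name
                extra -= 1
    if extra:
        raise OSError("disk full")


def fragments(disk: list[str], name: str) -> int:
    """Count the separate runs of data blocks that belong to `name`."""
    count, previous = 0, False
    for b in disk:
        current = b == name
        if current and not previous:
            count += 1
        previous = current
    return count


if __name__ == "__main__":
    disk = list("AAAAAAAAAAAAAaaaaaaaBBBBbbbbCCCccc")
    truncate(disk, "A", 5)
    print("".join(disk), fragments(disk, "B"))
    extend(disk, "B", 8)
    print("".join(disk), fragments(disk, "B"))
```

After the extension B occupies two separate runs, one in the hole left by A and one at its original position, which is the fragmentation described above.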