How to efficiently create a large file on a VFAT partition in embedded Linux

Posted on 2024-10-06 10:32:34 · 517 words · 5 views · 0 comments

I'm trying to create a large empty file on a VFAT partition by using the `dd` command on an embedded Linux box:

dd if=/dev/zero of=/mnt/flash/file bs=1M count=1 seek=1023

The intention was to skip the first 1023 blocks and write only 1 block at the end of the file, which should be very quick on a native EXT3 partition, and it indeed is. However, this operation turned out to be quite slow on a VFAT partition, along with the following message:

lowmem_shrink:: nr_to_scan=128, gfp_mask=d0, other_free=6971, min_adj=16
// ... more `lowmem_shrink' messages

Another attempt was to fopen() a file on the VFAT partition and then fseek() to the end to write the data, which has also proved slow, along with the same messages from the kernel.
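
For reference, the fopen()/fseek() attempt described above can be sketched roughly as follows. The helper name `create_by_seek` and its parameters are made up for illustration. On a filesystem with sparse-file support the skipped gap costs almost nothing, whereas on VFAT the same calls make the kernel allocate and zero the whole gap first.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: create `path` with a logical size of
 * gap_len + tail_len bytes by seeking past the gap and writing only
 * the zero-filled tail. Returns 0 on success, -1 on error.
 */
int create_by_seek(const char *path, long gap_len, size_t tail_len)
{
    assert(gap_len >= 0 && tail_len > 0);

    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;

    char *zeros = calloc(1, tail_len);   /* zero-filled tail buffer */
    if (!zeros) {
        fclose(f);
        return -1;
    }

    int rc = 0;
    /* Skip the gap without writing it; only the tail is written. */
    if (fseek(f, gap_len, SEEK_SET) != 0 ||
        fwrite(zeros, 1, tail_len, f) != tail_len)
        rc = -1;

    free(zeros);
    if (fclose(f) != 0)                  /* flushes the buffered tail */
        rc = -1;
    return rc;
}
```

Calling `create_by_seek("/mnt/flash/file", 1023L * 1024 * 1024, 1024 * 1024)` would mirror the dd command above.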

So basically, is there a quick way to create the file on the VFAT partition (without traversing the first 1023 blocks)?

Thanks.

Comments (1)

万劫不复 2024-10-13 10:32:34

Why are VFAT "skipping" writes so slow?

Unless the VFAT filesystem driver were made to "cheat" in this respect, creating large files on FAT-type filesystems will always take a long time. To comply with the FAT specification, the driver has to allocate all data blocks and zero-initialize them, even if you "skip" the writes. That's because of the "cluster chaining" FAT does.

The reason for that behaviour is FAT's inability to support either:

  • UN*X-style "holes" in files (aka "sparse files")
    that's what you're creating on ext3 with your test case: a file with no data blocks allocated to the first 1GB-1MB of it, and a single 1MB chunk of actually committed, zero-initialized blocks at the end.
  • NTFS-style "valid data length" information.
    On NTFS, a file can have uninitialized blocks allocated to it, but the file's metadata will keep two size fields - one for the total size of the file, another for the number of bytes actually written to it (from the beginning of the file).

Without a specification supporting either technique, the filesystem would always have to allocate and zerofill all "intermediate" data blocks if you skip a range.
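
To make the cluster-chaining point concrete, here is a toy in-memory model of a FAT32 table. It is a sketch only, not driver code: the function names are invented, and the real driver also has to update the directory entry and write the table to disk. It shows that growing a file by N clusters means linking N table entries, because the chain has no way to express "nothing here".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_FREE 0x00000000u   /* FAT32 marker: cluster is unallocated */
#define FAT_EOC  0x0FFFFFFFu   /* FAT32 marker: end of cluster chain   */

/*
 * Toy model: grow the chain that ends at cluster `tail` by `count`
 * clusters taken from the free entries of `fat`. Returns the new tail,
 * or 0 if the table runs out of free clusters (the chain is then left
 * partially extended).
 */
uint32_t fat_extend_chain(uint32_t *fat, size_t nclusters,
                          uint32_t tail, size_t count)
{
    assert(tail >= 2 && tail < nclusters && fat[tail] == FAT_EOC);

    uint32_t next = 2;                 /* data clusters start at index 2 */
    while (count > 0) {
        while (next < nclusters && fat[next] != FAT_FREE)
            next++;
        if (next >= nclusters)
            return 0;                  /* out of space */
        fat[tail] = next;              /* link old tail to the new cluster */
        fat[next] = FAT_EOC;           /* new cluster becomes the chain end */
        tail = next;
        count--;
    }
    return tail;
}

/* Walk the chain from `first` and count its clusters. */
size_t fat_chain_length(const uint32_t *fat, uint32_t first)
{
    size_t n = 1;
    while (fat[first] != FAT_EOC) {
        first = fat[first];
        n++;
    }
    return n;
}
```

Every cluster between the old end of the file and the write position ends up in the chain, and since each one is then readable as file data, its old contents have to be overwritten with zeroes.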

Also remember that on ext3, the technique you used does not actually allocate blocks to the file (apart from the last 1MB). If you require the blocks preallocated (not just the size of the file set large), you'll have to perform a full write there as well.
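
One way to check whether blocks were really allocated is to compare a file's logical size with its allocated size. The sketch below uses stat(); the helper names are hypothetical, and the 512-byte unit for st_blocks is what Linux uses rather than something POSIX guarantees.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report the logical size and the allocated size
 * of `path` in bytes. Returns 0 on success, -1 if stat() fails.
 */
int file_usage(const char *path, long long *logical, long long *allocated)
{
    assert(path && logical && allocated);

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    *logical = (long long)st.st_size;
    /* On Linux, st_blocks counts 512-byte units regardless of st_blksize. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}

/* Print both numbers; allocated far below logical means the file is sparse. */
void print_usage(const char *path)
{
    long long logical, allocated;

    if (file_usage(path, &logical, &allocated) == 0)
        printf("%s: %lld bytes logical, %lld bytes allocated\n",
               path, logical, allocated);
    else
        perror(path);
}
```

For the sparse file from the question, the allocated size on ext3 should come out at roughly 1MB, while on VFAT it should be close to the full logical size.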

How could the VFAT driver be modified to deal with this?

At the moment, the driver uses the Linux kernel function cont_write_begin() to start even an asynchronous write to a file; this function looks like:

/*
 * For moronic filesystems that do not allow holes in file.
 * We may have to extend the file.
 */
int cont_write_begin(struct file *file, struct address_space *mapping,
                     loff_t pos, unsigned len, unsigned flags,
                     struct page **pagep, void **fsdata,
                     get_block_t *get_block, loff_t *bytes)
{
    struct inode *inode = mapping->host;
    unsigned blocksize = 1 << inode->i_blkbits;
    unsigned zerofrom;
    int err;

    /* Zero-fill everything between the initialized size (*bytes) and pos. */
    err = cont_expand_zero(file, mapping, pos, bytes);
    if (err)
        return err;

    /*
     * If this write extends past *bytes and *bytes does not end on a
     * block boundary, round *bytes up to the next block boundary.
     */
    zerofrom = *bytes & ~PAGE_CACHE_MASK;
    if (pos+len > *bytes && zerofrom & (blocksize-1)) {
        *bytes |= (blocksize-1);
        (*bytes)++;
    }

    /* Only now is the requested write itself started. */
    return block_write_begin(mapping, pos, len, flags, pagep, get_block);
}

That is a simple strategy, but it also trashes the page cache: the lowmem_shrink messages in your log are a consequence of the call to cont_expand_zero(), which does all the work and is not asynchronous. If the filesystem were to split the two operations, with one task doing the "real" write and another doing the zero filling, it would appear snappier.
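
To get a feel for how much work the synchronous zero-fill involves, the following sketch counts the pages that a zeroing pass over a byte range has to touch. It is a back-of-the-envelope model, not kernel code: the function name is made up, and the 4 KiB page size used in the note below is an assumption about the target.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough model of the zero-fill cost: number of pages touched when
 * zeroing the byte range [initialized, pos). Returns 0 when the write
 * position does not lie beyond the already initialized size.
 */
uint64_t pages_to_zero(uint64_t initialized, uint64_t pos, uint64_t page_size)
{
    assert(page_size > 0);

    if (pos <= initialized)
        return 0;

    uint64_t first = initialized / page_size;   /* first page touched */
    uint64_t last = (pos - 1) / page_size;      /* last page touched  */
    return last - first + 1;
}
```

For the dd command in the question (a 1023 MiB gap) and 4 KiB pages, `pages_to_zero(0, 1023ULL * 1024 * 1024, 4096)` gives 261888 pages, all of which pass through the page cache before the single "real" block is written.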

One way to achieve this while still using the default Linux filesystem utility interfaces would be to internally create two "virtual" files: one for the to-be-zerofilled area, and another for the actually-to-be-written data. The real file's directory entry and FAT cluster chain would only be updated once the background task is actually complete, by linking its last cluster with the first one of the "zerofill file" and the last cluster of that one with the first one of the "actual write file". One would also want to use a direct I/O write for the zerofilling, in order to avoid trashing the page cache.

Note: While all this is certainly technically possible, the question is how worthwhile such a change would be. Who needs this operation all the time? What would the side effects be?
The existing (simple) code is perfectly acceptable for smaller skipping writes; you won't really notice it if you create a 1MB file and write a single byte at the end. It'll bite you only if you go for file sizes on the order of the limits of what the FAT filesystem allows.

Other options ...

In some situations, the task at hand involves two (or more) steps:

  1. freshly format (e.g.) an SD card with FAT
  2. put one or more big files onto it to "pre-fill" the card
  3. (app-dependent, optional)
    pre-populate the files, or
    put a loopback filesystem image into them

In one case I worked on, we folded the first two steps together by modifying mkdosfs to pre-allocate / pre-create files while making the (FAT32) filesystem. That is pretty simple: when writing the FAT tables, create allocated cluster chains instead of filling the entries with the "free" marker. It has the added advantage that the data blocks are guaranteed to be contiguous, in case your app benefits from that. You can also have mkdosfs leave the previous contents of the data blocks alone. If you know, for example, that one of your preparation steps writes the entire data anyway, or sets up ext3-in-file-on-FAT (a pretty common arrangement: Linux appliance, SD card for data exchange with a Windows app/GUI), then there's no need to zero anything out and write it twice (once with zeroes, once with the real contents). If your use case fits this (i.e. formatting the card is a normal step of the "initialize it for use" process anyway), try it out; a suitably modified mkdosfs is part of TomTom's dosfsutils sources. See mkdosfs.c and search for the handling of the -N command line option.
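
The idea of writing allocated chains at format time can be illustrated with a toy in-memory FAT32 table. This is only a sketch of the principle and is not taken from the mkdosfs sources mentioned above; the function name and signature are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_EOC 0x0FFFFFFFu   /* FAT32 marker: end of cluster chain */

/*
 * Toy model: on a freshly created table, mark `count` clusters starting
 * at `first` as one contiguous, already allocated chain. Each entry
 * points at its successor and the last one carries the end marker.
 */
void fat_prealloc_contiguous(uint32_t *fat, size_t nclusters,
                             uint32_t first, size_t count)
{
    assert(first >= 2 && count > 0 && first + count <= nclusters);

    for (size_t i = 0; i + 1 < count; i++)
        fat[first + i] = first + (uint32_t)i + 1;
    fat[first + count - 1] = FAT_EOC;
}
```

Because the table is being written from scratch anyway, this costs no more than writing "free" markers, and no data block needs to be touched unless the old contents must be cleared.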

As mentioned, when talking about preallocation there's also posix_fallocate(). Currently on Linux with FAT, this does essentially the same as a manual dd ..., i.e. it waits for the zerofill. But the specification of the function doesn't mandate that it be synchronous. The block allocation (FAT cluster chain generation) would have to be done synchronously, but the on-disk VFAT dirent size update and the data block zerofills could be backgrounded / delayed (i.e. done at low priority in the background, or done only when explicitly requested via fdatasync() / sync(), so that the app can e.g. allocate blocks and then write non-zero contents itself ...). That's technique / design though; I'm not aware of anyone having done that kernel modification yet, even as an experiment.
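
From the application side, a posix_fallocate() call could look roughly like the sketch below. The wrapper name `preallocate_file` is hypothetical. Given the behaviour described above, expect the call to block for the whole zerofill on FAT, so this shows the interface and does not speed anything up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: make sure `path` exists and has `size` bytes
 * of storage allocated. Returns 0 on success or an errno-style code.
 */
int preallocate_file(const char *path, off_t size)
{
    assert(size > 0);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    /* posix_fallocate() returns the error number and does not set errno. */
    int err = posix_fallocate(fd, 0, size);

    if (close(fd) != 0 && err == 0)
        err = errno;
    return err;
}
```

A call such as `preallocate_file("/mnt/flash/file", 1024L * 1024 * 1024)` would request the same 1GB as the dd command in the question.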
