
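How can I make a file sparse?

Posted 2024-11-06 19:25:53

If I have a big file containing many zeros, how can I efficiently make it a sparse file?

Is the only possibility to read the whole file (including all the zeros, which may already be stored sparsely) and rewrite it to a new file, using seek to skip the zero regions?

Or is there a way to do this within the existing file (e.g. something like File.setSparse(long start, long end))?

I'm looking for a solution in Java or with some Linux commands. The filesystem will be ext3 or similar.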


那伤。 2024-11-13 19:25:53

A lot's changed in 8 years.

Fallocate

fallocate -d filename can be used to punch holes in existing files. From the fallocate(1) man page:

-d, --dig-holes
  Detect and dig holes.  This makes the file sparse in-place,
  without using extra disk space.  The minimum size of the hole
  depends on filesystem I/O block size (usually 4096 bytes).
  Also, when using this option, --keep-size is implied.  If no
  range is specified by --offset and --length, then the entire
  file is analyzed for holes.

  You can think of this option as doing a "cp --sparse" and then
  renaming the destination file to the original, without the
  need for extra disk space.

  See --punch-hole for a list of supported filesystems.

(That list:)

Supported for XFS (since Linux 2.6.38), ext4 (since Linux
3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).

tmpfs being on that list is the one I find most interesting. The filesystem itself is efficient enough to only consume as much RAM as it needs to store its contents, but making the contents sparse can potentially increase that efficiency even further.
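As a rough illustration of the in-place approach, here is a minimal sketch. It assumes a Linux system with util-linux `fallocate` and GNU coreutils; the scratch file name `demo.img` and the sizes are made up for the example. It builds a file that is mostly real (allocated) zeros, digs holes in it, and records the size and checksum before and after so you can confirm that only the on-disk allocation changes.

```shell
#!/bin/sh
# Sketch only: demo.img is a hypothetical scratch file, not from the answer.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
file="$workdir/demo.img"

# 1 MiB of random data followed by 8 MiB of zeros written as real data
head -c 1048576 /dev/urandom > "$file"
head -c 8388608 /dev/zero >> "$file"

size_before=$(stat -c %s "$file")      # apparent size in bytes
sum_before=$(cksum < "$file")          # checksum of the contents
echo "allocated before: $(du -k "$file" | cut -f1) KiB"

# Detect runs of zeros and punch holes in place. This fails on
# filesystems without hole-punch support; the file is then left as it was.
if fallocate -d "$file"; then
    echo "allocated after:  $(du -k "$file" | cut -f1) KiB"
else
    echo "fallocate -d failed (unsupported filesystem or tool)" >&2
fi

size_after=$(stat -c %s "$file")
sum_after=$(cksum < "$file")
echo "apparent size: $size_before -> $size_after bytes"
```

The apparent size and checksum should be identical before and after; only the `du` figure is expected to shrink, and by how much depends on the filesystem's block size.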

GNU `cp`

Additionally, somewhere along the way GNU cp gained an understanding of sparse files. Quoting the cp(1) man page regarding its default mode, --sparse=auto:

sparse SOURCE files are detected by a crude heuristic and the corresponding DEST file is made sparse as well.

But there's also --sparse=always, which activates the file-copy equivalent of what fallocate -d does in-place:

Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes.
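For comparison, the copy-based route might look like the sketch below. It assumes GNU `cp` (other implementations may not accept `--sparse`), and the file names `source.img` and `copy.img` are invented for the example. Unlike `fallocate -d`, this temporarily needs room for the second file.

```shell
#!/bin/sh
# Sketch only: source.img and copy.img are hypothetical scratch files.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/source.img"
dst="$workdir/copy.img"

# 1 MiB of random data followed by 8 MiB of zeros written as real data
head -c 1048576 /dev/urandom > "$src"
head -c 8388608 /dev/zero >> "$src"

# Copy, turning sufficiently long runs of zero bytes into holes
cp --sparse=always "$src" "$dst"

# The copy must be byte-for-byte identical to the source
if cmp -s "$src" "$dst"; then
    cmp_status=identical
else
    cmp_status=different
fi

src_size=$(stat -c %s "$src")
dst_size=$(stat -c %s "$dst")
echo "contents: $cmp_status"
echo "source: $src_size bytes apparent, $(du -k "$src" | cut -f1) KiB allocated"
echo "copy:   $dst_size bytes apparent, $(du -k "$dst" | cut -f1) KiB allocated"
```

To finish the job you would then rename the copy over the original (for example with `mv`), which is the "cp --sparse and then rename" picture the fallocate man page uses.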

I've finally been able to retire my tar cpSf - SOURCE | (cd DESTDIR && tar xpSf -) one-liner, which for 20 years was my graybeard way of copying sparse files with their sparseness preserved.
