How do I make a file sparse?

Posted 2024-11-06 19:25:53

If I have a big file containing many zeros, how can I efficiently make it a sparse file?

Is the only possibility to read the whole file (including all the zeros, which may already be partially stored sparse) and rewrite it to a new file, using seek to skip the zero areas?

Or is it possible to do this in place on an existing file (e.g. File.setSparse(long start, long end))?

I'm looking for a solution in Java or with Linux commands. The filesystem will be ext3 or similar.
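
For reference, the copy-and-seek approach described in the question could look roughly like the sketch below. It is written in Python rather than Java to keep it short. The function name copy_sparse and the 4096-byte block size are assumptions made for illustration. Whether the skipped regions actually become holes depends on the filesystem.

```python
import os

BLOCK = 4096  # assumed block size; os.stat(src).st_blksize may be a better choice


def copy_sparse(src, dst, block=BLOCK):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == zero[:len(chunk)]:
                # Skip ahead without writing; the filesystem may leave a hole here.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the file ends in zeros, nothing was written there, so set the
        # final length explicitly.
        fout.truncate(fin.tell())
```

The copy has the same contents and length as the source. This still needs room for a second file, which is the drawback the question is trying to avoid.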

Comments (5)

那伤。 2024-11-13 19:25:53

A lot's changed in 8 years.

Fallocate

fallocate -d filename can be used to punch holes in existing files. From the fallocate(1) man page:

-d, --dig-holes
  Detect and dig holes.  This makes the file sparse in-place,
  without using extra disk space.  The minimum size of the hole
  depends on filesystem I/O block size (usually 4096 bytes).
  Also, when using this option, --keep-size is implied.  If no
  range is specified by --offset and --length, then the entire
  file is analyzed for holes.

  You can think of this option as doing a "cp --sparse" and then
  renaming the destination file to the original, without the
  need for extra disk space.

  See --punch-hole for a list of supported filesystems.

(That list:)

Supported for XFS (since Linux 2.6.38), ext4 (since Linux
3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).

tmpfs being on that list is the one I find most interesting. The filesystem itself is efficient enough to only consume as much RAM as it needs to store its contents, but making the contents sparse can potentially increase that efficiency even further.
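
To see whether digging holes actually freed anything, one option is to compare the file's apparent size with the space allocated for it, much like comparing ls -l with du. The sketch below does this in Python. The helper name sparseness is made up for illustration, and it assumes st_blocks is counted in 512-byte units, as it is on Linux.

```python
import os


def sparseness(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as on Linux.
    allocated = st.st_blocks * 512
    return st.st_size, allocated
```

If allocated_bytes is well below apparent_size, the file is stored sparsely. Running this before and after fallocate -d shows how much space was reclaimed.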

GNU cp

Additionally, somewhere along the way GNU cp gained an understanding of sparse files. Quoting the cp(1) man page regarding its default mode, --sparse=auto:

sparse SOURCE files are detected by a crude heuristic and the corresponding DEST file is made sparse as well.

But there's also --sparse=always, which activates the file-copy equivalent of what fallocate -d does in-place:

Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes.

I've finally been able to retire my tar cpSf - SOURCE | (cd DESTDIR && tar xpSf -) one-liner, which for 20 years was my graybeard way of copying sparse files with their sparseness preserved.

哽咽笑 2024-11-13 19:25:53

Some filesystems on Linux / UNIX have the ability to "punch holes" into an existing file. See:

It's not very portable and not done the same way across the board; as of right now, I believe Java's IO libraries do not provide an interface for this.

If hole punching is available either via fcntl(F_FREESP) or via any other mechanism, it should be significantly faster than a copy/seek loop.
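
On Linux, one such mechanism is the fallocate(2) system call with the FALLOC_FL_PUNCH_HOLE flag. As far as I know, Python's standard library does not expose that flag either (os.posix_fallocate takes no mode argument), so the sketch below goes through ctypes. It assumes 64-bit Linux with a C library that exports fallocate. The helper name punch_hole is hypothetical, and the two flag values are copied from <linux/falloc.h>.

```python
import ctypes
import errno
import os

FALLOC_FL_KEEP_SIZE = 0x01   # value from <linux/falloc.h>
FALLOC_FL_PUNCH_HOLE = 0x02  # value from <linux/falloc.h>


def punch_hole(path, offset, length):
    """Deallocate bytes [offset, offset + length) of path; they then read as zeros."""
    libc = ctypes.CDLL(None, use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is None:
        raise OSError(errno.ENOSYS, "fallocate() is not available in this C library")
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    fd = os.open(path, os.O_RDWR)
    try:
        mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
        if fallocate(fd, mode, offset, length) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)
```

The call raises OSError on filesystems that do not support hole punching, so a caller can fall back to the copy/seek loop. Note that it zeroes the range whether or not it held zeros before, so the caller must first check that the range contains nothing worth keeping.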

只为守护你 2024-11-13 19:25:53

You can use $ truncate -s filesize filename in a Linux terminal to create a sparse file that has no data blocks allocated, only metadata.

NOTE: filesize is in bytes unless you add a suffix such as K, M or G.
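
The same thing can be done from a program. Below is a minimal Python sketch (the function name create_sparse is made up for illustration). It creates an empty file and extends it to the requested length without writing any data.

```python
def create_sparse(path, size):
    """Create (or overwrite) path as a file of `size` bytes with no data written."""
    with open(path, "wb") as f:
        # Extending an empty file with truncate writes no data, so on
        # filesystems that support holes no data blocks are allocated.
        f.truncate(size)
```

This creates a new, empty sparse file. It does not make an existing file with real zero blocks sparse.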

俏︾媚 2024-11-13 19:25:53

According to this article, it seems there is currently no easy solution, except for using FIEMAP ioctl. However, I don't know how you can make "non sparse" zero blocks into "sparse" ones.
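
A different mechanism from FIEMAP, for finding out which parts of a file are already holes, is lseek with SEEK_DATA and SEEK_HOLE. The sketch below is not the FIEMAP approach from the article. It assumes Linux and Python 3.3+, where os.SEEK_DATA and os.SEEK_HOLE are available, and the function name data_extents is made up for illustration.

```python
import errno
import os


def data_extents(path):
    """Return a list of (start, end) byte ranges that the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains after pos
                    break
                raise
            # End of file counts as a hole, so this always succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            pos = end
    finally:
        os.close(fd)
    return extents
```

On filesystems without hole support the whole file is reported as one data extent. As with FIEMAP, this only reports existing holes. It does not turn allocated zero blocks into holes.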

北笙凉宸 2024-11-13 19:25:53

I think you would be better off pre-allocating the whole file and maintaining a table/BitSet of the pages/sections which are occupied.

Making a file sparse would result in those sections being fragmented if they were ever re-used. Perhaps saving a few TB of disk space is not worth the performance hit of a highly fragmented file.
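
A rough sketch of that idea in Python: pre-allocate the file once and track occupied pages in a bitmap. The class name PagedFile, the 4096-byte page size and the in-memory bitmap are all assumptions for illustration. A real implementation would also have to persist the bitmap somewhere.

```python
import os

PAGE_SIZE = 4096  # assumed page size for this sketch


class PagedFile:
    """Fixed-size, pre-allocated file with an in-memory bitmap of used pages."""

    def __init__(self, path, num_pages):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        # Reserve all the space up front so later writes do not fragment the file.
        os.posix_fallocate(self.fd, 0, num_pages * PAGE_SIZE)
        self.num_pages = num_pages
        self.used = bytearray((num_pages + 7) // 8)  # one bit per page

    def is_used(self, page):
        return bool(self.used[page >> 3] & (1 << (page & 7)))

    def write_page(self, page, data):
        if not 0 <= page < self.num_pages:
            raise IndexError("page out of range")
        if len(data) != PAGE_SIZE:
            raise ValueError("data must be exactly one page")
        os.pwrite(self.fd, data, page * PAGE_SIZE)
        self.used[page >> 3] |= 1 << (page & 7)

    def free_page(self, page):
        # Only the bitmap changes; the disk space stays allocated for re-use.
        self.used[page >> 3] &= ~(1 << (page & 7)) & 0xFF

    def close(self):
        os.close(self.fd)
```

Freed pages keep their disk blocks, so re-using them does not fragment the file. The cost is that no disk space is ever given back.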
