How to make a file sparse?
If I have a big file containing many zeros, how can I efficiently make it a sparse file?
Is the only possibility to read the whole file (including all zeros, some of which may already be stored sparsely) and rewrite it to a new file, using seek to skip the zero areas?
Or is there a way to do this in an existing file (e.g. File.setSparse(long start, long end))?
I'm looking for a solution in Java or some Linux commands; the filesystem will be ext3 or similar.
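The read-and-rewrite approach described in the question can be sketched in Java roughly as follows. This is a minimal illustration, not an existing library API: the class SparseCopy, the method copySparse and the blockSize parameter are hypothetical names. The code reads the source one block at a time and writes only the blocks that contain a non-zero byte. It uses positional writes, so the skipped ranges are never written. The Java documentation leaves the contents of such gaps unspecified, but on Linux (POSIX) filesystems they read back as zeros. Filesystems that support sparse files, ext3 included, leave those gaps unallocated.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SparseCopy {
    private SparseCopy() {}

    // True if block[0, len) contains only zero bytes.
    private static boolean allZero(byte[] block, int len) {
        for (int i = 0; i < len; i++) {
            if (block[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /** Copies src to dst, writing only blocks that contain non-zero data (hypothetical helper). */
    public static void copySparse(Path src, Path dst, int blockSize) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             RandomAccessFile raf = new RandomAccessFile(dst.toFile(), "rw")) {
            raf.setLength(0); // discard old content so skipped blocks cannot keep stale data
            FileChannel out = raf.getChannel();
            byte[] block = new byte[blockSize];
            ByteBuffer buf = ByteBuffer.wrap(block);
            long pos = 0;
            while (true) {
                buf.clear();
                // Fill a whole block (or whatever is left before EOF) to keep offsets aligned.
                while (buf.hasRemaining() && in.read(buf) != -1) {
                    // keep reading
                }
                int n = buf.position();
                if (n == 0) {
                    break; // EOF
                }
                if (!allZero(block, n)) {
                    buf.flip();
                    long at = pos;
                    while (buf.hasRemaining()) {
                        at += out.write(buf, at); // positional write; earlier skipped ranges stay holes
                    }
                }
                pos += n;
            }
            // Trailing zero blocks were never written, so restore the full logical size.
            // On Linux this is an ftruncate, which extends the file with a hole.
            raf.setLength(pos);
        }
    }
}
```

Choosing blockSize equal to the filesystem block size (commonly 4096 bytes) is a reasonable default, because a hole smaller than one filesystem block cannot be represented anyway.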
A lot's changed in 8 years.
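fallocate
The command fallocate -d filename can be used to punch holes in existing files. According to the fallocate(1) man page, -d (--dig-holes) detects runs of zeros and deallocates them, making the file sparse in place without using extra disk space. The minimum hole size depends on the filesystem's I/O block size (usually 4096 bytes). This only works on filesystems that support hole punching; the fallocate(2) man page lists XFS, ext4, Btrfs and tmpfs among them.
tmpfs being on that list is the one I find most interesting. The filesystem itself is efficient enough to consume only as much RAM as it needs to store its contents, but making the contents sparse can potentially increase that efficiency even further.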
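As an illustration of the detection step that fallocate -d performs, here is a minimal Java sketch that lists the block-aligned ranges of a file that consist only of zero bytes. The class ZeroRanges and the fixed blockSize parameter are assumptions made for this example. fallocate itself uses the filesystem's I/O block size and does the actual deallocation through the kernel, which plain Java cannot do.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class ZeroRanges {
    private ZeroRanges() {}

    /**
     * Returns [start, end) ranges made of whole, block-aligned, all-zero blocks.
     * Adjacent zero blocks are merged; a trailing partial block is never reported.
     */
    public static List<long[]> find(Path file, int blockSize) throws IOException {
        List<long[]> ranges = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(blockSize);
            long pos = 0;
            long runStart = -1; // start of the current run of zero blocks, or -1 if none
            while (true) {
                buf.clear();
                while (buf.hasRemaining() && ch.read(buf) != -1) {
                    // keep reading until the block is full or EOF is reached
                }
                int n = buf.position();
                boolean zero = n == blockSize; // partial blocks never count as holes
                for (int i = 0; zero && i < n; i++) {
                    zero = buf.get(i) == 0;
                }
                if (zero) {
                    if (runStart < 0) runStart = pos;
                } else if (runStart >= 0) {
                    ranges.add(new long[] {runStart, pos}); // close the current run
                    runStart = -1;
                }
                if (n < blockSize) break; // EOF reached
                pos += n;
            }
        }
        return ranges;
    }
}
```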
GNU cp
Additionally, somewhere along the way GNU cp gained an understanding of sparse files. According to the cp(1) man page, its default mode, --sparse=auto, detects sparse source files with a crude heuristic and makes the corresponding destination file sparse as well. But there's also --sparse=always, which makes the destination sparse wherever the source contains a long enough run of zero bytes. That is the file-copy equivalent of what fallocate -d does in place (e.g. cp --sparse=always SOURCE DEST).
I've finally been able to retire my tar cpSf - SOURCE | (cd DESTDIR && tar xpSf -) one-liner, which for 20 years was my graybeard way of copying sparse files with their sparseness preserved.
Some filesystems on Linux / UNIX have the ability to "punch holes" into an existing file. See:
It's not very portable and not done the same way across the board; as of right now, I believe Java's IO libraries do not provide an interface for this.
If hole punching is available, either via fcntl(F_FREESP) or via any other mechanism, it should be significantly faster than a copy/seek loop.
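You can use truncate -s filesize filename in a Linux terminal to create a sparse file that has only metadata (no allocated data blocks).
NOTE: filesize is in bytes when given as a plain number (GNU truncate also accepts suffixes such as K, M and G).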
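From Java, the closest equivalent of truncate -s is RandomAccessFile.setLength. Here is a minimal sketch in which SparseCreate.create is a hypothetical name. Like truncate, it shrinks or extends an existing file. On Linux, extending a file this way writes no data, so on filesystems that support sparse files the new range is a hole that reads back as zeros. The Java documentation itself leaves the contents of the extended part undefined.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

final class SparseCreate {
    private SparseCreate() {}

    /** Sets the file's length to sizeInBytes without writing data, similar to "truncate -s". */
    public static void create(Path file, long sizeInBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(sizeInBytes); // on Linux an ftruncate: extension is a hole, not zero blocks
        }
    }
}
```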
According to this article, it seems there is currently no easy solution, except for using the FIEMAP ioctl, which reports a file's extent map (which ranges are allocated). However, I don't know how you can turn "non-sparse" zero blocks into "sparse" ones.
I think you would be better off pre-allocating the whole file and maintaining a table/BitSet of the pages/sections which are occupied.
Making a file sparse would result in those sections being fragmented if they were ever re-used. Perhaps saving a few TB of disk space is not worth the performance hit of a highly fragmented file.
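Here is a minimal sketch of the pre-allocate-plus-BitSet idea, assuming fixed-size pages. PagedFile and its methods are hypothetical names. The file is filled with real zero bytes up front so its blocks are allocated once. The BitSet only tracks which pages are in use; it lives in memory here, and persisting it is left out. Freed pages are reused in place, so no blocks are deallocated and later reallocated elsewhere.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.BitSet;

/** Preallocated file of fixed-size pages plus an in-memory BitSet of occupied pages. */
final class PagedFile implements AutoCloseable {
    private final RandomAccessFile raf;
    private final int pageSize;
    private final int pageCount;
    private final BitSet used;

    PagedFile(Path file, int pageSize, int pageCount) throws IOException {
        if (pageSize <= 0 || pageCount <= 0) throw new IllegalArgumentException("bad geometry");
        this.raf = new RandomAccessFile(file.toFile(), "rw");
        this.pageSize = pageSize;
        this.pageCount = pageCount;
        this.used = new BitSet(pageCount);
        raf.setLength(0);
        byte[] zeros = new byte[pageSize];
        for (int i = 0; i < pageCount; i++) {
            raf.write(zeros); // real writes, so the blocks are allocated now (not a sparse file)
        }
    }

    /** Claims the lowest free page and returns its index, or -1 if every page is in use. */
    int allocate() {
        int page = used.nextClearBit(0);
        if (page >= pageCount) return -1;
        used.set(page);
        return page;
    }

    /** Marks a page as free; its blocks stay allocated for later reuse. */
    void free(int page) {
        used.clear(page);
    }

    void write(int page, byte[] data) throws IOException {
        if (!used.get(page)) throw new IllegalStateException("page " + page + " is not allocated");
        if (data.length > pageSize) throw new IllegalArgumentException("data larger than a page");
        raf.seek((long) page * pageSize);
        raf.write(data);
    }

    byte[] read(int page) throws IOException {
        byte[] buf = new byte[pageSize];
        raf.seek((long) page * pageSize);
        raf.readFully(buf);
        return buf;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```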