Gunzip / extract a file "portion by portion"
I'm on a shared server with restricted disk space and I've got a gz file that expands into a HUGE file, more than what I've got space for. How can I extract it "portion by portion" (let's say 10 MB at a time), and process each portion, without extracting the whole thing even temporarily?

No, this is just ONE super huge compressed file, not a set of files, please...

Hi David, your solution looks quite elegant, but if I'm reading it right, it seems like every time gunzip extracts from the beginning of the file (and the output of that is thrown away). I'm sure that'll put a huge strain on the shared server I'm on (I don't think it's "reading ahead" at all) - do you have any insights on how I can make gunzip "skip" the necessary number of blocks?
If you're doing this with (Unix/Linux) shell tools, you can use gunzip -c to uncompress to stdout, then use dd with the skip and count options to copy only one chunk. For example:
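A minimal sketch of such a pipeline, assuming GNU dd (for the iflag=fullblock option); huge.gz and chunk0 are placeholder names, and the 10 MB chunk size is taken from the question:

# Write the first 10 MB of the uncompressed stream to chunk0.
# iflag=fullblock makes dd assemble complete blocks even when reading
# from a pipe, so skip= and count= refer to whole 10 MB blocks.
gunzip -c huge.gz | dd bs=10M count=1 skip=0 iflag=fullblock of=chunk0

Without iflag=fullblock, dd reading from a pipe can receive short reads and count each one as a block, so the chunk boundaries would drift.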
then skip=1, skip=2, etc., for the following chunks.
Unfortunately I don't know of an existing Unix command that does exactly what you need. You could do it easily with a little program in any language, e.g. in Python, cutter.py (any language would do just as well, of course):
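A minimal sketch of what such a cutter.py could look like, reconstructed from the behavior described below (read stdin, discard the first (N-1)*size bytes, then copy the next size bytes to stdout; the chunk number N is treated as 1-based):

#!/usr/bin/env python3
# cutter.py -- copy the N-th `size`-byte chunk of stdin to stdout.
# Pipes are not seekable, so the leading bytes are read and thrown away.
import sys

def main():
    try:
        size = int(sys.argv[1])
        n = int(sys.argv[2])
    except (IndexError, ValueError):
        sys.exit("Usage: cutter.py size N")

    stdin, stdout = sys.stdin.buffer, sys.stdout.buffer  # binary I/O

    # Discard the first N-1 chunks (chunk numbering starts at 1).
    to_skip = size * (n - 1)
    while to_skip > 0:
        data = stdin.read(min(to_skip, 1 << 20))
        if not data:                  # stream ended before our chunk
            return
        to_skip -= len(data)

    # Copy exactly `size` bytes (or up to EOF) to stdout.
    to_copy = size
    while to_copy > 0:
        data = stdin.read(min(to_copy, 1 << 20))
        if not data:
            break
        stdout.write(data)
        to_copy -= len(data)

if __name__ == "__main__":
    main()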
Now

gunzip <huge.gz | python cutter.py 1000000 5 > fifthone

will put in file fifthone exactly a million bytes, skipping the first 4 million bytes in the uncompressed stream.