How to DEFLATE with a command line tool to extract a git object?
I'm looking for a command line wrapper for the DEFLATE algorithm.
I have a file (git blob) that is compressed using DEFLATE, and I want to uncompress it. The gzip command does not seem to have an option to use the DEFLATE algorithm directly instead of the gzip format.
Ideally I'm looking for a standard Unix/Linux tool that can do this.
Edit: This is the output I get when trying to use gzip for my problem:
$ cat .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 | gunzip
gzip: stdin: not in gzip format
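Why gunzip refuses the file: a loose git object is a zlib stream (RFC 1950), which wraps DEFLATE data in a 2-byte header and an Adler-32 trailer, whereas gunzip looks for the gzip wrapper (RFC 1952, magic bytes 1f 8b). Here is a minimal Python sketch that tells the two apart by their first bytes. The function name sniff_compression is made up for illustration, and the check is a heuristic, not a validator:

```python
import gzip
import zlib


def sniff_compression(data: bytes) -> str:
    """Guess the wrapper format from the first two bytes (heuristic only)."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"  # RFC 1952 magic number
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # RFC 1950: low nibble of CMF is 8 (DEFLATE), high nibble (window
        # size) is at most 7, and (CMF * 256 + FLG) is divisible by 31.
        if (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    return "unknown (maybe raw DEFLATE)"


if __name__ == "__main__":
    sample = b"blob 6\x00hello\n"
    print(sniff_compression(zlib.compress(sample)))  # zlib
    print(sniff_compression(gzip.compress(sample)))  # gzip
```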
You can do this with the OpenSSL command line tool:
Unfortunately, at least on Ubuntu, the zlib subcommand is disabled in the default build configuration (--no-zlib --no-zlib-dynamic), so you would need to compile openssl from source to use it. But it is enabled by default on Arch, for example.
Edit: It seems the zlib command is no longer supported on Arch either. This answer might not be useful anymore :(
Something like the following will print the raw content, including the "$type $length\0" header:
pythonic one-liner (updated for python3's sharp distinction between text and binary data):
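A minimal sketch of the same idea as a small Python 3 script instead of a one-liner (the function name inflate_object is hypothetical). It reads the whole object into memory, which is usually fine for git objects, and the output keeps the "$type $length\0" header:

```python
import sys
import zlib


def inflate_object(data: bytes) -> bytes:
    """Inflate a whole loose object; the "<type> <size>\\0" header is kept."""
    return zlib.decompress(data)


if __name__ == "__main__":
    # Use the binary buffers so Python 3 never tries to decode the bytes as text.
    sys.stdout.buffer.write(inflate_object(sys.stdin.buffer.read()))
```

Saved as, say, inflate_object.py (a hypothetical name), it would be used as python3 inflate_object.py < .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7.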
UPDATE: Mark Adler noted that git blobs are not raw DEFLATE streams, but zlib streams. These can be unpacked by the pigz tool, which comes pre-packaged in several Linux distributions:
Edit by kriegaex: Git Bash for Windows users will notice that pigz is unavailable by default. You can find precompiled 32/64-bit versions here. I tried the 64-bit version and it works nicely. You can, e.g., copy pigz.exe directly to c:\Program Files\Git\usr\bin in order to put it on the path.
Edit by mjaggard: Homebrew and MacPorts both have pigz available, so you can install it with brew install pigz or sudo port install pigz (if you do not have it already, you can install Homebrew by following the instructions on their website).
My original answer, kept for historical reasons:
If I understand the hint in the Wikipedia article mentioned by Marc van Kempen, you can use puff.c from zlib directly. This is a small example:
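To see why puff.c (raw DEFLATE only) and gunzip (gzip only) both reject a git object while a zlib decoder accepts it, here is a Python sketch using the standard zlib module, whose wbits argument selects which wrapper the decoder expects. The blob bytes are a made-up example, compressed with zlib.compress to mimic git's format; try_inflate is a hypothetical helper:

```python
import zlib

payload = b"blob 6\x00hello\n"
git_style = zlib.compress(payload)  # zlib wrapper (RFC 1950), like git objects


def try_inflate(data: bytes, wbits: int) -> str:
    """Report whether zlib.decompress accepts data under the given wbits mode."""
    try:
        zlib.decompress(data, wbits)
        return "ok"
    except zlib.error as exc:
        return f"error: {exc}"


# wbits selects the wrapper the decoder expects:
#   15       -> zlib header + Adler-32 trailer (what git writes)
#  -15       -> raw DEFLATE, no wrapper at all (what puff.c handles)
#   15 + 16  -> gzip wrapper only (what gunzip expects)
#   15 + 32  -> auto-detect zlib or gzip
for label, wbits in [("zlib", 15), ("raw", -15), ("gzip", 31), ("auto", 47)]:
    print(f"{label:>4}: {try_inflate(git_style, wbits)}")
```

The zlib (15) and auto-detect (47) modes succeed on the git-style data, while the raw (-15) and gzip-only (31) modes raise zlib.error.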
You can use zlib-flate, like this:
It's there by default on my machine, but if you need to install it, it's part of qpdf (tools for transforming and inspecting PDF files).
I've popped an echo on the end of the command, as it's easier to read the output that way.
Try the following command:
No external tools are needed.
Source: How to uncompress zlib data in UNIX? at unix SE
Here is a Ruby one-liner (cd .git/ first and identify the path to any object):
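For the "identify the path to any object" step: git stores each loose object at objects/<first two hex digits>/<remaining hex digits> under .git. A Python sketch (function names are hypothetical) that builds that path and inflates the file. It only handles loose objects; anything already packed by git gc lives in .git/objects/pack and won't be found this way:

```python
import os
import sys
import zlib


def loose_object_path(git_dir: str, sha: str) -> str:
    """Map a full hex object id to its loose-object file: objects/xx/yyyy..."""
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])


def read_loose_object(git_dir: str, sha: str) -> bytes:
    """Return the inflated object, header included. Packed objects are not handled."""
    with open(loose_object_path(git_dir, sha), "rb") as f:
        return zlib.decompress(f.read())


if __name__ == "__main__":
    # Hypothetical usage: python3 read_object.py .git c0fb67ab3fda7909000da003f4b2ce50a53f43e7
    sys.stdout.buffer.write(read_loose_object(sys.argv[1], sys.argv[2]))
```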
I got tired of not having a good solution for this, so I put something on NPM:
https://github.com/jezell/zlibber
Now you can just pipe to the inflate / deflate commands.
Here's an example of breaking open a commit object in Python:
What you will see there is almost identical to the output of 'git cat-file -p [hash]', except that command doesn't print the header ('commit' followed by the size of the content and a null byte).
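A sketch of the header handling described above, assuming you already have the inflated bytes (split_object is a hypothetical helper name). It splits off the "<type> <size>" header at the first NUL byte and checks the declared size. For blobs and commits the remaining body is what git cat-file -p prints; trees are stored in a binary format that cat-file -p reformats for display:

```python
import zlib


def split_object(raw: bytes) -> tuple[str, int, bytes]:
    """Split an inflated loose object into (type, declared size, body)."""
    header, sep, body = raw.partition(b"\x00")
    if not sep:
        raise ValueError("no NUL byte: not an inflated git object")
    # Header is ASCII "<type> <size>", e.g. b"commit 243".
    obj_type, size_text = header.decode("ascii").split(" ", 1)
    size = int(size_text)
    if size != len(body):
        raise ValueError(f"header says {size} bytes, body has {len(body)}")
    return obj_type, size, body


if __name__ == "__main__":
    compressed = zlib.compress(b"commit 3\x00abc")  # made-up stand-in for a real file
    print(split_object(zlib.decompress(compressed)))
```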
git objects are compressed with zlib rather than gzip, so either use zlib to uncompress them, or use a git command, i.e. git cat-file -p <SHA1>, to print the content.
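One way to confirm you inflated the right thing: in a default (SHA-1) repository, an object's name is the SHA-1 of the inflated bytes, header included. A Python sketch with hypothetical helper names:

```python
import hashlib
import zlib


def object_id(raw: bytes) -> str:
    """SHA-1 of an inflated object, header included (SHA-1 repositories only)."""
    return hashlib.sha1(raw).hexdigest()


def hash_blob(content: bytes) -> str:
    """Compute the id git would assign to a blob with this content."""
    return object_id(b"blob %d\x00" % len(content) + content)


def verify_loose(compressed: bytes, expected_id: str) -> bool:
    """Inflate a loose object file's bytes and compare with the expected id."""
    return object_id(zlib.decompress(compressed)) == expected_id
```

For example, hash_blob(b"") returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the id git gives the empty blob.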
Looks like Mark Adler has us in mind and wrote an example of just how to do this: http://www.zlib.net/zpipe.c
It compiles with nothing more than gcc -lz and the zlib headers installed. I copied the resulting binary to my /usr/local/bin/zpipe while working with git stuff.
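If you'd rather not compile C, here is a rough Python analogue of zpipe's behavior (compress by default, decompress with -d), reading in 16 KiB chunks like zpipe.c so the input is never read all at once. It's a sketch, not a port of zpipe.c, and the function names are hypothetical:

```python
import io
import sys
import zlib
from typing import BinaryIO

CHUNK = 16384  # same buffer size zpipe.c uses


def deflate_stream(src: BinaryIO, dst: BinaryIO, level: int = -1) -> None:
    """Compress src into a zlib stream on dst, one chunk at a time."""
    comp = zlib.compressobj(level)
    while chunk := src.read(CHUNK):
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())


def inflate_stream(src: BinaryIO, dst: BinaryIO) -> None:
    """Decompress a zlib stream from src onto dst, one chunk at a time."""
    decomp = zlib.decompressobj()
    while chunk := src.read(CHUNK):
        dst.write(decomp.decompress(chunk))
        if decomp.eof:
            break  # stop at the end of the zlib stream
    dst.write(decomp.flush())
    if not decomp.eof:
        raise zlib.error("truncated zlib stream")


def roundtrip(data: bytes) -> bytes:
    """In-memory self-check: compress, then decompress."""
    packed, unpacked = io.BytesIO(), io.BytesIO()
    deflate_stream(io.BytesIO(data), packed)
    inflate_stream(io.BytesIO(packed.getvalue()), unpacked)
    return unpacked.getvalue()


if __name__ == "__main__":
    # Like zpipe: compress by default, decompress with -d.
    if sys.argv[1:] == ["-d"]:
        inflate_stream(sys.stdin.buffer, sys.stdout.buffer)
    else:
        deflate_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Usage would look like python3 zpipe.py -d < .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 (the script name is hypothetical).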
pigz can do it:
git objects are zlib streams (not raw deflate). pigz will decompress those with the -dz option.
Python3 oneliner:
This way the contents are handled as binary data, avoiding conversion to/from Unicode.
I have repeatedly come across this problem, and it seems almost all of the answers on the Internet are either wrong, require compiling some less-than-ideal code, or require downloading a whole slew of dependencies untracked by the system! But I found a real solution. It uses PERL, since PERL is readily available on most systems.
From a Bash-alike shell:
Or, if you're exec/fork-ing manually (without shell quotes, but line separated):
perl
-mIO::Uncompress::RawInflate=rawinflate
-erawinflate"-","-"
Big caveat: If the stream doesn't start off as a valid DEFLATE stream (such as say, uncompressed data), then this command will happily pipe all the data through untouched. Only if the stream begins as a valid DEFLATE stream (with a valid dictionary I suppose? I'm not too sure...), then this command will error somehow. In some situations this may be desirable however.
References:
PERL IO::Uncompress::RawInflate::rawinflate
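A raw inflater such as RawInflate expects bare DEFLATE data, but a git object wraps it in the zlib format from RFC 1950: a 2-byte header (CMF, FLG), the DEFLATE data, then a 4-byte big-endian Adler-32 of the uncompressed bytes. The Python sketch below (hypothetical helper names) strips that wrapper by hand, inflates the raw data with wbits=-15, and checks the trailer. It assumes no preset dictionary, which git objects don't use as far as I know:

```python
import struct
import zlib


def zlib_to_raw(stream: bytes) -> bytes:
    """Strip the RFC 1950 header and trailer, returning the raw DEFLATE data."""
    cmf, flg = stream[0], stream[1]
    if (cmf & 0x0F) != 8 or (cmf * 256 + flg) % 31:
        raise ValueError("not a zlib stream")
    if flg & 0x20:  # FDICT bit: a 4-byte dictionary id would follow the header
        raise ValueError("preset dictionary not supported in this sketch")
    return stream[2:-4]


def raw_inflate_checked(stream: bytes) -> bytes:
    """Inflate the raw payload, then verify the Adler-32 trailer by hand."""
    data = zlib.decompress(zlib_to_raw(stream), -15)  # -15: raw DEFLATE, no wrapper
    (expected,) = struct.unpack(">I", stream[-4:])
    if zlib.adler32(data) != expected:
        raise ValueError("Adler-32 mismatch")
    return data
```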
See http://en.wikipedia.org/wiki/DEFLATE#Encoder_implementations
It lists a number of software implementations, including gzip, so that should work. Did you try just running gzip on the file? Does it not recognize the format automatically?
How do you know it is compressed using DEFLATE? What tool was used to compress the file?
Why don't you just use git's tools to access the data? This should be able to read any git object:
This is how I do it with Powershell.
You can then create an alias like:
I found this question looking for a work-around for a bug with the -text utility in the new version of the hadoop dfs client I just installed. The -text utility works like cat, except that if the file being read is compressed, it transparently decompresses it and outputs the plain text (hence the name).
The answers already posted were definitely helpful, but some of them have one problem when dealing with Hadoop-sized amounts of data: they read everything into memory before decompressing.
So, here are my variations on the Perl and Python answers above that do not have that limitation:
Python:
Perl:
Note the use of the -cat sub-command instead of -text. This is so that my work-around does not break after they've fixed the bug. Apologies for the readability of the Python version.
To add to the collection, here are perl one-liners for deflate/inflate/raw deflate/raw inflate.
Deflate
Inflate
Raw deflate
Raw inflate
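For comparison, here are the same four operations with Python's zlib module. These are my own equivalents, not translations of the Perl code; raw (de)flate is selected with wbits=-15, and the function names are illustrative:

```python
import zlib


def deflate(data: bytes) -> bytes:
    """zlib-wrapped DEFLATE: the format of git loose objects."""
    return zlib.compress(data)


def inflate(data: bytes) -> bytes:
    """Undo deflate(); expects the zlib header and Adler-32 trailer."""
    return zlib.decompress(data)


def raw_deflate(data: bytes, level: int = -1) -> bytes:
    """Bare DEFLATE with no header or checksum (wbits=-15)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def raw_inflate(data: bytes) -> bytes:
    """Undo raw_deflate(); the input must have no zlib/gzip wrapper."""
    return zlib.decompress(data, -15)
```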