Find whether the body contains gzip-compressed data
I have a program that searches the reply to a curl request for specific strings. Sometimes I get gzipped data. Is there a way to tell whether the reply is text or gzip format?
The headers sometimes contain a gzip, deflate header, but not consistently. Is there a way to inspect the string and find out whether it's gzipped?
Comments (5)
You could try taking a look at the first two bytes of data. For gzipped data, they should be 0x1f, 0x8b.
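A minimal Python sketch of that check, assuming the raw response body has already been collected as `bytes` (for example, from curl's write callback) and has not been decoded as text. The name `looks_gzipped` is hypothetical.

```python
import gzip

# Every gzip member starts with ID1 = 0x1f, ID2 = 0x8b (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(body: bytes) -> bool:
    """Return True if the raw (undecoded) body starts with the gzip magic bytes."""
    return body[:2] == GZIP_MAGIC


if __name__ == "__main__":
    print(looks_gzipped(gzip.compress(b"hello, world")))  # True
    print(looks_gzipped(b"<html><body>hello</body></html>"))  # False
```

The check has to run on the bytes exactly as received: once the body has been decoded into a text string, the 0x8b byte may already have been mangled.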
You could look at the first bytes of the file. Perhaps they contain a magic number; many binary formats, gzip included, begin with one.
The gzip file format starts with some "magic bytes". You can check whether the body starts with them, and if it does, push the bytes back into the stream and start decompressing it.
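Here is a rough Python sketch of the "read the magic bytes, then push them back" idea. `PushbackReader` and `open_body` are hypothetical names, and `stream` is assumed to be a buffered binary file-like object (such as `io.BytesIO` or an `io.BufferedReader`), so `read(2)` returns two bytes unless the body is shorter.

```python
import gzip
import io
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"


class PushbackReader(io.RawIOBase):
    """Read-only stream that replays already-consumed `head` bytes, then reads `rest`."""

    def __init__(self, head: bytes, rest: BinaryIO) -> None:
        super().__init__()
        self._head = head
        self._rest = rest

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Serve the pushed-back bytes first...
        if self._head:
            n = min(len(buf), len(self._head))
            buf[:n] = self._head[:n]
            self._head = self._head[n:]
            return n
        # ...then fall through to the underlying stream.
        data = self._rest.read(len(buf))
        buf[: len(data)] = data
        return len(data)


def open_body(stream: BinaryIO) -> BinaryIO:
    """Return a stream over the body, gunzipping it if it starts with the magic bytes."""
    head = stream.read(2)  # assumes a buffered stream: 2 bytes unless the body is shorter
    replayed = io.BufferedReader(PushbackReader(head, stream))
    if head == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=replayed)
    return replayed
```

An explicit push-back wrapper is used here rather than `io.BufferedReader.peek()`, because `peek()` performs at most one raw read and may return fewer bytes than requested.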
You could pipe it through zcat, and if it fails, use the string as is. Sloppy, I know, but it ought to be reliable: plain text will never be valid gzip data, since a gzip stream has to start with the bytes 0x1f 0x8b.
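The same "try to decompress, fall back on failure" approach can be done in-process. This sketch swaps the external `zcat` for Python's `gzip.decompress`; `maybe_gunzip` is a hypothetical helper name.

```python
import gzip
import zlib


def maybe_gunzip(body: bytes) -> bytes:
    """Decompress `body` if it is valid gzip data, otherwise return it unchanged."""
    try:
        return gzip.decompress(body)
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (an OSError subclass): missing magic bytes or bad header.
        # EOFError: gzip data truncated before its end-of-stream marker.
        # zlib.error: corrupt compressed stream.
        return body
```

One caveat of this design: a genuinely gzipped body that arrives truncated or corrupted also falls back silently to the raw bytes, so a failed string search may be worth logging.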
Standards-compliant HTTP responses will contain a Content-Encoding: or Transfer-Encoding: header specifying "gzip" for compressed responses, which removes the need to guess from the magic number. Unfortunately, many sites get these headers wrong.
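One possible way to combine the header with the magic-byte check, sketched in Python under a few assumptions: `headers` is a dict with lower-cased header names (how you collect them from curl is up to your program), and `declared_encodings` and `decode_body` are hypothetical helpers. Because headers are unreliable, the gzip magic bytes decide for gzip; the header is only relied on for `deflate`, whose zlib header is far less distinctive than gzip's magic bytes.

```python
import gzip
import zlib
from typing import Mapping, Set

GZIP_MAGIC = b"\x1f\x8b"


def declared_encodings(headers: Mapping[str, str]) -> Set[str]:
    """Collect the codings named in Content-Encoding / Transfer-Encoding."""
    codings = set()
    for name in ("content-encoding", "transfer-encoding"):
        for token in headers.get(name, "").split(","):
            if token.strip():
                codings.add(token.strip().lower())
    return codings


def decode_body(headers: Mapping[str, str], body: bytes) -> bytes:
    """Return the decompressed body; raises zlib.error if 'deflate' data is invalid."""
    # gzip has magic bytes, so the data itself decides. This also covers servers
    # that send gzip without the header, or the header with an uncompressed body.
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    # For "deflate" the header is the main signal. HTTP "deflate" is meant to be
    # zlib-wrapped, but some servers send raw deflate, so try both.
    if "deflate" in declared_encodings(headers):
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, wbits=-zlib.MAX_WBITS)
    return body
```

For example, `decode_body({"content-encoding": "gzip"}, b"plain")` returns `b"plain"` unchanged, since the body lacks the magic bytes despite what the header claims.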