How does gzcompress work?

Posted on 2024-09-08 14:33:14

I'm wondering why I need to cut off the last 4 characters after using gzcompress().

Here is my code:

header("Content-Encoding: gzip");
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
$index = $smarty->fetch("design/templates/main.htm") ."\n<!-- Compressed by gzip -->";
$this->content_size = strlen($index);
$this->content_crc = crc32($index);
$index = gzcompress($index, 9);
$index = substr($index, 0, strlen($index) - 4); // Why cut off ??
echo $index;
echo pack('V', $this->content_crc) . pack('V', $this->content_size);

When I don't cut off the last 4 chars, the source ends like this:

[...]
<!-- Compressed by gzip -->N

When I cut them off it reads:

[...]
<!-- Compressed by gzip -->

I could see the additional N only in Chrome's code inspector (not in Firefox and not in IE's source). But there seem to be four additional characters at the end of the code.

Can anyone explain to me why I need to cut off 4 chars?

昇り龍 2024-09-15 14:33:14

gzcompress implements the ZLIB compressed data format (RFC 1950), which has the following structure:

     0   1
   +---+---+
   |CMF|FLG|   (more-->)
   +---+---+

(if FLG.FDICT set)

     0   1   2   3
   +---+---+---+---+
   |     DICTID    |   (more-->)
   +---+---+---+---+

   +=====================+---+---+---+---+
   |...compressed data...|    ADLER32    |
   +=====================+---+---+---+---+

Here you can see that the last four bytes are an Adler-32 checksum.
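To see this wrapper in practice, the sketch below takes apart a gzcompress() result: it checks the two header bytes against the RFC 1950 rule and compares the last four bytes with an Adler-32 computed separately via hash('adler32', ...). It assumes 64-bit PHP with the zlib extension enabled; the input string is an arbitrary example.

```php
<?php
// Sketch: look at the ZLIB (RFC 1950) wrapper that gzcompress() produces.
// Assumes 64-bit PHP with the zlib extension; $data is arbitrary example input.
$data = str_repeat("<p>Hello, gzip!</p>\n", 50);
$zlib = gzcompress($data, 9);

// First two bytes: CMF and FLG. gzcompress() has no dictionary option,
// so FDICT is not set and no DICTID follows.
$cmf = ord($zlib[0]);
$flg = ord($zlib[1]);

// RFC 1950 header check: CMF * 256 + FLG must be a multiple of 31.
$headerOk = (($cmf << 8) | $flg) % 31 === 0;

// Last four bytes: Adler-32 of the uncompressed data, stored big-endian ("N").
$adlerFromStream = unpack('N', substr($zlib, -4))[1];
$adlerComputed   = hexdec(hash('adler32', $data));

printf("CMF=0x%02x FLG=0x%02x header check: %s\n", $cmf, $flg, $headerOk ? 'ok' : 'failed');
printf("Adler-32 in stream: %08x, computed: %08x\n", $adlerFromStream, $adlerComputed);
```

Note that the checksum is over the original input, not over the compressed bytes, which is why it can only be written after all the data has been compressed.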

In contrast, the GZIP file format (RFC 1952) is a list of so-called members, each with the following structure:

   +---+---+---+---+---+---+---+---+---+---+
   |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
   +---+---+---+---+---+---+---+---+---+---+

(if FLG.FEXTRA set)

   +---+---+=================================+
   | XLEN  |...XLEN bytes of "extra field"...| (more-->)
   +---+---+=================================+

(if FLG.FNAME set)

   +=========================================+
   |...original file name, zero-terminated...| (more-->)
   +=========================================+

(if FLG.FCOMMENT set)

   +===================================+
   |...file comment, zero-terminated...| (more-->)
   +===================================+

(if FLG.FHCRC set)

   +---+---+
   | CRC16 |
   +---+---+

   +=======================+
   |...compressed blocks...| (more-->)
   +=======================+

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |     CRC32     |     ISIZE     |
   +---+---+---+---+---+---+---+---+

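As you can see, GZIP uses a CRC-32 checksum for the integrity check, followed by ISIZE, the size of the uncompressed input modulo 2^32; both are stored as little-endian 32-bit values.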
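The member layout can be checked against gzencode(), which writes a GZIP member. The sketch below unpacks the fixed 10-byte header and the 8-byte trailer and compares the trailer with crc32() and strlen(). It assumes 64-bit PHP (so crc32() and unpack('V') both yield non-negative integers) and the zlib extension; the input is an arbitrary example.

```php
<?php
// Sketch: read the fixed GZIP (RFC 1952) header and the trailer from gzencode().
// Assumes 64-bit PHP with the zlib extension; $data is arbitrary example input.
$data = str_repeat("<p>Hello, gzip!</p>\n", 50);
$gz   = gzencode($data, 9);

// Fixed 10-byte header: ID1 ID2 CM FLG MTIME(4, little-endian) XFL OS.
$h = unpack('C2id/Ccm/Cflg/Vmtime/Cxfl/Cos', substr($gz, 0, 10));

// Last 8 bytes: CRC-32, then ISIZE, both little-endian ("V").
$t = unpack('Vcrc/Visize', substr($gz, -8));

printf("ID1=%02x ID2=%02x CM=%d FLG=%d\n", $h['id1'], $h['id2'], $h['cm'], $h['flg']);
printf("CRC-32 in trailer: %08x, crc32(): %08x\n", $t['crc'], crc32($data));
printf("ISIZE: %d, strlen(): %d\n", $t['isize'], strlen($data));
```

These are exactly the two values the question's code appends with pack('V', ...).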

So to analyze your code:

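  • echo "\x1f\x8b\x08\x00\x00\x00\x00\x00"; – outputs the first 8 of the 10 fixed header bytes:
    • 0x1f 0x8b – ID1 and ID2, fixed values that identify the GZIP format
    • 0x08 – CM, the compression method; 8 denotes the DEFLATE data compression format (RFC 1951)
    • 0x00 – FLG, flags (none set, so no optional header fields follow)
    • 0x00000000 – MTIME, modification time (0 means no timestamp is available)
    • the remaining two fields, XFL (extra flags) and OS (operating system), are not written here; they are filled by the first two bytes of the gzcompress() output, i.e. the ZLIB CMF and FLG header bytes
  • echo $index; – outputs the ZLIB header (which lands in XFL and OS) followed by the DEFLATE-compressed data, with the 4-byte Adler-32 trailer already cut off
  • echo pack('V', $this->content_crc) . pack('V', $this->content_size); – outputs the CRC-32 checksum and the size of the uncompressed input data as two 32-bit little-endian integers, i.e. the GZIP trailer

That is why the last four bytes of the gzcompress() output have to go: they are the ZLIB Adler-32 checksum, which has no place in a GZIP member, and leaving them in puts four stray bytes between the DEFLATE data and the CRC-32/ISIZE trailer.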
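The byte layout described above can be reproduced without Smarty or a class. In this sketch, $html is a hypothetical stand-in for the output of $smarty->fetch(...); it rebuilds the question's stream, prints the two bytes that end up in XFL and OS, and checks that gzdecode() accepts the result. It assumes 64-bit PHP with the zlib extension.

```php
<?php
// Sketch of the question's byte layout. $html is a hypothetical stand-in
// for the template output; assumes 64-bit PHP with the zlib extension.
$html = str_repeat("<p>Hello, gzip!</p>\n", 50) . "\n<!-- Compressed by gzip -->";
$zlib = gzcompress($html, 9);

$out  = "\x1f\x8b\x08\x00\x00\x00\x00\x00";  // ID1 ID2 CM FLG MTIME: 8 of the 10 header bytes
$out .= substr($zlib, 0, -4);                  // ZLIB CMF/FLG (become XFL/OS) + DEFLATE data
$out .= pack('V', crc32($html)) . pack('V', strlen($html)); // GZIP trailer: CRC-32, ISIZE

// Header bytes 8 and 9 are simply the ZLIB header bytes.
printf("XFL=0x%02x OS=0x%02x\n", ord($out[8]), ord($out[9]));

// The result decodes as an ordinary GZIP member.
var_dump(gzdecode($out) === $html);
```

Decoders treat XFL and OS as informational, which is presumably why this trick works; it is still more fragile than producing the member with gzencode().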

江南月 2024-09-15 14:33:14

gzcompress produces output in the format described in RFC 1950; the last 4 bytes you're chopping off are the Adler-32 checksum. This is the "deflate" encoding, so you should just set "Content-Encoding: deflate" and not manipulate anything.
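One way to see that only the wrapper differs: gzcompress(), gzdeflate() and gzencode() should all carry the same DEFLATE data when called with the same level. The sketch strips the 2-byte header and 4-byte Adler-32 from the ZLIB output, and the 10-byte header and 8-byte trailer from the GZIP output, then compares both with the raw DEFLATE output. It assumes the zlib extension and that gzencode() writes no optional header fields; the input is an arbitrary example.

```php
<?php
// Sketch: the three zlib functions wrap the same DEFLATE data differently.
// Assumes the zlib extension and the same level (9) for all three calls.
$data = str_repeat("<p>Hello, gzip!</p>\n", 50);

$zlib = gzcompress($data, 9);  // RFC 1950: 2-byte header + DEFLATE + Adler-32
$raw  = gzdeflate($data, 9);   // RFC 1951: DEFLATE data only
$gzip = gzencode($data, 9);    // RFC 1952: 10-byte header + DEFLATE + CRC-32 + ISIZE

var_dump(substr($zlib, 2, -4) === $raw);   // strip the ZLIB wrapper
var_dump(substr($gzip, 10, -8) === $raw);  // strip the GZIP wrapper
```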

If you want to use gzip, use gzencode(), which produces the gzip format.
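Following this answer, neither format needs hand-built headers. The sketch below uses a hypothetical helper, encodeBody(), that picks gzencode() for "gzip" and unmodified gzcompress() output for "deflate" based on an Accept-Encoding value. The preference for gzip when both are offered is an arbitrary choice, and the token matching is deliberately naive (it ignores q-values).

```php
<?php
// Sketch: choose an HTTP content-coding without touching the compressed bytes.
// encodeBody() is a hypothetical helper, not part of PHP.
function encodeBody(string $body, string $acceptEncoding): array
{
    // Naive parsing: keep the coding name of each entry, ignore q-values.
    $tokens = array_map(
        fn($t) => trim(explode(';', $t)[0]),
        explode(',', strtolower($acceptEncoding))
    );
    if (in_array('gzip', $tokens, true)) {
        return ['gzip', gzencode($body, 9)];      // RFC 1952 member
    }
    if (in_array('deflate', $tokens, true)) {
        return ['deflate', gzcompress($body, 9)]; // RFC 1950 stream, untouched
    }
    return [null, $body];                          // identity
}

// Usage inside a web request:
// [$coding, $payload] = encodeBody($html, $_SERVER['HTTP_ACCEPT_ENCODING'] ?? '');
// if ($coding !== null) { header("Content-Encoding: $coding"); }
// echo $payload;
```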