与膨胀算法作斗争

发布于 2024-12-04 13:45:21 字数 456 浏览 1 评论 0原文

即使在阅读了 RFC 并审查了 c 和 javascript 实现之后，我仍然很难理解 inflate 算法的工作原理。我用文本“TestingTesting”压缩了一个文件，并得到了以下十六进制结果： 0B 49 2D 2E C9 CC 4B 0F 81 50 00

我尝试在 16 位和 32 位字节序交换后读取数据，但在读取前 3 位后我无法继续下去，因为接下来的 5 位没有意义。我做错了什么以及如何解析？

我用过的参考资料： RFC 1951 Javascript C

需要登录才能够评论，你可以免费注册一个本站的账号。

压缩器的输出是字节流。为什么要进行字节序交换？

查看二进制形式的前几个字节：

0B  = 00001011
49  = 01001001
2D  = 00101101  
2E  = 00101110
...

来自 RFC 中的第 3.1.1 节：

位是从右向左读取的，因此标头的第一位 BFINAL 是 1：
<前><代码> 00001011
^
数字被压缩为 LSB首先，我们从右到左读取，因此 BTYPE 是 01：
<前><代码> 00001011
^^
这是固定的 Huffman代码块类型，因此我们接下来期望的是霍夫曼代码。霍夫曼代码首先打包 MSB，因此第一个代码是 10000100（我们在这里继续处理下一个字节）：
<前><代码> 00001011
^^^^^
01001001
^^^
查看第 3.2.6 节中的表格，00110000 code> 到 10111111 表示文字字节 0 - 143，因此 10000100 (= 0x84) 是文字值 0x54，它是“T”的 ASCII 码。
继续，下一个代码是10010101 (= 0x95)，它是文字值0x65，即“e”。

...等等。

The output from the compressor is a stream of bytes. Why are you doing an endian swap?

Looking at the first few bytes, as binary:

0B  = 00001011
49  = 01001001
2D  = 00101101  
2E  = 00101110
...

From section 3.1.1 in the RFC:

bits are read from right to left, so the first bit of the header, BFINAL, is 1:
```
 00001011
        ^
```
numbers are packed LSB first, and we read from right-to-left, so BTYPE is 01:
```
 00001011
      ^^
```
That's the fixed Huffman code block type, so we expect a Huffman code next. Huffman codes are packed MSB first, so the first code is 10000100 (we continue on to the next byte here):
```
 00001011
 ^^^^^
 01001001
      ^^^
```
Looking at the table in section 3.2.6, 00110000 to 10111111 represent literal bytes 0 - 143, so 10000100 (= 0x84) is the literal value 0x54, which is the ASCII code for "T".
Continuing, the next code is 10010101 (= 0x95) which is literal value 0x65 which is "e".