DEFLATE decoding
I am currently reading about the DEFLATE method for encoding/decoding data. I understand that the process is composed of two parts:
i. Replace duplicate information (within a specified window) with a reference back to the previous identical piece.
ii. Use Huffman coding to reduce the size of the most commonly occurring symbols.
I have a question regarding (i). DEFLATE uses LZ77, which searches a sliding window of limited size over the data already seen and, if it finds duplicate information, replaces it with a "pointer". That makes perfect sense.
However, when decoding using LZ77 how does DEFLATE recognize a pointer? (Pointers are length-distance pairs; how can you discern if it's a pointer or just a number that was present in the initial data?)
Reference: http://en.wikipedia.org/wiki/DEFLATE#Duplicate_string_elimination
Comments (1)
I'd recommend reading the Deflate specification, RFC 1951, which is much more precise and answers questions like this one.
In section 3.2.5, "Compressed blocks (length and distance codes)", you'll see:
"the literal and length alphabets are merged into a single alphabet"
which means that, simply by decoding the next symbol, you immediately know whether it is a literal (0..255), a match length (257..285), or the end of the block (256). A byte that happens to hold a small number in the original data is always emitted as a literal symbol, so it can never be mistaken for a pointer. In the case of a match length, a reference (the offset, which the RFC calls the distance) must be decoded too. Offsets are encoded using a separate Huffman tree.
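To make the symbol ranges concrete, here is a minimal Python sketch of the decode loop, not a full inflater. It assumes the Huffman step has already been done: the input format (a flat list of ints where each length symbol is followed by its extra-bits value and then the already-resolved distance) and the name `inflate_symbols` are my own simplifications for illustration, not anything defined by the RFC. Only the base-length table is taken from RFC 1951, section 3.2.5.

```python
# Base match length for each length symbol 257..285 (RFC 1951, section 3.2.5).
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]


def inflate_symbols(stream):
    """Expand already-Huffman-decoded symbols into bytes.

    `stream` is a hypothetical, simplified input: an iterable of ints in which
    every length symbol (257..285) is followed by its extra-bits value and then
    by the resolved distance. A real decoder reads these from the bit stream.
    """
    it = iter(stream)
    out = bytearray()
    for sym in it:
        if sym < 256:
            # Literal: the symbol is the byte itself.
            out.append(sym)
        elif sym == 256:
            # End-of-block marker.
            break
        elif sym <= 285:
            # Match: the symbol itself says "a pointer follows".
            length = LENGTH_BASE[sym - 257] + next(it)
            distance = next(it)
            if not 1 <= distance <= len(out):
                raise ValueError("distance points before start of output")
            start = len(out) - distance
            # Copy byte by byte so overlapping matches (distance < length) work.
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError(f"invalid literal/length symbol {sym}")
    return bytes(out)


# "abc", then a match of length 3 at distance 3, then end of block.
print(inflate_symbols([97, 98, 99, 257, 0, 3, 256]))
```

The literal byte 3 and the distance 3 never collide here: a 3 is only read as a distance because a symbol in the range 257..285 came just before it.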