Writing a Huffman tree to a file after compression
I'm trying to write a Huffman tree to the compressed file after all the actual compressed data has been written. But I just realized there's a problem. Suppose I decide that once all my actual data has been written to the file, I will put in 2 linefeed characters and then write the tree.
That means that when I read things back, those two linefeeds (or any characters, really) are my delimiter. The problem is that it's entirely possible for the actual data to also contain 2 linefeeds one after the other. In that scenario, my delimiter check would fail, because I would split the file in the wrong place.
I've used two linefeeds as the example here, but the same is true for any character string. I could work around the problem by using a longer string as the delimiter, but that would have two undesirable effects:
1. There is still a remote chance that the long string is, by coincidence, present in the compressed data.
2. It unnecessarily bloats a file that is supposed to be compressed.
Does anyone have any suggestions on how to separate the compressed data from the tree data?
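To make the failure concrete, here is a minimal Python sketch of the delimiter approach described above. The names `naive_join` and `naive_split` and the sample bytes are made up for illustration; the only assumption is that the reader splits at the first occurrence of the delimiter.

```python
DELIMITER = b"\n\n"  # the two-linefeed separator from the question


def naive_join(payload, tree):
    """Write compressed payload, then the delimiter, then the tree."""
    return payload + DELIMITER + tree


def naive_split(blob):
    """Split at the FIRST delimiter found, as a simple reader would."""
    payload, _, tree = blob.partition(DELIMITER)
    return payload, tree


# Hypothetical compressed bytes that happen to contain two linefeeds in a row.
payload = b"\x8a\n\n\x01"
tree = b"TREE"

recovered_payload, recovered_tree = naive_split(naive_join(payload, tree))
print(recovered_payload == payload)  # False: the split happened inside the payload
```

Searching for the last occurrence instead does not help in general, since the serialized tree may contain the delimiter bytes too.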
Comments (2)
First, write the size of the tree in bytes. Then write the tree itself, and then the contents.
When reading, first read the size, then the tree (now you know how many bytes to read), and then the contents.
The size can be written as a string of digits ending with a linefeed. This way, you know that everything up to and including the first linefeed belongs to the size of the tree.
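The layout above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the tree has already been serialized to `bytes` by some other code, the file is opened in binary mode, and the function names `write_archive` and `read_archive` are invented for this example.

```python
import io


def write_archive(out, tree, content):
    """Write: decimal tree size + linefeed, then the tree, then the contents."""
    out.write(str(len(tree)).encode("ascii") + b"\n")
    out.write(tree)
    out.write(content)


def read_archive(inp):
    """Read the size line, then exactly that many tree bytes, then the rest."""
    size_line = inp.readline()  # digits up to and including the first linefeed
    tree_size = int(size_line.decode("ascii").strip())
    tree = inp.read(tree_size)
    if len(tree) != tree_size:
        raise ValueError("file ended before the whole tree was read")
    content = inp.read()  # everything left is compressed data
    return tree, content


# Both parts may contain linefeeds; the reader never searches for a delimiter.
buf = io.BytesIO()
write_archive(buf, b"\n\ntree\n\n", b"\x00\n\n\xff")
buf.seek(0)
tree, content = read_archive(buf)
print(tree == b"\n\ntree\n\n", content == b"\x00\n\n\xff")
```

Because the size line consists only of ASCII digits, its terminating linefeed cannot be confused with any byte of the tree or the contents that follow.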
Why not write the size and the length in the first 8 bytes (4 bytes each), and then the data?
Then something like:
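A minimal Python sketch of one way to read this suggestion. It assumes that "size" means the byte count of the serialized tree and "len" the byte count of the compressed data, that both are stored as unsigned 32-bit little-endian integers, and that the tree is stored before the data. The helper names `pack_archive` and `unpack_archive` are hypothetical.

```python
import struct

# Assumed header: two unsigned 32-bit little-endian integers (8 bytes total).
HEADER = struct.Struct("<II")


def pack_archive(tree, data):
    """Prefix the tree and the compressed data with their byte counts."""
    return HEADER.pack(len(tree), len(data)) + tree + data


def unpack_archive(blob):
    """Read both counts from the first 8 bytes, then slice the two parts."""
    tree_size, data_len = HEADER.unpack_from(blob, 0)
    start = HEADER.size
    tree = blob[start:start + tree_size]
    data = blob[start + tree_size:start + tree_size + data_len]
    if len(tree) != tree_size or len(data) != data_len:
        raise ValueError("archive is truncated")
    return tree, data


blob = pack_archive(b"TREE\n\n", b"\x8a\n\n\x01")
print(unpack_archive(blob) == (b"TREE\n\n", b"\x8a\n\n\x01"))
```

A fixed 4-byte field limits each part to just under 4 GiB, and the byte order has to match between the writer and the reader.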
Should work.
You could deflate the data for better compression.