Reading and Huffman-compressing 4-byte binary strings (standard C++, Linux)

Posted on 2024-11-10 07:32:25

I am working on some homework on Huffman coding. I have already completed the Huffman algorithm, but I need to alter it slightly to work with binary files. I have spent some time reading related questions, and perhaps because I don't understand data types and binary files well, I am still struggling a bit, so hopefully I am not repeating a prior question (I won't be posting the code for the Huffman part of the program).

Here is the key phrase: "You can assume that each symbol, which will be mapped to a codeword, is a 4-byte binary string." What I think I know is that a char is one byte and an unsigned int is four bytes, so I am guessing I should read the input four bytes at a time into an unsigned int buffer and then collect my data for the Huffman part of the program.

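#include <cstdint>
#include <fstream>

int main() {
    std::uint32_t buffer;  // one 4-byte symbol
    std::ifstream input("test.txt", std::ios::binary);

    // Test the read itself, so the body only runs after a full 4-byte read;
    // "while (input)" would process the last, failed read one extra time.
    while (input.read(reinterpret_cast<char *>(&buffer), sizeof buffer)) {
        //if buffer does not exist as unique symbol in collection of data add it
        //if buffer exists update statistics of symbol
    }
    // input.gcount() now holds the number of trailing bytes (0-3) left over
    input.close();
}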
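
The two comments in the loop amount to a frequency count. One way to keep it, assumed here rather than taken from the assignment, is a `std::unordered_map` keyed by the 4-byte symbol: `operator[]` value-initializes a missing entry to zero, so "add it" and "update its statistics" collapse into a single increment. `count_frequencies` and `FrequencyTable` are hypothetical names.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Maps each distinct 4-byte symbol to the number of times it occurs.
using FrequencyTable = std::unordered_map<std::uint32_t, std::uint64_t>;

FrequencyTable count_frequencies(const std::vector<std::uint32_t>& symbols) {
    FrequencyTable freq;
    for (std::uint32_t s : symbols) {
        // A new symbol is inserted with count 0 and incremented to 1;
        // an existing symbol just has its count incremented.
        ++freq[s];
    }
    return freq;
}
```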

Does this look like a good way to handle the data? How should I handle the very end of the file if there are only 1, 2, or 3 bytes left? So then I am just storing buffer as an unsigned int in a struct. Just out of curiosity, how would I recast buffer to a string of characters?
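
If a file could end with a partial symbol, one common workaround (an assumption here, not something the assignment states) is to zero-pad the last 1 to 3 bytes into a full symbol and record the original byte count, for example in the header, so the decoder can cut its output back to size. `pad_tail` and `truncate_to` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Builds one symbol from the 1-3 leftover bytes, filling the rest with zeros.
// Uses native byte order, matching how full symbols are read from the file.
std::uint32_t pad_tail(const char* tail, std::size_t count) {
    assert(count < 4);
    unsigned char bytes[4] = {0, 0, 0, 0};
    std::memcpy(bytes, tail, count);
    std::uint32_t symbol;
    std::memcpy(&symbol, bytes, sizeof symbol);
    return symbol;
}

// After decoding, drops the padding using the original length from the header.
void truncate_to(std::string& decoded, std::uint64_t original_length) {
    if (decoded.size() > original_length) {
        decoded.resize(static_cast<std::size_t>(original_length));
    }
}
```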
Edit: What's the best way to store the header of a Huffman-compressed file?

天邊彩虹 2024-11-17 07:32:25

Does this look like a good way to handle the data?

Instead of casting a pointer, I would suggest using a union of an int and a char[4] and passing a pointer to the char array, which is what read() expects. I don't know what the rest of the logic is, so I can't say whether the actual handling (which is not in the code you posted) is done well, but it seems rather trivial to me.
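
One caveat on the union suggestion: in standard C++, reading a union member other than the one most recently written is undefined behavior (C allows it, and compilers such as GCC tolerate it), whereas the original `reinterpret_cast<char *>` is allowed, because the aliasing rules let any object be accessed through a `char` pointer. To convert between a `char[4]` and an integer without either, `std::memcpy` is the portable idiom; `bytes_to_symbol` and `symbol_to_bytes` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::uint32_t) == 4, "a symbol must be exactly 4 bytes");

// Copies 4 raw bytes into a 32-bit symbol (native byte order, no union needed).
std::uint32_t bytes_to_symbol(const std::array<char, 4>& bytes) {
    std::uint32_t symbol;
    std::memcpy(&symbol, bytes.data(), sizeof symbol);
    return symbol;
}

// The reverse: copies the symbol's 4 bytes back out, in the same native order.
std::array<char, 4> symbol_to_bytes(std::uint32_t symbol) {
    std::array<char, 4> bytes;
    std::memcpy(bytes.data(), &symbol, sizeof symbol);
    return bytes;
}
```

Compilers typically turn these fixed-size `memcpy` calls into a single load or store, so the portable version costs nothing at run time.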

How should I handle the very end of the file if there are only 1, 2, or 3 bytes left?

Assuming each symbol is 4 bytes long, I would expect that not to be valid input.

So then I am just storing buffer as an unsigned int in a struct. Just out of curiosity, how would I recast buffer to a string of characters?

Why would you do that? In your data, a "character" is 4 bytes. But if you want, you can cast to an array of bytes (or, better, use bitwise operations to extract the actual bytes if the order matters).
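
The bitwise approach makes the byte order explicit instead of inheriting the machine's. The sketch below (hypothetical function name) writes the most significant byte first, that is, big-endian, into a 4-character `std::string`.

```cpp
#include <cstdint>
#include <string>

// Splits a 32-bit symbol into 4 characters, most significant byte first.
std::string symbol_to_string(std::uint32_t symbol) {
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) {
        int shift = 8 * (3 - i);
        out[i] = static_cast<char>((symbol >> shift) & 0xFF);
    }
    return out;
}
```

If the value was filled by reading the file's bytes straight into it, this reproduces those bytes only on a big-endian machine; on little-endian x86 Linux, a symbol read from "ABCD" comes back as "DCBA", which is exactly the case where the order matters.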
