Reading and Huffman-compressing 4-byte binary strings (standard C++, Linux)

Posted 2024-11-10 07:32:25

I am working on some homework on Huffman coding. I have already finished the Huffman algorithm, but I need to alter it slightly so it works with binary files. I have spent some time reading related questions, and perhaps because of my limited understanding of data types and binary files I am still struggling a bit, so hopefully I am not repeating an earlier question (I won't be posting the code for the Huffman part of the program).

Here is the key phrase: "You can assume that each symbol, which will be mapped to a codeword, is a 4-byte binary string." What I think I know is that a char is one byte and an unsigned int is four bytes (true on typical Linux systems, although the standard only guarantees that unsigned int has at least 16 bits; std::uint32_t from <cstdint> is exactly 32 bits). So my guess is that I should read the input four bytes at a time into an unsigned int buffer and then collect my data for the Huffman part of the program.

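#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    std::uint32_t buffer;  // exactly 4 bytes; unsigned int is only usually 4
    std::ifstream input("test.txt", std::ios::in | std::ios::binary);
    if (!input) {
        std::cerr << "could not open test.txt\n";
        return 1;
    }

    // Test the read itself: it fails when fewer than 4 bytes remain, so the
    // body only ever sees complete symbols. (while(input) would run one extra
    // time on a stale or partially overwritten buffer.)
    while (input.read(reinterpret_cast<char *>(&buffer), sizeof buffer)) {
        //if buffer does not exist as unique symbol in collection of data add it
        //if buffer exists update statistics of symbol
    }
    // After the loop, input.gcount() is the number of trailing bytes (0 to 3)
    // that the failed final read did extract; those still need handling.
}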
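A minimal sketch of the counting step described in the loop comments, assuming symbols are consecutive, non-overlapping 4-byte chunks and that any 1 to 3 trailing bytes are set aside rather than counted. `SymbolStats`, `countSymbols` and `countSymbolsInFile` are hypothetical names. `std::unordered_map::operator[]` value-initializes a missing count to zero, so one line covers both "add it" and "update its statistics".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <istream>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical result type: frequencies of complete 4-byte symbols plus
// the 0-3 bytes left over at the end of the input.
struct SymbolStats {
    std::unordered_map<std::uint32_t, std::uint64_t> freq;
    std::string tail;  // leftover bytes that did not fill a whole symbol
};

SymbolStats countSymbols(std::istream& in) {
    SymbolStats stats;
    char bytes[4];
    // read() fails when fewer than 4 bytes remain, which ends the loop
    while (in.read(bytes, sizeof bytes)) {
        std::uint32_t symbol;
        std::memcpy(&symbol, bytes, sizeof symbol);  // raw bytes, host byte order
        ++stats.freq[symbol];  // an unseen symbol starts at count 0
    }
    // gcount() reports how many bytes the failed final read still extracted
    stats.tail.assign(bytes, static_cast<std::size_t>(in.gcount()));
    assert(stats.tail.size() < 4);
    return stats;
}

// Convenience wrapper matching the question's setup: open in binary mode.
SymbolStats countSymbolsInFile(const std::string& path) {
    std::ifstream input(path, std::ios::binary);
    if (!input)
        throw std::runtime_error("cannot open " + path);
    return countSymbols(input);
}
```

The resulting `freq` map is what the Huffman tree construction consumes; the `tail` bytes still have to be stored in the compressed file in some form, for example raw after the header.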

Does this look like a good way to handle the data? How should I handle the very end of the file if only 1, 2, or 3 bytes are left? After that, I would just store buffer as an unsigned int in a struct. Also, just out of curiosity, how would I convert buffer back to a string of characters?
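For the end-of-file and string-conversion questions, here is one possible sketch (not the only policy). It assumes the symbol was filled by copying raw file bytes into a `std::uint32_t`, as the read loop does; copying those 4 bytes back out then gives the original bytes in file order on any host. `std::to_string(buffer)` would instead produce the decimal text of the number, which is usually not what a Huffman decoder wants. Zero-padding the tail is an assumed policy: since a padded symbol is indistinguishable from real zero bytes, the original file length must also be recorded. `symbolToBytes`, `bytesToSymbol` and `padTail` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Copy the 4 raw bytes of a symbol into a std::string (not a decimal
// rendering); they come out in the same order they were read from the file.
std::string symbolToBytes(std::uint32_t symbol) {
    return std::string(reinterpret_cast<const char*>(&symbol), sizeof symbol);
}

// Inverse of symbolToBytes: rebuild the symbol from exactly 4 bytes.
std::uint32_t bytesToSymbol(const std::string& bytes) {
    assert(bytes.size() == sizeof(std::uint32_t));
    std::uint32_t symbol;
    std::memcpy(&symbol, bytes.data(), sizeof symbol);
    return symbol;
}

// Assumed end-of-file policy: zero-pad a 1-3 byte tail to a full symbol.
// The original file length must then be stored (e.g. in the header) so the
// decoder can drop the padding bytes again.
std::uint32_t padTail(const char* tail, std::size_t n) {
    assert(n > 0 && n < sizeof(std::uint32_t));
    char bytes[sizeof(std::uint32_t)] = {0, 0, 0, 0};
    std::memcpy(bytes, tail, n);
    std::uint32_t symbol;
    std::memcpy(&symbol, bytes, sizeof symbol);
    return symbol;
}
```

Storing the symbol as a `std::uint32_t` in a struct is fine with this approach; conversion to bytes is only needed when writing output.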
Edit: What's the best way to store the header of a Huffman-compressed file?
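There is no single best header format. One common choice is to store the original file length plus the symbol frequency table, so the decoder can rebuild exactly the same tree (this only works if tree construction breaks ties the same way on both sides). The sketch below assumes that layout, with lengths and counts written little-endian so the header does not depend on the host's byte order; `FreqTable`, `encodeHeader`, `decodeHeader`, `putLE` and `getLE` are hypothetical names, and the field widths are arbitrary choices. A more compact alternative is to store only code lengths and use canonical Huffman codes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical layout: 8-byte original length, 4-byte symbol count N,
// then N entries of (4 raw symbol bytes, 8-byte count). Integers are LE.
using FreqTable = std::map<std::uint32_t, std::uint64_t>;

// Append `value` as `nbytes` little-endian bytes.
void putLE(std::string& out, std::uint64_t value, int nbytes) {
    for (int i = 0; i < nbytes; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Read `nbytes` little-endian bytes starting at `pos`, advancing `pos`.
std::uint64_t getLE(const std::string& in, std::size_t& pos, int nbytes) {
    if (in.size() - pos < static_cast<std::size_t>(nbytes))
        throw std::runtime_error("truncated header");
    std::uint64_t value = 0;
    for (int i = 0; i < nbytes; ++i)
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(in[pos++])) << (8 * i);
    return value;
}

std::string encodeHeader(std::uint64_t originalLength, const FreqTable& freq) {
    assert(freq.size() <= 0xFFFFFFFFu);  // the count field is only 4 bytes
    std::string h;
    putLE(h, originalLength, 8);
    putLE(h, freq.size(), 4);
    for (const auto& entry : freq) {
        // symbol bytes are copied raw, i.e. in the order they appeared in the input
        h.append(reinterpret_cast<const char*>(&entry.first), sizeof entry.first);
        putLE(h, entry.second, 8);
    }
    return h;
}

// Parse a header produced by encodeHeader; `pos` ends just past the header,
// where the Huffman-coded bit stream would begin.
FreqTable decodeHeader(const std::string& in, std::size_t& pos, std::uint64_t& originalLength) {
    originalLength = getLE(in, pos, 8);
    std::uint64_t n = getLE(in, pos, 4);
    FreqTable freq;
    for (std::uint64_t i = 0; i < n; ++i) {
        if (in.size() - pos < sizeof(std::uint32_t))
            throw std::runtime_error("truncated header");
        std::uint32_t symbol;
        std::memcpy(&symbol, in.data() + pos, sizeof symbol);
        pos += sizeof symbol;
        freq[symbol] = getLE(in, pos, 8);
    }
    return freq;
}
```

The header string can be written with `out.write(h.data(), h.size())` before the coded bits, and a `std::unordered_map` of counts converts with `FreqTable(m.begin(), m.end())`. Using `std::map` keeps the entries in a fixed order, so the same input always yields the same header.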