Reading a binary istream byte by byte

Posted on 2024-10-29 13:49:05

I was attempting to read a binary file byte by byte using an ifstream. I've used istream methods like get() before to read entire chunks of a binary file at once without a problem. But my current task lends itself to going byte by byte and relying on the buffering in the io-system to make it efficient. The problem is that I seemed to reach the end of the file several bytes sooner than I should. So I wrote the following test program:

#include <iostream>
#include <fstream>

int main() {
    typedef unsigned char uint8;
    std::ifstream source("test.dat", std::ios_base::binary);
    while (source) {
        std::ios::pos_type before = source.tellg();
        uint8 x;
        source >> x;
        std::ios::pos_type after = source.tellg();
        std::cout << before << ' ' << static_cast<int>(x) << ' '
                  << after << std::endl;
    }
    return 0;
}

This dumps the contents of test.dat, one byte per line, showing the file position before and after.

Sure enough, if my file happens to have the two-byte sequence 0x0D-0x0A (which corresponds to carriage return and line feed), those bytes are skipped.

  • I've opened the stream in binary mode. Shouldn't that prevent it from interpreting line separators?
  • Do extraction operators always use text mode?
  • What's the right way to read byte by byte from a binary istream?

MSVC++ 2008 on Windows.

Comments (5)

孤千羽 2024-11-05 13:49:06

The >> extractors are for formatted input; they skip white space (by
default). For single-character unformatted input, you can use
istream::get() (returns an int, either EOF if the read fails, or
a value in the range [0,UCHAR_MAX]) or istream::get(char&) (puts the
character read in the argument, returns something which converts to
bool, true if the read succeeds, and false if it fails).

原谅过去的我 2024-11-05 13:49:06

There is a read() member function that lets you specify the number of bytes to read.

可爱咩 2024-11-05 13:49:06

Why are you using formatted extraction, rather than .read()?

江城子 2024-11-05 13:49:06

source.get()

will give you a single byte. It is an unformatted input function.
operator>> is a formatted input function, which by default skips whitespace characters.

红颜悴 2024-11-05 13:49:06

As others mentioned, you should use istream::read(). But, if you must use formatted extraction, consider std::noskipws.
