Why use unsigned chars for writing to binary files? And why shouldn't stream operators be used to write to binary files?
My first question is: why is it customary to use unsigned chars for writing to files in binary mode? In all of the examples I have seen, any other numerical value is cast to unsigned char before being written to the binary file.
My second question is: what's so bad about using stream operators to write to binary files? I've heard that read() and write() are best suited to binary files, but I don't really understand why that's the case. Using stream operators to write to binary files works fine for me IF I first cast the value to unsigned char.
float num = 500.5;
ofstream file("file.txt", ios::binary);
file << num; // results in gibberish when I try to read the file later
file << (unsigned char)num; // no problems reading the file with stream operators
Thanks in advance.
Comments (3)
chars are the smallest type in C/C++ (by definition, sizeof(char) == 1). Viewing an object as an array of char is the usual way to treat it as a sequence of bytes. unsigned is used to keep signed arithmetic from getting in the way, and because it best represents binary contents (a value between 0 and 255).
To operate on binary files, streams provide the read and write functions. The insertion and extraction operators are formatted. It's working for you just by chance: for instance, if you output an integer with <<, it will actually output the textual representation of the integer value and not its binary representation. In your example, you cast a float to an unsigned char before outputting it, which converts the real value to a small integer. What do you get when you try to read the float back from the file?

Because all the overloads of operator<< are formatted output functions: they format the data before writing it to the output. In other words, they cannot be used if you want to write binary data to a file. Binary data can be written with unformatted functions, which write the data without formatting it. std::ostream provides an unformatted output function called write(), with the following signature:
ostream& write(const char* s, streamsize n);
This also answers the other question, about unsigned char being customary: no, that is wrong. The function write() accepts const char*, not const unsigned char*.
--
The online documentation makes the same distinction: it describes operator<< as formatted output and write() as unformatted output.

The reason to use unsigned char is that it is guaranteed to be unsigned, which is very desirable for bitwise operations, and those can come in handy when manipulating binary data. Keep in mind that char (also known as plain char) is a separate type from unsigned char, and it is not specified whether it is signed or unsigned.
Finally, the formatted functions of streams are designed to output and parse a textual, human-readable representation of data. For instance, 123456789 could [1] be represented as the nine characters "123456789", which fit in nine bytes. For comparison, a possible binary representation such as 0x75BCD15 fits in four bytes, which is more than twice as compact.
It is not entirely unexpected that what you're doing succeeds, since whether something is a binary file or not is determined simply by what you do with it. If you write text to the file, it is normal to be able to retrieve that text later on.
[1] Depending on e.g. locales, which is another feature specific to formatted functions.