Can the C++ native type char hold the end-of-file character?
The title is pretty self-explanatory.
char c = std::cin.peek(); // sets c equal to character in stream
I just realized that perhaps native type char can't hold the EOF.
thanks,
nmr
Answers (2)
Short answer: No. Use int instead of char.
Slightly longer answer: No. If you can get either a character or the value EOF from a function, such as C's getchar and C++'s peek, clearly a normal char variable won't be enough to hold both all valid characters and the value EOF.
Even longer answer: It depends, but it will never work as you might hope.
C and C++ have three character types (apart from the "wide" types): char, signed char and unsigned char. Plain char can be signed or unsigned, and this varies between compilers.
The value EOF is a negative integer, usually -1, so clearly you can't store it in an unsigned char or in a plain char that is unsigned. Assuming that your system uses 8-bit characters (which nearly all do), EOF will be converted to (decimal) 255. That value no longer compares equal to EOF, so your program will never detect the end of the file.
But if your char type is signed, or if you use the signed char type, then yes, you can store -1 in it, so yes, it can hold EOF. But what happens then when you read a character with code 255 from the file? It will be interpreted as -1, that is, EOF (assuming that your implementation uses -1). So your code will stop reading not just at the end of the file, but also as soon as it finds a character with code 255.
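Both failure modes can be sketched in a few lines. This is only an illustration: the function names are made up for the example, and it assumes 8-bit chars, EOF equal to -1, and the usual wrap-around when 255 is converted to signed char.

```cpp
#include <cstdio>   // EOF

// Case 1: EOF squeezed into an unsigned char.
// The stored value is non-negative, so it can never compare equal to EOF again.
bool eof_survives_unsigned_char() {
    unsigned char c = static_cast<unsigned char>(EOF);
    return c == EOF;   // c is promoted to a non-negative int, EOF is negative
}

// Case 2: the byte 0xFF squeezed into a signed char.
// Assumption: 8-bit char, EOF == -1, conversion wraps 255 to -1.
bool byte_ff_looks_like_eof_in_signed_char() {
    int from_stream = 0xFF;   // what getchar() or peek() report for the byte 0xFF
    signed char c = static_cast<signed char>(from_stream);
    return c == EOF;
}

// Keeping the value in an int preserves the distinction.
bool byte_ff_looks_like_eof_in_int() {
    int c = 0xFF;
    return c == EOF;
}
```

Under those assumptions the first and third functions return false and the second returns true, which is exactly why the result has to stay in an int until it has been compared with EOF.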
Note that the return value of
std::cin.peek() is actually of type std::basic_ios<char>::int_type, which is the same as std::char_traits<char>::int_type, which is an int and not a char.
More important than that, the value returned in that int is not necessarily a simple cast from char to int but is the result of calling std::char_traits<char>::to_int_type on the next character in the stream, or std::char_traits<char>::eof() (which is defined to be EOF) if there is no character.
Typically, this is all implemented in exactly the same way as fgetc casts the character to an unsigned char and then to an int for its return value, so that you can distinguish all valid character values from EOF.
If you store the return value of std::cin.peek() in a char then there is the possibility that reading a character with a positive value (say ÿ in an ISO-8859-1 encoded file) will compare equal to EOF.
The pedantic thing to do would be:
This would probably be more usual:
Note that it is important to convert the character value correctly. If you perform a comparison like this:
if (ch == '\xff') ...
where ch is an int as above, you may not get the correct results. You need to use std::char_traits<char>::to_char_type on ch or std::char_traits<char>::to_int_type on the character constant to get a consistent result. (You are usually safe with members of the basic character set, though.)