Can the C++ native type char hold the end-of-file character?

Posted 2024-08-14 10:50:30

The title is pretty self-explanatory.

char c = std::cin.peek(); // sets c equal to character in stream

I just realized that perhaps native type char can't hold the EOF.

thanks,
nmr

Comments (2)

前事休说 2024-08-21 10:50:31

Short answer: No. Use int instead of char.

Slightly longer answer: No. If you can get either a character or the value EOF from a function, such as C's getchar and C++'s peek, clearly a normal char variable won't be enough to hold both all valid characters and the value EOF.

Even longer answer: It depends, but it will never work as you might hope.

C and C++ have three character types (not counting the "wide" types): char, signed char and unsigned char. Plain char can be signed or unsigned, and this varies between compilers.

The value EOF is a negative integer, usually -1, so clearly you can't store it in an unsigned char or in a plain char that is unsigned. Assuming that your system uses 8-bit characters (which nearly all do), EOF will be converted to (decimal) 255. When that char is later compared with EOF it is promoted back to the int 255, which never equals -1, so your program will never detect the end of the file.

But if your char type is signed, or if you use the signed char type, then yes, you can store -1 in it, so yes, it can hold EOF. But what happens then when you read a character with code 255 from the file? It will be interpreted as -1, that is, EOF (assuming that your implementation uses -1). So your code will stop reading not just at the end of the file, but also as soon as it finds a 255 character.
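Both failure modes described above can be reproduced without any file I/O. The following is a minimal sketch, assuming 8-bit chars, an implementation where EOF is -1, and the usual wrap-around behaviour when an out-of-range int is converted to signed char; `looks_like_eof` is a hypothetical helper name, not part of any library.

```cpp
#include <cstdio>   // EOF

// Hypothetical helper: stores a value obtained from a stream (a character
// code in 0..255, or EOF) in a variable of type T, as `T c = peek();` would,
// and reports whether the stored value still compares equal to EOF.
template <typename T>
bool looks_like_eof(int value_from_stream) {
    T stored = static_cast<T>(value_from_stream);  // possible narrowing
    return stored == EOF;                          // `stored` is promoted to int
}
```

Under those assumptions, `looks_like_eof<unsigned char>(EOF)` is false (the end of file is never seen), `looks_like_eof<signed char>(255)` is true (a real byte is mistaken for the end of file), and only `looks_like_eof<int>` gives the right answer for both inputs.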

渡你暖光 2024-08-21 10:50:31

Note that the return value of std::cin.peek() is actually of type std::basic_ios<char>::int_type, which is the same as std::char_traits<char>::int_type, which is an int and not a char.

More important than that, the value returned in that int is not necessarily a simple cast from char to int but is the result of calling std::char_traits<char>::to_int_type on the next character in the stream or std::char_traits<char>::eof() (which is defined to be EOF) if there is no character.

Typically, this works the same way as fgetc, which converts the character to an unsigned char and then to an int for its return value, so that you can distinguish all valid character values from EOF.
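The conversion can be checked directly. This is a small sketch; the function names `as_stream_value` and `via_unsigned_char` and the alias `traits` are made up for illustration, while `to_int_type` and `eof` are the standard `std::char_traits<char>` members discussed above.

```cpp
#include <string>   // std::char_traits

using traits = std::char_traits<char>;

// The int value a narrow stream reports for the character c.
int as_stream_value(char c) {
    return traits::to_int_type(c);
}

// Hand-written version of the fgetc-style conversion:
// char -> unsigned char -> int, so the result is never negative.
int via_unsigned_char(char c) {
    return static_cast<int>(static_cast<unsigned char>(c));
}
```

With 8-bit chars, `as_stream_value('\xff')` yields 255 whether plain char is signed or unsigned, which is what keeps it distinct from `traits::eof()`.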

If you store the return value of std::cin.peek() in a char, then there is the possibility that reading a character with a positive value (say ÿ in an ISO-8859-1 encoded file) will compare equal to EOF.

The pedantic thing to do would be:

typedef std::istream::traits_type traits_type;

traits_type::int_type ch;
traits_type::char_type c;

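// Note: peek() leaves the character in the stream, so the elided body
// has to consume it, or the loop never advances.
while (!traits_type::eq_int_type((ch = std::cin.peek()), traits_type::eof()))
{
    c = traits_type::to_char_type(ch);
    // ...
}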

This would probably be more usual:

int ch;
char c;

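// Note: peek() leaves the character in the stream, so the elided body
// has to consume it, or the loop never advances.
while ((ch = std::cin.peek()) != EOF)
{
    c = std::iostream::traits_type::to_char_type(ch);
    // ...
}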
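For reference, here is one way the loop above might look when filled in and wrapped in a function. It is only a sketch: `count_bytes` is a hypothetical name, and consuming each character with `get()` is just one possible loop body, chosen because `peek()` does not remove the character from the stream.

```cpp
#include <cstddef>   // std::size_t
#include <cstdio>    // EOF
#include <istream>
#include <sstream>
#include <string>

// Counts the characters remaining in `in`. The result of peek() is kept in
// an int, so a byte with value 255 is not confused with EOF.
std::size_t count_bytes(std::istream& in) {
    std::size_t n = 0;
    int ch;
    while ((ch = in.peek()) != EOF) {
        in.get();   // peek() does not consume, so extract the byte explicitly
        ++n;
    }
    return n;
}
```

Feeding it a `std::istringstream` built from the three bytes 'a', 0xFF, 'b' counts all three. If `ch` were declared as a char instead, the loop would either stop at the 0xFF byte or never terminate, depending on whether plain char is signed.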

Note that it is important to convert the character value correctly. If you perform a comparison like this: if (ch == '\xff') ... where ch is an int as above, you may not get the correct results. You need to use std::char_traits<char>::to_char_type on ch or std::char_traits<char>::to_int_type on the character constant to get a consistent result. (You are usually safe with members of the basic character set, though.)
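The difference between the two comparisons can be sketched as follows. The helper names `next_is` and `next_is_naive` are invented for this example, and the behaviour described for the naive version assumes an implementation where EOF is -1.

```cpp
#include <sstream>
#include <string>

using traits = std::char_traits<char>;

// Consistent comparison: convert the character constant to the stream's
// int representation before comparing it with the result of peek().
bool next_is(std::istream& in, char expected) {
    const traits::int_type ch = in.peek();
    return traits::eq_int_type(ch, traits::to_int_type(expected));
}

// Naive comparison: `expected` is promoted to int directly. Where plain
// char is signed, '\xff' becomes -1, which never matches the 255 returned
// for a real 0xFF byte but does match the EOF returned at end of stream.
bool next_is_naive(std::istream& in, char expected) {
    const int ch = in.peek();
    return ch == expected;
}
```

On a platform where plain char is signed, `next_is_naive(in, '\xff')` is false for a stream that really starts with a 0xFF byte and true for an empty stream, which is exactly backwards. `next_is` gives the expected answer in both cases.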
