What does it mean for a char to be signed?

Posted 2024-07-11 05:42:51

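Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation-defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings rather than to do math.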

水中月 2024-07-18 05:42:52

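Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference: a byte might be bigger than 8 bits on some platforms, and chars are tied to bytes by the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, tells you how many bits are in a char.)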

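As for why you'd want a char with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want that char to be a one-byte number, and in those cases (particularly given how small the range of a byte is) you typically also care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character and is intended to be used as text.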

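Or rather, I used to do that. Newer versions of C and C++ have (u?)int_least8_t (currently typedef'd in <stdint.h> or <cstdint>), which are more explicitly numeric (though they are typically just typedefs for the signed and unsigned char types anyway).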

海风掠过北极光 2024-07-18 05:42:52

The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.

char a = (char)42;
char b = (char)120;
char c = a + b;

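Depending on the signedness of char, c could be one of two values. If char is unsigned, c will be (char)162. If char is signed, 162 does not fit, because the maximum value of an 8-bit signed char is 127. The addition itself is carried out in int, so nothing overflows there; it is the conversion of 162 back to char that is implementation-defined. On typical two's complement implementations the result is (char)-94, that is, 162 - 256.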

戴着白色围巾的女孩 2024-07-18 05:42:52

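One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ASCII character. Of course, that isn't portable, so it's not very useful.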

烟花易冷人易散 2024-07-18 05:42:51

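It won't make a difference for strings. But in C you can use a char to do math, and then it does make a difference.

In fact, when working in memory-constrained environments, such as embedded 8-bit applications, a char will often be used to do math, and then it makes a big difference. This is because C has no byte type by default.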

凉城 2024-07-18 05:42:51

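In terms of the values they represent (assuming an 8-bit, two's complement char):

unsigned char:

spans the value range 0..255 (00000000..11111111)
values wrap around the low edge as:
0 - 1 = 255 (00000000 - 00000001 = 11111111)
values wrap around the high edge as:
255 + 1 = 0 (11111111 + 00000001 = 00000000)
the bitwise right shift operator (>>) does a logical shift:
10000000 >> 1 = 01000000 (128 / 2 = 64)

signed char:

spans the value range -128..127 (10000000..01111111)
values wrap around the low edge as:
-128 - 1 = 127 (10000000 - 00000001 = 01111111)
values wrap around the high edge as:
127 + 1 = -128 (01111111 + 00000001 = 10000000)
the bitwise right shift operator (>>) does an arithmetic shift:
10000000 >> 1 = 11000000 (-128 / 2 = -64)

I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed or unsigned (except for right shifts).

Update

Some implementation-specific behaviour mentioned in the comments:

char != signed char. The type "char" without "signed" or "unsigned" is implementation-defined, which means it can act like a signed or an unsigned type.
Signed integer overflow leads to undefined behavior, where a program can do anything, including dumping core or overrunning a buffer. The signed char wrapping shown above is therefore what typical implementations produce, not something the language guarantees, and right-shifting a negative value is implementation-defined as well.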

请别遗忘我 2024-07-18 05:42:51

#include <stdio.h>

int main(int argc, char** argv)
{
    char a = 'A';
    char b = 0xFF;
    signed char sa = 'A';
    signed char sb = 0xFF;
    unsigned char ua = 'A';
    unsigned char ub = 0xFF;
    printf("a > b: %s\n", a > b ? "true" : "false");
    printf("sa > sb: %s\n", sa > sb ? "true" : "false");
    printf("ua > ub: %s\n", ua > ub ? "true" : "false");
    return 0;
}


[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false

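This matters when sorting strings: comparing plain char values directly orders bytes above 0x7F differently depending on whether char is signed. The output above comes from a platform where plain char is signed, so 0xFF compares as -1 and sorts before 'A'.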

醉酒的小男人 2024-07-18 05:42:51

There are a couple of differences. Most importantly, if you exceed the valid range of a char by assigning it an integer that is too big or too small, and char is signed, the resulting value is implementation-defined, or a signal may even be raised (in C), as for all signed types. Contrast that with assigning an out-of-range value to an unsigned char: the value wraps around, with precisely defined semantics. For example, assigning -1 to an unsigned char gives you UCHAR_MAX. So whenever you have a byte, as in a number from 0 to 2^CHAR_BIT - 1, you should really use unsigned char to store it.

The sign also makes a difference when passing to vararg functions:

char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);

Assume the value assigned to c is too big for char to represent, and the machine uses two's complement. Many implementations handle the assignment of a too-big value to a char by leaving the bit pattern unchanged. If int can represent all values of char (which it can on most implementations), the char is promoted to int before being passed to printf. So the value passed is negative: promotion to int retains the sign, and you get a negative result. However, if char is unsigned, the value is non-negative, and promoting it to int yields a positive int. If you use unsigned char, you get precisely defined behavior both for the assignment to the variable and for passing it to printf, which will then print something positive.

Note that char, unsigned char and signed char are all at least 8 bits wide. There is no requirement that char be exactly 8 bits wide. It is on most systems, but on some you will find 32-bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C is not always exactly 8 bits either.

Another difference is that in C, unsigned char must have no padding bits. That is, if CHAR_BIT is 8, then an unsigned char's values must range from 0 to 2^CHAR_BIT - 1. The same is true for char if it's unsigned. For signed char, older wording of the C standard did not rule out padding bits, so beyond the guaranteed minimum range of -127..127 you could not assume much about its values, even if you knew how your compiler represents the sign (two's complement or one of the other options). C11 states that signed char has no padding bits either. In C++, none of the three character types has padding bits.
