Is plain char usually/always unsigned on non-two's-complement systems?

Posted 2024-11-10 03:54:16, 166 words, 3 views, 0 comments

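Obviously the standard says nothing about this, but I'm more interested from a practical/historical standpoint: did systems with non-two's-complement arithmetic use an unsigned plain char type? Otherwise you could run into all sorts of weirdness, such as two representations of the null terminator and the inability to represent every "byte" value in a char. Did systems this weird really exist?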

凹づ凸ル 2024-11-17 03:54:16

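The null character used to terminate strings can never have two representations. It is defined as follows (even in C90):

"A byte with all bits set to 0, called the null character, shall exist in the basic execution character set."

So a "negative zero" on a one's-complement machine would not qualify.

That said, I really don't know much about non-two's-complement C implementations. I used a one's-complement machine back in university, but I don't remember much about it (and even if I had cared about the standard back then, this was before it existed).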
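
To make the "all bits set to 0" requirement concrete, here is a minimal sketch that inspects the terminator of a string literal through unsigned char. The helper names `all_bits_zero` and `terminator_is_all_bits_zero` are made up for illustration; they are not from the standard.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the n-byte object at p
   has all bits zero, looking at it through unsigned char. */
int all_bits_zero(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: returns 1 if the terminator the compiler appends
   to a string literal is an all-bits-zero byte. */
int terminator_is_all_bits_zero(void)
{
    const char s[] = "abc";   /* four chars: 'a', 'b', 'c', and the terminator */
    return all_bits_zero(&s[3], 1);
}
```

A bit pattern such as all ones, which a one's-complement machine would read as negative zero, fails the `all_bits_zero` check, so it cannot serve as the terminator.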

粉红×色少女 2024-11-17 03:54:16

It's true: for the first 10 or 20 years of commercially produced computers (the 1950s and 60s) there was apparently some disagreement about how to represent negative numbers in binary. There were actually three contenders:

Two's complement, -x == ~x + 1, which not only won the war but also drove the others to extinction
One's complement, -x == ~x
Sign-magnitude, -x == x ^ 0x80000000 (for a 32-bit word)
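
As a rough illustration of the three formats, the sketch below applies each negation rule to a raw 32-bit pattern. It only simulates the bit patterns on an ordinary machine; the function names and the 32-bit word size are assumptions made for the example.

```c
#include <stdint.h>

#define SIGN_BIT 0x80000000u   /* top bit of the assumed 32-bit word */

/* Two's complement: invert all bits, then add one. */
uint32_t negate_twos_complement(uint32_t x)
{
    return ~x + 1u;
}

/* One's complement: invert all bits. */
uint32_t negate_ones_complement(uint32_t x)
{
    return ~x;
}

/* Sign-magnitude: flip only the sign bit. */
uint32_t negate_sign_magnitude(uint32_t x)
{
    return x ^ SIGN_BIT;
}
```

Negating zero shows the difference: two's complement gives back the all-zero pattern, while the other two produce a second pattern (0xFFFFFFFF or 0x80000000) that also means zero. That second pattern is the "negative zero" the question worries about.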

I think the last important one's-complement machine was probably the CDC-6600, at the time the fastest machine on earth and the immediate predecessor of the first supercomputer.^1

Unfortunately, your question cannot really be answered, not because no one here knows the answer :-) but because the choice never had to be made. There were actually two reasons for this:

1. Two's complement took over at the same time as byte machines. Byte addressing hit the world with the two's-complement IBM System/360. Previous machines had no bytes; only complete words had addresses. Sometimes programmers would pack characters inside these words and sometimes they would just use the whole word. (Word length varied from 12 bits to 60.)
2. C was not invented until a decade after the transition to byte machines and two's complement. Item #1 happened in the 1960s; C first appeared on small machines in the 1970s and did not take over the world until the 1980s.

So there simply never was a time when a machine had signed bytes, a C compiler, and something other than a two's-complement data format. The idea of null-terminated strings was probably a repeatedly invented design pattern, thought up by one assembly language programmer after another, but I don't know that it was specified by a compiler until the C era.

In any case, the first actually standardized C ("C89") simply specifies "a byte or code of value zero is appended" and it is clear from the context that they were trying to be number-format independent. So, "+0" is a theoretical answer, but it may never really have existed in practice.

^1 The 6600 was one of the most important machines historically, and not just because it was fast. Designed by Seymour Cray himself, it introduced out-of-order execution and various other elements later collectively called "RISC". Although others tried to claim credit, Seymour Cray is the real inventor of the RISC architecture. There is no dispute that he invented the supercomputer. It's actually hard to name a past "supercomputer" that he didn't design.

暖树树初阳… 2024-11-17 03:54:16

I believe it would be almost but not quite possible for a system to have a one's-complement 'char' type, but there are four problems which cannot all be resolved:

1. Every data type must be representable as a sequence of char, such that if all the char values comprising two objects compare identical, the data objects in question will be identical.
2. Every data type must likewise be representable as a sequence of 'unsigned char'.
3. The unsigned char values into which any data type can be decomposed must form a group whose order is a power of two.
4. I don't believe the standard permits a one's-complement machine to special-case the value that would be negative zero and make it behave as something else.

It might be possible to have a standards-compliant machine with a one's-complement or sign-magnitude "char" type if the only way to get a negative zero would be by overlaying some other data type, and if negative zero compared unequal to positive zero. I'm not sure if that could be standards-compliant or not.
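
To see why negative zero gets in the way, here is a small simulation of an 8-bit char under each format. It decodes all 256 bit patterns and counts how many distinct values come out. This is only a sketch run on an ordinary machine; the decoder names and the 8-bit width are assumptions for illustration.

```c
/* Decode an 8-bit pattern as one's complement (values -127..127). */
int decode_ones_complement8(unsigned pattern)
{
    pattern &= 0xFFu;
    if (pattern & 0x80u) {
        return -(int)(~pattern & 0xFFu);   /* 0xFF decodes to -0, i.e. 0 */
    }
    return (int)pattern;
}

/* Decode an 8-bit pattern as sign-magnitude (values -127..127). */
int decode_sign_magnitude8(unsigned pattern)
{
    pattern &= 0xFFu;
    int magnitude = (int)(pattern & 0x7Fu);
    return (pattern & 0x80u) ? -magnitude : magnitude;   /* 0x80 decodes to 0 */
}

/* Decode an 8-bit pattern as two's complement (values -128..127). */
int decode_twos_complement8(unsigned pattern)
{
    pattern &= 0xFFu;
    return (pattern & 0x80u) ? (int)pattern - 256 : (int)pattern;
}

/* Count how many distinct values the 256 bit patterns decode to. */
int count_distinct_values(int (*decode)(unsigned))
{
    int seen[256] = {0};   /* index = value + 128, covering -128..127 */
    int count = 0;
    for (unsigned p = 0; p < 256; p++) {
        int idx = decode(p) + 128;
        if (!seen[idx]) {
            seen[idx] = 1;
            count++;
        }
    }
    return count;
}
```

With the one's-complement and sign-magnitude decoders, two patterns collapse onto zero, so comparing by value cannot tell all 256 byte patterns apart. That is the conflict with requirement #1 unless negative zero compares unequal to positive zero.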

EDIT

BTW, if requirement #2 were relaxed, I wonder what the exact requirements would be when overlaying other data types onto 'char'? Among other things, while the standard makes it abundantly clear that one must be able to perform assignments and comparisons on any 'char' values that may result from overlaying another variable onto a 'char', I don't know that it imposes any requirement that all such values must behave as an arithmetic group. For example, I wonder what the legality would be of a machine in which every memory location was physically stored as 66 bits, with the top two bits indicating whether the value was a 64-bit integer, a 32-bit memory handle plus a 32-bit offset, or a 64-bit double-precision floating-point number? Since the standard allows implementations to do anything they like when an arithmetic computation exceeds the range of a signed type, that would suggest that signed types do not necessarily have to behave as a group.

For most signed types, there's no requirement that the type be unable to represent any numbers outside the range specified in limits.h; if limits.h specifies that the minimum "int" is -32767, then it would be perfectly legitimate for an implementation to in fact allow a value of -32768 since any program that tried to do so would invoke Undefined Behavior. The key question would probably be whether it would be legitimate for a 'char' value resulting from the overlay of some other type to yield a value outside the range specified in limits.h. I wonder what the standard says?
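
For what it's worth, the limits.h side of this can be checked on any given implementation. The sketch below uses only the standard macros CHAR_MIN, SCHAR_MIN and SCHAR_MAX; the two function names are made up for the example.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if signed char has one more negative value than positive
   (for example -128..127), and 0 if the range is symmetric
   (for example -127..127, as on one's complement or sign-magnitude). */
int signed_char_range_is_asymmetric(void)
{
    return SCHAR_MIN < -SCHAR_MAX;
}
```

This only reports what limits.h advertises. It says nothing about whether overlaying another type can produce a char value outside that range, which is the open question above.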
