How exactly are data types represented in a computer?
I'm a beginning programmer reading K&R, and I feel as if the book assumes a lot of previous knowledge. One aspect that confuses me is the actual representation, or should I say existence, of variables in memory. What exactly does a data type specify for a variable? I'm not too sure of how to word this question... but I'll ask a few questions and perhaps someone can come up with a coherent answer for me.
When using getchar(), I was told that it is better to use type "int" than type "char" due to the fact that "int" can hold more values while "char" can hold only 256 values. Since we may need the variable to hold the EOF value, we will need more than 256 or the EOF value will overlap with one of the 256 characters. In my mind, I view this as a bunch of boxes with empty holes. Could someone give me a better representation? Do these "boxes" have index numbers? When EOF overlaps with a value in the 256 available values, can we predict which value it will overlap with?
Also, does this mean that the data type "char" is only fine to use when we are simply assigning a value to a variable manually, such as char c = 'a', when we definitely know that we will only have 256 possible ASCII characters?
Also, what is the actual important difference between "char" and "int"? If we can use "int" type instead of "char" type, why do we decide to use one over the other at certain times? Is it to save "memory" (I use quotes as I do not actually know how "memory" exactly works)?
Lastly, how exactly are the 256 available values of type char obtained? I read something about modulo 2^n, where n = 8, but why does that work (something to do with binary?). What does the modulo portion of "modulo 2^n" mean (if it has any relevance to modular arithmetic, I can't see the relation...)?
Great questions. K&R was written back in the days when there was a lot less to know about computers, and so programmers knew a lot more about the hardware. Every programmer ought to be familiar with this stuff, but (understandably) many beginning programmers aren't.
At Carnegie Mellon University they developed an entire course to fill in this gap in knowledge, which I was a TA for. I recommend the textbook for that class: "Computer Systems: A Programmer's Perspective" http://amzn.com/013034074X/
The answers to your questions are longer than can really be covered here, but I'll give you some brief pointers for your own research.
Basically, computers store all information--whether in memory (RAM) or on disk--in binary, a base-2 number system (as opposed to decimal, which is base 10). One binary digit is called a bit. Computers tend to work with memory in 8-bit chunks called bytes.
A char in C is one byte. An int is typically four bytes (although it can be different on different machines). So a char can hold only 256 possible values, 2^8. An int can hold 2^32 different values.
For more, definitely read the book, or read a few Wikipedia pages:
Best of luck!
UPDATE with info on modular arithmetic as requested:
First, read up on modular arithmetic: http://en.wikipedia.org/wiki/Modular_arithmetic
Basically, in a two's complement system, an n-bit number really represents an equivalence class of integers modulo 2^n.
If that seems to make it more complicated instead of less, then the key things to know are simply:
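So, an unsigned byte (8-bit number) can be 0 to 255. 255 + 1 wraps around to 0. 255 + 2 ends up as 1, and so forth. A signed byte can be -128 to 127. 127 + 1 ends up as -128 (!), 127 + 2 ends up as -127, etc. (That last behavior is what two's complement hardware does. In C itself, overflowing a signed type is not guaranteed to wrap.)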
At the machine level, the difference between int and char is only the size, or number of bytes, of the memory allocated for it by the programming language. In C, a char is one byte, while an int is usually 4 bytes (the exact size depends on the platform). If you were to "look" at these inside the machine itself, you would see a sequence of bits for each. Being able to treat them as int or char depends on how the language decides to interpret them (this is also why it's possible to convert back and forth between the two types).

As for the 256 values: there are 2^8, or 256, combinations of 8 bits (because a bit can have two possible values), whereas there are 2^32 combinations of 32 bits. The EOF constant (as defined by C) is a negative value, not falling within the range of 0 to 255, which is where getchar() puts every real character. If you try to assign this negative value to a char (thus squeezing its 4 bytes into 1), it can no longer be told apart from the characters. If char is unsigned, the higher-order bits are lost and you end up with a valid char value that is NOT the same as EOF. If char is signed, the stored value does compare equal to EOF, but so does a genuine input byte with all bits set (255). This is why you need to store it into an int and check for EOF before casting to a char.
Yes, especially since in that case you are assigning a character literal.

Most importantly, you would pick int or char at the language level depending on whether you wanted to treat the variable as a number or a letter (to switch, you would need to cast to the other type). If you wanted an integer value that took up less space, you could use a short int (which is typically 2 bytes). If you were REALLY concerned with memory usage, you could use a char, though mostly this is not necessary.

Edit: here's a link describing the different data types in C and modifiers that can be applied to them. See the table at the end for sizes and value ranges.
Basically, system memory is one huge series of bits, each of which can be either "on" or "off". The rest is conventions and interpretation.
First of all, there is no way to access individual bits directly; instead they are grouped into bytes, usually in groups of 8 (there are a few exotic systems where this is not the case, but you can ignore that for now), and each byte gets a memory address. So the first byte in memory has address 0, the second has address 1, etc.
A byte of 8 bits has 2^8 possible different values, which can be interpreted as a number between 0 and 255 (unsigned byte), as a number between -128 and +127 (signed byte), or as an ASCII character. A variable of type char has, per the C standard, a size of 1 byte.

But bytes are too small for a lot of things, so other types have been defined that are larger (i.e. they consist of multiple bytes), and CPUs support these different types through special hardware constructs. An int is typically 4 bytes nowadays (the C standard only guarantees a minimum range, so ints can be smaller or bigger on different systems). The reason is that 4 bytes are 32 bits, and until recently that was what mainstream CPUs supported as their "word size".

So a variable of type int is 4 bytes large. That means when its memory address is e.g. 1000, it actually covers the bytes at addresses 1000, 1001, 1002, and 1003. In C, it is possible to address those individual bytes as well at the same time, and that is how variables can overlap.

As a side note, many systems require (or strongly prefer) larger types to be aligned, i.e. their addresses have to be multiples of their size (for int, typically 4), because that makes things easier for the hardware. So on such a system an int variable cannot start at address 999 or address 17 (but 1000 and 16 are OK).
I'm not going to completely answer your question, but I would like to help you understand variables, as I had the same problems understanding them when I began to program by myself.
For the moment, don't bother with the electronic representation of variables in memory. Think of memory as a continuous block of 1-byte cells, each storing a bit pattern (consisting of 0s and 1s).
By solely looking at the memory, you can't determine what the bits in it represent! They are just arbitrary sequences of 0s and 1s. It is YOU who specifies HOW to interpret those bit patterns! Take a look at this example:
You could have written the following as well:
In both cases, the variables a, b and c are stored somewhere in memory (and you can't tell their type by looking at them). Now, when the compiler compiles your code (that is, translates your program into machine instructions), it makes sure to translate the "+" into integer_add in the first case and float_add in the second case. That way the CPU interprets the bit patterns correctly and performs what you intended.
Variable types are like glasses that let the CPU look at a bit pattern from different perspectives.
G'day,
To go deeper, I'd highly recommend Charles Petzold's excellent book "Code"
It covers more than what you ask, all of which leads to a better understanding of what's actually happening under the covers.
HTH
Really, datatypes are an abstraction that allows your programming language to treat a few bytes at some address as some kind of numeric type. Consider the data type as a lens that lets you see a piece of memory as an int, or a float. In reality, it's all just bits to the computer.
EOF is a "small negative number". The char type may be unsigned, meaning that it cannot represent negative values. If MAX is the maximum value an unsigned type can hold, then assigning -n to such a type (for 1 <= n <= MAX) is equivalent to assigning MAX + 1 - n to it; in general, the value is reduced modulo MAX + 1. So, to answer your specific question about predicting, "yes you can". For example, let's say char is unsigned, and can hold values 0 to 255 inclusive. Then assigning -1 to a char is equivalent to assigning 255 + 1 - 1 = 255 to it.

Given the above, to be able to store EOF in c, c can't be of type char. Thus, we use int, because it can store "small negative values". In particular, in C, int is guaranteed to store values in the range -32767 to +32767. That is why getchar() returns int.

If you are assigning values directly, then the C standard guarantees that expressions like 'a' will fit in a char. Note that in C, 'a' is of type int, not char, but it's okay to do char c = 'a', because 'a' is able to fit in a char.

About your question as to what type a variable should hold, the answer is: use whatever type makes sense. For example, if you're counting, or looking at string lengths, the numbers can only be greater than or equal to zero. In such cases, you should use an unsigned type. size_t is such a type.

Note that it is sometimes hard to figure out the type of data, and even the "pros" may make mistakes. The gzip format, for example, stores the size of the uncompressed data in the last 4 bytes of a file. This breaks for huge files > 4 GB in size, which are fairly common these days.

You should be careful about your terminology. In C, char c = 'a' assigns an integer value corresponding to 'a' to c, but it need not be ASCII. It depends upon whatever encoding you happen to use.

About the "modulo" portion, and the 256 values of type char: if you have n binary bits in a data type, each bit can encode 2 values: 0 and 1. So, you have 2*2*2...*2 (n times) available values, or 2^n. For unsigned types, any overflow is well-defined: it is as if you divided the number by (the maximum possible value + 1) and took the remainder. For example, let's say unsigned char can store values 0..255 (256 total values). Then, assigning 257 to an unsigned char will basically divide it by 256, take the remainder (1), and assign that value to the variable. This relation holds true for unsigned types only, though. See my answer to another question for more.

Finally, you can use char arrays to read data from a file in C, even though you might end up hitting EOF. C provides other ways of detecting EOF without having to read it into a variable explicitly, but you will learn about them later when you have read about arrays and pointers (see fgets() if you're curious for one example).
According to "stdio.h", getchar() returns an int, and EOF is defined as -1 (in common implementations; the C standard only requires EOF to be a negative int).
Depending on the actual encoding, all values between 0..255 can occur. An unsigned char therefore cannot also represent -1, so int is used.
Here is a nice table with detailed information: http://en.wikipedia.org/wiki/ISO/IEC_8859
The beauty of K&R is its conciseness and readability; writers always have to make concessions for their goals. Rather than being a 2000-page reference manual, it serves as a basic reference and an excellent way to learn the language in general. For the details, I recommend Harbison and Steele, "C: A Reference Manual", an excellent C reference book, and the C standard of course.
You need to be willing to google this stuff. Variables are represented in memory at specific locations and are known to the program they are part of within a given scope. A char will typically be stored in 8 bits of memory (on some rare platforms this isn't necessarily true). 2^8 represents 256 distinct possibilities for a variable. Different CPUs/compilers/etc. represent the basic types int and long with varying sizes. The C standard specifies minimum ranges (and hence minimum sizes) for these, but not maximum sizes. For double, the standard's minimum precision rules out 32 bits, and in practice it is 64 bits, but this doesn't preclude Intel from using 80 bits inside its floating-point unit. Anyway, typical sizes in memory on 32-bit Intel platforms would be:
- 32 bits (4 bytes) for unsigned/signed int and float
- 64 bits (8 bytes) for double
- 8 bits for char (signed/unsigned)

You should also look up memory alignment if you are really interested in the topic. You can also look at the exact layout in your debugger by getting the address of your variable with the "&" operator and then peeking at that address. Intel platforms may confuse you a little when you look at values in memory, so please look up little endian/big endian as well. I am sure Stack Overflow has some good summaries of this as well.
All of the characters needed in a language are represented by ASCII and Extended ASCII. So there is no character beyond Extended ASCII.
While using char, there is a probability of getting a garbage value, as it directly stores the character; using int, there is less probability of it, as it stores the ASCII value of the character.
For your last question about modulo:
Think about modulo as a clock, where adding hours eventually results in you starting back at 0. Adding an hour for each step, you go from 00:00 to 01:00 to 02:00 to 03:00 to ... to 23:00 and then add one more hour to get back to 00:00. The "wrap-around" or "roll-over" is called modulo, and in this case is modulo 24.
With modulo, the largest number is never reached; as soon as you "reach" that number, it wraps around to the beginning (24:00 is really 00:00 in the time example).
As another example, modern humanity's numbering system is Base 10 (i.e., Decimal), where we have digits 0 through 9. We don't have a singular digit that represents value 10. We need two digits to store 10.
Let's say we only have a one-digit adder, where the output can only store a single digit. We can add any two single-digit numbers together, like 1+2, or 5+4. 1+2=3, as expected. 5+4=9, as expected. But what happens if we add 5+5 or 9+1 or 9+9? To calculate 5+5, our machine computes 10. It can't store the 1 due to its lack of memory capabilities, so the computer treats the 1 as an "overflow digit" and throws it away, only storing the 0 as the result. So, looking at your output for the computation 5+5, you see the result is 0, which probably isn't what you were expecting. To calculate 9+9, your single-digit adding machine would correctly calculate 18. Again, because the hardware can store a maximum of one digit, it doesn't have the ability to store the 1, so it throws it away. The adder CAN, however, store the 8, so your result of 9+9 produces 8.

Your single-digit adder is modulo'ing by 10. Notice how you can never reach the number 10 in your output, even when your result should be 10 or bigger. The same issue occurs in binary, but with different modulo values.

As an aside, this "overflow" issue is especially bad with multiplication. To multiply two numbers with all the result's digits intact, you need twice the length of your biggest input (whether the numbers are binary or decimal or some other standard base). I.e., if you're multiplying a 32-bit number by another 32-bit number, your result might take 64 bits of memory instead of a convenient 32! E.g., 999 (3-digit input) times 999 (3-digit input) = 998,001 (6-digit output). Notice how the output requires double the number of digits of storage compared to one of the inputs' lengths (number of digits).
Back to binary modulo,
A char in the programming language C is defined as the smallest addressable unit (source: I made it up based off what I've been told a hundred times). AFAIK, a single char is always the length of a single byte. A byte is 8 bits on practically every modern machine, which is to say, a byte is an ordered group of eight 1s and/or 0s. E.g., 11001010 is a byte. Again, the order matters, meaning that 01 is not the same as 10, much like how 312 is not the same as 321 in base 10.
Each bit you add gives you twice as many possible states. With 1 bit, you have 2^1 = 2 possible states (0,1). With 2 bits, you have 2^2 = 4 states (00,01,10,11), With 3 bits, you have 2^3 = 8 states (000,001,010,011,100,101,110,111). With 4 bits, you have 2^4 = 16 states, etc. With 8 bits (the length of a Byte, and also the length of a char), you have 2^8 = 256 possible states.
The largest value you can store in a char is 255 because a char has only 8 bits, meaning you can store all 1s to get the maximum value, which will be 11111111(bin) = 255(dec). As soon as you try to store a larger number, like by adding 1, we get the same overflow issue mentioned in the 1-digit adder example. 255+1 = 256 = 1 0000 0000 (spaces added for readability). 256 takes 9 bits to represent, but only the lower 8 bits can be stored since we're dealing with chars, so the most significant bit (the only 1 in the sequence of bits) gets truncated and we're left with 0000 0000 = 0. We could've added any number to the char, but the resulting char will always be between values 0 (all bits are 0s) and 255 (all bits are 1s).
Since a maximum value of 255 can be stored, we can say that operations that output/result in a char are mod 256 (256 can never be reached, but everything below that value can; see the clock example). Even if we add a million to a char, the final result will be between 0 and 255 (after a lot of truncation happens). Your compiler may give you a warning if you do a basic operation that causes overflow, but don't depend on it.
I said earlier that char can store up to 256 values, 0 through 255; this is only partially true. You might notice that you get strange numbers when you try to do operations like char a = 255;. Printing out the number as an integer (char a = 128; printf("%d", a);) may tell you that the result is -128. Did the computer think I added a negative by accident? No. This happened because on many platforms (x86, for example) a plain char is signed, meaning it's able to be negative; on others (e.g., ARM Linux) it is unsigned, and you would see 128.

For a signed char, 128 is actually overflow, because the 256 values are split roughly in half, from -128 to +127. The maximum value is +127, and adding 1 to reach 128 makes it overflow by exactly 1. So the resulting number of char a = 128; will be the minimum value a char can store, which is -128. If we had added 2 instead of 1 (like if we tried to do char a = 129;), then it would overflow by 2, meaning the resulting char would have stored -127. On typical two's complement machines, the maximum value will always wrap around to the minimum value in non-floating-point numbers. Strictly speaking, C leaves the result of squeezing an out-of-range value into a signed type up to the implementation.

If you choose to look at the raw binary when setting variables equal to literal values like 128 or -5000:
For signed non-floating-point numbers, the largest place value gets assigned a 1 when the overall number is negative, and that place value gets treated as a negative version of that typical place value. E.g., -5 (decimal) would be 1xx...x in binary (where each x is a placeholder for either a 0 or 1). As another example, instead of place values being 8,4,2,1 for an unsigned number, they become -8,4,2,1 for a signed number, meaning you now have a "negative 8's place".
2's Complement to switch between + and - values: Flip (i.e., "Complement") all bits (i.e., each 1 gets flipped to a 0, and, simultaneously, each 0 gets flipped to a 1)(e.g., -12 = -16 + 4 = 10100 -> 01011). After flipping, add value 1 (place value of 1). (e.g., 01011 + 1 = 01100 = 0+8+4+0+0 = +12). Summary: Flip bits, then add 1.
Examples of using 2's Complement to convert binary numbers into EQUIVALENT signed decimal numbers:
If you see the binary number 1111, you might think, "Oh, that's 8+4+2+1 = 15". However, you don't have enough information to assume that. It could be a negative number. If you see "(signed) 1111", then you still don't know the number for certain due to One's Complement existing. But you can assume it means "(signed 2's Complement) 1111", which would be (-8)+4+2+1 = -1. The same sequence of bits, 1111, can be interpreted as either -1 or 15, depending on its signedness. This is why the unsigned keyword in unsigned char is important. When you write plain char, you are leaving the signedness up to the compiler (it is signed on x86, for example), so write signed char or unsigned char when it matters.
unsigned char - Can store numbers between 0 and 255
signed char - Can store numbers between -128 and +127 (same span, but shifted to allow negatives)