How does a C compiler handle array and pointer types internally? (int *a; vs. int a[];)
I need a language lawyer with authoritative sources.
Take a look at the following test program which compiles cleanly under gcc:
    #include <stdio.h>

    void foo(int *a) {
        a[98] = 0xFEADFACE;
    }

    void bar(int b[]) {
        *(b+498) = 0xFEADFACE;
    }

    int main(int argc, char **argv) {
        int a[100], b[500], *a_p;

        *(a+99) = 0xDEADBEEF;
        *(b+499) = *(a+99);
        foo(a);
        bar(b);
        printf("a[98] == %X\na[99] == %X\n", a[98], a[99]);
        printf("b[498] == %X\nb[499] == %X\n", b[498], b[499]);
        a_p = a+98;
        *a_p = 0xDEADFACE;
        printf("a[98] == %X\na[99] == %X\n", a[98], a[99]);
    }
It produces the output I expect:
anon@anon:~/study/test_code$ gcc arrayType.c -o arrayType
anon@anon:~/study/test_code$ ./arrayType
a[98] == FEADFACE
a[99] == DEADBEEF
b[498] == FEADFACE
b[499] == DEADBEEF
a[98] == DEADFACE
a[99] == DEADBEEF
Are a and b the same type? Is int *a handled as the same type as int a[] internally in the compiler?
From a practical point of view int a[100], b[500], *a_p, b_a[]; all seem to be the same type. It's hard for me to believe that the compiler is constantly adjusting these types in the various circumstances in my above example. I'm happy to be proven wrong.
Can someone settle this question for me definitively and in detail?
From the comp.lang.c FAQ: Given declarations of
Emphasis is mine. The biggest difference seems to be that the pointer is fetched when it's a pointer, while there is no pointer to fetch if it's an array.
One of the differences: int a[x][y] and int **a are not interchangeable.
See question 2.10 at http://www.lysator.liu.se/c/c-faq/c-2.html
a and b are both arrays of ints. a[0] is not a memory location containing a memory address, it is a memory location containing an int.
Arrays and pointers are neither identical nor interchangeable. They are "equivalent" only in the sense that an lvalue of type array-of-T which appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T. This becomes clear when looking at the assembly output for related code. The three exceptions, FYI, are when the array is an operand of sizeof or &, or is a string literal used to initialize a character array.
Picture an array a of char next to a pointer p to char, and realize that a reference like x[3] produces different code depending on whether x is a pointer or an array. For the compiler, a[3] means: start at the location a, move three past it, and fetch the char there. p[3] means: go to the location p, fetch the pointer value stored there, move three past the address it holds, and fetch the char there.
From the C language standard:
Assume the following code:
foo is declared as a 10-element array of char with all elements initialized to 0. p is declared as a pointer to char and is initialized to point to foo.
In the line that initializes p, the expression foo has type "10-element array of char"; since foo is not an operand of either sizeof or &, and is not a string literal being used to initialize an array, its type is implicitly converted to "pointer to char" and is set to point to the first element of the array. This pointer value is copied to p.
In the lines that subscript foo, the expression foo has type "10-element array of char"; since foo is not an operand of either sizeof or &, and is not a string literal being used to initialize an array, its type is implicitly converted to "pointer to char" and is set to point to the first element of the array. The subscript expression is interpreted as "*(foo + 0)".
In the line that calls strcat(), foo has type "10-element array of char" and the string literal "t" has type "2-element array of char"; since neither is an operand of either sizeof or &, and while "t" is a string literal, it is not being used to initialize an array, both are implicitly converted to type "pointer to char", and the pointer values are passed to strcat().
In the line where foo appears three times, the first instance of foo is converted to a pointer to char as described above. The second instance of foo is an operand of the & operator, so its type is not converted to "pointer to char", and the type of the expression "&foo" is "pointer to 10-element array of char", or "char (*)[10]". Compare this with the type of the expression "&p", which is "pointer to pointer to char", or "char **". The third instance of foo is an operand of the sizeof operator, so again its type is not converted, and sizeof returns the number of bytes allocated to the array. Compare this with the result of sizeof p, which returns the number of bytes allocated to the pointer.
Whenever anyone tells you "an array is just a pointer", they are garbling the section from the standard quoted above. Arrays are not pointers and pointers are not arrays; however, in many circumstances, you can treat an array as though it were a pointer and you can treat a pointer as though it were an array. "p" could be substituted for "foo" in lines 6, 7, and 8. However, they are not interchangeable as operands to sizeof or &.
Edit: btw, as function parameters, the array form and the pointer form are equivalent: "a[]" is interpreted as "*a". Note that this is only true for function parameters.
Look here:
2.2: But I heard that char a[] was identical to char *a.
http://www.lysator.liu.se/c/c-faq/c-2.html
I agree with sepp2k's answer and Mark Rushakoff's comp.lang.c FAQ quote. Let me add some important differences between the two declarations and a common trap.
When you define a as an array (in a context other than a function's argument, which is a special case) you can't write a = 0; or a++; because a is not a modifiable lvalue (an array cannot appear on the left of an assignment operator).
The array definition reserves space for the elements, whereas the pointer definition reserves space only for the pointer itself. Therefore, sizeof(array) will return the memory space needed for storing all the array's elements (for instance 10 times four bytes for an array of 10 integers on a 32-bit architecture), whereas sizeof(pointer) will only return the memory space required for storing that pointer (for instance 8 bytes in a 64-bit architecture).
When you prepend pointer or append array declarations things definitely diverge. For instance, int **a is a pointer to a pointer to an integer. It can be used as a two-dimensional array (with rows of varying sizes) by allocating an array of pointers to the rows and making each one point to memory for storing integers. To access a[2][3] the compiler will fetch the pointer in a[2] and then move three elements past the location it points to in order to access the value. Contrast this with b[10][20], which is an array of 10 elements, each of which is an array of 20 integers. To access b[2][3] the compiler will offset the beginning of the array's memory area by multiplying 2 by the size of 20 integers and adding the size of 3 more integers.
Finally, consider this trap. If one C file defines a as an array and another C file declares a as a pointer, the files will typically compile and link without an error, but the code will not do what you might expect; it will probably crash with a null pointer assignment. The reason is that in the second file a is a pointer whose value is read from the storage that starts at the first file's a[0], i.e. initially 0.
There are two a's and two b's in your example.
As parameters, a and b are of the same type: pointer to int.
As variables, a pointer declaration and an array declaration aren't of the same type: one declares a pointer, the other an array.
Array behavior
An array (whether a variable or not) is implicitly converted in most contexts to a pointer to its first element. The two contexts in C where this is not done are as the operand of sizeof and as the operand of & (a string literal used to initialize an array is a third case); in C++ there are some more, related to reference parameters and templates.
I wrote "a variable or not" because the conversion is not done only for variables. Some examples:
foo is an array of 10 arrays of 10 ints. In most contexts it will be converted to a pointer to its first element, of type pointer to array of 10 int.
foo[10] is an array of 10 int; in most contexts it will be converted to a pointer to its first element, of type pointer to int.
*bar is an array of 10 int; in most contexts it will be converted to a pointer to its first element, of type pointer to int.
Some history
In B, the direct ancestor of C, the equivalent of an array declaration had the effect of what in current C we'd write as an array definition plus a pointer initialized to it, i.e. it allocated memory and initialized a pointer to it. Some people seem to have the misconception that it is still true in C.
In NB (when C was no longer B but not yet called C), there was a time when a pointer was declared with the empty-bracket array syntax, while the form with a size had the current meaning. The adjustment of function parameters is a remnant of that time.
Are a and b the same type? Yes. [Edit: I should clarify: The parameter a of function foo is the same type as the parameter b of function bar. Both are pointers to int. The local variable a in main is the same kind of type as the local variable b in main. Both are arrays of ints (well, actually they're not the same type because they don't have the same size. But both are arrays).]
Is int *a handled as the same type as int a[] internally in the compiler? Usually not. The exception is when you write foo bar[] as a parameter to a function (like you did here): it automatically becomes foo *bar. When declaring non-parameter variables, however, there is a big difference.
No, they are not the same! One is a pointer to an int, the other is an array of 100 ints. So yes, they are the same!
OK, I'll try to explain this stupidity.
*a and a[100] are basically the same for what you are doing. But if we look in detail at the memory handling logic for the compiler, what we are saying is:
*a: compiler, I need memory, but I'll tell you how much later, so chill for now!
a[100]: compiler, I need memory now, and I know I need 100, so make sure we have it!
Both can be indexed like pointers, and your code can treat them the same and trample the memory near them all you want. But a[100] is contiguous memory reserved by the compiler, while *a only reserves the pointer itself, because the compiler doesn't know when you are going to need the memory (run time memory nightmares).
So, who cares, right? Well, certain operators like sizeof care. sizeof(a) will return a different answer for *a and for a[100]. And this will be different inside your functions too, where the array parameter is really a pointer. Where the compiler does know the difference, you can use this to your advantage in your code too: for loops, memcpy, etc. Go on, try.
This is a huge question, but the answer I am giving here is this. The compiler knows the subtle difference, and it will produce code that will look the same most times, but different when it matters. It is up to you to find out what *a or a[100] means to the compiler and where it will treat it differently. They can be effectively the same, but they are not the same. And to make it worse, you can change the whole game by calling a function like you have.
Phew... Is it any wonder that managed code like C# is so hot right now?!
Edit: I should also add that you can do a_p = X, but try to do that with one of your arrays! Arrays work with memory just like pointers, but they can't be moved or resized. Pointers like a_p can point at different things.
I'll throw my hat into the ring for a simple explanation of this:
An array is a series of contiguous storage locations for the same type.
A pointer is the address of a single storage location.
Taking the address of an array gives the address of its first element (although the resulting pointer has pointer-to-array type, not pointer-to-element type).
Elements of an array can be accessed through a pointer to the array's first element. This works because the subscript operator [] is defined on pointers in a way designed to facilitate this.
An array can be passed where a pointer parameter is expected, and it will be automatically converted into a pointer-to-first-element (although this is not recursive for multiple levels of pointers, or multi-dimensional arrays). Again, this is by design.
So, in many cases, the same piece of code can operate on arrays and on contiguous blocks of memory that were not allocated as an array, because of the intentionally special relationship between an array and a pointer to its first element. However, they are distinct types, and they do behave differently in some circumstances, e.g. pointer-to-array is not at all the same as pointer-to-pointer.
Here's a recent SO question that touches on the pointer-to-array versus pointer-to-pointer issue: Whats the difference between "abc" and {"abc"} in C?
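If you have a pointer to a character array (and want to get the size of that array), you cannot use sizeof(ptr); you have to use strlen(ptr)+1 instead! Note that this counts the string stored in the array, including its terminator, so it only works for NUL-terminated strings and does not report unused capacity.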