How do C compilers handle array and pointer types internally? ( int *a; vs. int a[]; )

Posted on 2024-08-03 16:42:50

I need a language lawyer with authoritative sources.

Take a look at the following test program which compiles cleanly under gcc:

#include <stdio.h>


void foo(int *a) {
    a[98] = 0xFEADFACE;
}

void bar(int b[]) {
    *(b+498) = 0xFEADFACE;
}

int main(int argc, char **argv) {
    int a[100], b[500], *a_p;

    *(a+99) = 0xDEADBEEF;
    *(b+499) = *(a+99);

    foo(a);
    bar(b);

    printf("a[98] == %X\na[99] == %X\n", a[98], a[99]);
    printf("b[498] == %X\nb[499] == %X\n", b[498], b[499]);

    a_p = a+98;
    *a_p = 0xDEADFACE;

    printf("a[98] == %X\na[99] == %X\n", a[98], a[99]);
}

It produces the output I expect:

anon@anon:~/study/test_code$ gcc arrayType.c -o arrayType
anon@anon:~/study/test_code$ ./arrayType 
a[98] == FEADFACE
a[99] == DEADBEEF
b[498] == FEADFACE
b[499] == DEADBEEF
a[98] == DEADFACE
a[99] == DEADBEEF

Are a and b the same type? Is int *a handled as the same type as int a[] internally in the compiler?

From a practical point of view int a[100], b[500], *a_p, b_a[]; all seem to be the same type. It's hard for me to believe that the compiler is constantly adjusting these types in the various circumstances in my above example. I'm happy to be proven wrong.

Can someone settle this question for me definitively and in detail?

十年不长 2024-08-10 16:42:50

Are a and b the same type? Is int *a handled as the same type as int a[] internally in the compiler?

From the comp.lang.C FAQ:

... whenever an array appears in an expression, the compiler implicitly generates a pointer to the array's first element, just as if the programmer had written &a[0]. (The exceptions are when the array is the operand of a sizeof or & operator, or is a string literal initializer for a character array...)
... Given an array a and pointer p, an expression of the form a[i] causes the array to decay into a pointer, following the rule above, and then to be subscripted just as would be a pointer variable in the expression p[i] (although the eventual memory accesses will be different ...

Given declarations of

char a[] = "hello";
char *p = "world";

... when the compiler sees the expression a[3], it emits code to start at the location a, move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to. In other words, a[3] is three places past (the start of) the object named a, while p[3] is three places past the object pointed to by p.

Emphasis is mine. The biggest difference seems to be that the pointer is fetched when it's a pointer, while there is no pointer to fetch if it's an array.


带上头具痛哭 2024-08-10 16:42:50

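One of the differences: int a[x][y] and int **a are not interchangeable.

http://www.lysator.liu.se/c/c-faq/c-2.html

2.10:

An array of arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer.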


梦里寻她 2024-08-10 16:42:50

a and b are both arrays of ints. a[0] is not a memory location containing a memory address, it is a memory location containing an int.

Arrays and pointers are neither identical nor interchangeable. They are "equivalent" only in the sense that an lvalue of type array-of-T which appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T. This becomes clear when looking at the assembly output for related code. The three exceptions, FYI, are when the array is an operand of sizeof or &, or is a string literal initializer for a character array.

Picture the declarations

char a[] = "hello";
char *p = "world";

as data structures that could be represented like this:

   +---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+

   +-----+     +---+---+---+---+---+---+
p: |  *======> | w | o | r | l | d |\0 |
   +-----+     +---+---+---+---+---+---+

Now realize that a reference like x[3] produces different code depending on whether x is a pointer or an array. To the compiler, a[3] means: start at the location a, move three past it, and fetch the char there. p[3] means: go to the location p, fetch the pointer value stored there, move three past the address it holds, and fetch the char there.


往昔成烟 2024-08-10 16:42:50

From the C language standard:

6.3.2.1.3 Except when it is the operand of the sizeof operator or the 
          unary & operator, or is a string literal used to initialize 
          an array, an expression that has type "array of type" is
          converted to an expression with type "pointer to type" that 
          points to the initial element of the array object and is not 
          an lvalue. If the array object has register storage class, the
          behavior is undefined.

Assume the following code:

#include <stdio.h>
#include <string.h>
int main(void)
{
  char foo[10] = {0};
  char *p = foo;
  foo[0] = 'b';
  *(foo + 1) = 'a';
  strcat(foo, "t");
  printf("foo = %s, &foo = %p, &p = %p, sizeof foo = %lu, sizeof p = %lu\n", 
    foo, (void *) &foo, (void *) &p, (unsigned long) sizeof foo, (unsigned long) sizeof p);
  return 0;
}

foo is declared as a 10-element array of char with all elements initialized to 0. p is declared as a pointer to char and is initialized to point to foo.

In the line

char *p = foo;

the expression foo has type "10-element array of char"; since foo is not an operand of either sizeof or &, and is not a string literal being used to initialize an array, its type is implicitly converted to "pointer to char" and is set to point to the first element of the array. This pointer value is copied to p.

In the lines

foo[0] = 'b';
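*(foo + 1) = 'a';

the expression foo has type "10-element array of char"; since foo is not an operand of either sizeof or &, and is not a string literal being used to initialize an array, its type is implicitly converted to "pointer to char" and is set to point to the first element of the array. The subscript expression foo[0] is interpreted as "*(foo + 0)".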

In the line

strcat(foo, "t");

foo has type "10-element array of char" and the string literal "t" has type "2-element array of char"; since neither is an operand of either sizeof or &, and while "t" is a string literal, it is not being used to initialize an array, both are implicitly converted to type "pointer to char", and the pointer values are passed to strcat().

In the line

  printf("foo = %s, &foo = %p, &p = %p, sizeof foo = %lu, sizeof p = %lu\n", 
    foo, (void *) &foo, (void *) &p, (unsigned long) sizeof foo, (unsigned long) sizeof p);

the first instance of foo is converted to a pointer to char as described above. The second instance of foo is an operand of the & operator, so its type is not converted to "pointer to char", and the type of the expression "&foo" is "pointer to 10-element array of char", or "char (*)[10]". Compare this with the type of the expression "&p", which is "pointer to pointer to char", or "char **". The third instance of foo is an operand of the sizeof operator, so again its type is not converted, and sizeof returns the number of bytes allocated to the array. Compare this with the result of sizeof p, which returns the number of bytes allocated to the pointer.

Whenever anyone tells you "an array is just a pointer", they are garbling the section of the standard quoted above. Arrays are not pointers and pointers are not arrays; however, in many circumstances you can treat an array as though it were a pointer, and you can treat a pointer as though it were an array. "p" could be substituted for "foo" in the two assignment lines and in the strcat() call. However, they are not interchangeable as operands to sizeof or &.

Edit: btw, as function parameters,

void foo(int *a);

and

void foo(int a[]);

are equivalent. "a[]" is interpreted as "*a". Note that this is only true for function parameters.


长安忆 2024-08-10 16:42:50

See here:

2.2: But I heard that char a[] was identical to char *a.

http://www.lysator.liu.se/c/c-faq/c-2.html


匿名的好友 2024-08-10 16:42:50

I agree with sepp2k's answer and Mark Rushakoff's comp.lang.c FAQ quote. Let me add some important differences between the two declarations and a common trap.

When you define a as an array (in a context other than a function's parameter, which is a special case) you can't write
a = 0;
or
a++;
because a is not a modifiable lvalue (it cannot appear on the left of an assignment operator).
The array definition reserves space for the elements, whereas a pointer definition reserves space only for the pointer itself. Therefore, sizeof(array) will return the memory space needed for storing all the array's elements (for instance 10 times four bytes for an array of 10 integers on a 32-bit architecture), whereas sizeof(pointer) will only return the memory space required for storing that pointer (for instance 8 bytes on a 64-bit architecture).
When you prepend pointer declarators or append array declarators, things definitely diverge. For instance, int **a is a pointer to a pointer to an integer. It can be used as a two-dimensional array (with rows of varying sizes) by allocating an array of pointers to the rows and making each one point to memory for storing integers. To access a[2][3] the compiler will fetch the pointer in a[2] and then move three elements past the location it points to in order to access the value. Contrast this with b[10][20], which is an array of 10 elements, each of which is an array of 20 integers. To access b[2][3] the compiler will offset the beginning of the array's memory area by multiplying 2 by the size of 20 integers and adding the size of 3 more integers.

Finally, consider this trap. If you have in one C file

int a[10];

and in another

extern int *a;
a[0] = 42;

the files will compile and link without an error, but the code will not do what you might expect; it will probably crash with a null pointer assignment. The reason is that in the second file a is a pointer whose value is the contents of the first file's a[0], i.e. initially 0.


冷情妓 2024-08-10 16:42:50

There are two a's and two b's in your example.

As parameters

void foo(int *a) {
    a[98] = 0xFEADFACE;
}

void bar(int b[]) {
    *(b+498) = 0xFEADFACE;
}

a and b are of the same type: pointer to int.

As variables

int *a;
int b[10];

they aren't of the same type. The first is a pointer, the second is an array.

Array behavior

An array (whether it is a variable or not) is implicitly converted, in most
contexts, into a pointer to its first element. The two contexts in C where
this is not done are as the operand of sizeof and as the operand of &; in C++
there are some more, related to reference parameters and templates.

I wrote "whether it is a variable or not" because the conversion is not done
only for variables. Some examples:

int foo[10][10];
int (*bar)[10];

foo is an array of 10 arrays of 10 ints. In most contexts it will be
converted into a pointer to its first element, of type pointer to array of
10 int.
foo[0] is an array of 10 int; in most contexts it will be
converted into a pointer to its first element, of type pointer to int.
*bar is an array of 10 int; in most contexts it will be
converted into a pointer to its first element, of type pointer to int.

Some history

In B, the direct ancestor of C, the equivalent of

int x[10];

had the effect of what in current C we'd write

int _x[10];
int *x = &_x[0];

i.e. it allocated memory and initialized a pointer to it. Some people seem to have the misconception that this is still true in C.

In NB (when C was no longer B but was not yet called C), there was a time
when a pointer was declared

int x[];

but

int foo[10];

had its current meaning. The adjustment of function parameters is a
remnant of that time.


少年亿悲伤 2024-08-10 16:42:50

Are a and b the same type?

Yes. [Edit: I should clarify: The parameter a of function foo is the same type as the parameter b of function bar. Both are pointers to int. The local variable a in main is the same kind of type as the local variable b in main. Both are arrays of ints (well, actually they're not the same type because they don't have the same size. But both are arrays).]

Is int *a handled as the same type as int a[] internally in the compiler?

Usually not. The exception is when you write foo bar[] as a parameter to a function (like you did here), it automatically becomes foo *bar.

When declaring non-parameter variables however there is a big difference.

int * a; /* pointer to int. points nowhere in particular right now */
int b[10]; /* array of int. Memory for 10 ints has been allocated on the stack */
foo(a); /* calls foo with parameter `int*` */
foo(b); /* also calls foo with parameter `int*` because here the name b is
           converted to a pointer to the first element of the array */


无所的.畏惧 2024-08-10 16:42:50

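No, they are not the same! One is a pointer to int, the other is an array of 100 ints. So yes, they are the same!

OK, I will try to explain this silliness.

*a and a[100] are basically the same for what you are doing. But if we look in detail at the compiler's memory-handling logic, what we are saying is:

*a: compiler, I need memory, but I'll tell you how much later, so relax for now!
a[100]: compiler, I need memory now, and I know I need 100, so make sure we have it!

Both can be used like pointers. Your code can treat them the same way and trample the memory near them all you want. But a[100] is contiguous memory allocated at compile time, whereas *a only allocates the pointer, because the compiler doesn't know when you are going to need the memory (run-time memory nightmares).

So, who cares, right? Well, certain operators (such as sizeof) need to care. sizeof(a) will return a different answer for *a and a[100], and the answer will also differ inside a function. In the function case the compiler knows the difference, so you can take advantage of this in your code as well, e.g. in for loops, memcpy, etc. Go on, try it.

This is a big question, but the answer I'm giving here is this: the compiler knows the subtle difference, and it will produce code that looks the same most of the time but differs when it matters. It is up to you to learn what *a or a[100] means to the compiler and where it will treat them differently. They can be effectively the same, but they are not the same. To make matters worse, you can change the whole game by calling a function like you have.

Phew... is it any wonder that managed code like C# is so hot now?!

Edit: I should also add that you can do a_p = X, but try doing that with one of your arrays! Arrays work with memory just like pointers, but they cannot be moved or resized. A pointer like a_p can point at different things.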