Returning a struct that contains an array
The following simple code segfaults under gcc 4.4.4:
#include <stdio.h>
typedef struct Foo Foo;
struct Foo {
    char f[25];
};
Foo foo(){
    Foo f = {"Hello, World!"};
    return f;
}
int main(){
    printf("%s\n", foo().f);
}
Changing the final line to
    Foo f = foo(); printf("%s\n", f.f);
works fine. Both versions work when compiled with -std=c99. Am I simply invoking undefined behavior, or has something in the standard changed that permits the code to work under C99? Why does it crash under C89?
Comments (3)
I believe the behavior is undefined in both C89/C90 and C99.
foo().f is an expression of array type, specifically char[25]. C99 6.3.2.1p3 says that, except when it is the operand of sizeof or unary &, or is a string literal used to initialize an array, an expression of array type is converted to a pointer to the initial element of the array object.
The problem in this particular case (an array that is a member of a structure returned by a function) is that there is no "array object". Function results are returned by value, so the result of calling foo() is a value of type struct Foo, and foo().f is a value (not an lvalue) of type char[25].
This is, as far as I know, the only case in C (up to C99) where you can have a non-lvalue expression of array type. I'd say that the behavior of attempting to access it is undefined by omission, likely because the authors of the standard (understandably, IMHO) didn't think of this case. You're likely to see different behaviors at different optimization settings.
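The 2011 C standard patches this corner case by introducing a new kind of object lifetime. N1570 (a late pre-C11 draft) says in 6.2.4p8 that a non-lvalue expression of structure or union type, where the structure or union contains a member of array type, refers to an object with automatic storage duration and temporary lifetime; that lifetime ends when the evaluation of the containing full expression or full declarator ends.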
So the program's behavior is well defined in C11. Until you're able to get a C11-conforming compiler, though, your best bet is probably to store the result of the function in a local object (assuming your goal is working code rather than breaking compilers):
printf is a bit funny, because it's one of those functions that takes varargs. So let's break it down by writing a helper function bar and passing foo().f to it instead of to printf. We'll return to printf later.
(I'm using "gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3".)
OK, that gives an error. In C and C++, you are not allowed to pass an array by value. You can work around this limitation by putting the array inside a struct, for example
void bar2(Foo f) {...}
But we're not using that workaround, so we're not allowed to pass in the array by value. Now, you might think it should decay to a char*, allowing you to pass the array by reference. But decay only works if the array has an address (i.e. is an lvalue), and temporaries, such as the return value of a function, live in a magic land where they don't have an address. Therefore you can't take the address (&) of a temporary, and hence it can't decay to a pointer. We are unable to pass it by value (because it's an array), nor by reference (because it's a temporary).
I found that the following code worked:
But to be honest I think that's suspect. Hasn't this broken the rules I just listed?
And just to be complete, storing the result in a local variable first (Foo f = foo();) and then passing f.f to bar works perfectly, as it should.
The variable f is not a temporary, and hence we can (implicitly, during decay) take its address.
printf, 32-bit versus 64-bit, and weirdness
I promised to mention printf again. According to the above, the compiler should refuse to pass foo().f to any function (including printf). But printf is funny because it's one of those vararg functions: gcc allowed itself to pass the array by value to printf.
When I first compiled and ran the code, it was in 64-bit mode. I didn't see confirmation of my theory until I compiled in 32-bit mode (-m32 to gcc). Sure enough I got a segfault, as in the original question. (In 64-bit mode I had been getting some gibberish output, but no segfault.)
I implemented my own my_printf (with the vararg nonsense) which printed the actual value of the char* before trying to print the letters pointed at by the char*. I called it and looked at the pointer values it printed (code on ideone).
The first pointer value, 0xffc14eb3, is correct (it points to the characters "Hello, World!"), but look at the second, 0x6c6c6548. Those are the ASCII codes for "Hell" in reverse order (little-endian byte order). The array has been copied by value into printf, and its first four bytes have been interpreted as a 32-bit pointer or integer. This pointer doesn't point anywhere sensible, and hence the program crashes when it attempts to access that location.
I think this is in violation of the standard, simply by virtue of the fact that we're not supposed to be allowed to copy arrays by value.
On MacOS X 10.7.2, both GCC/LLVM 4.2.1 ('i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)') and GCC 4.6.1 (which I built) compile the code without warnings (under -Wall -Wextra), in both 32-bit and 64-bit modes. The programs all run without crashing. This is what I'd expect; the code looks fine to me.
Maybe the problem on Ubuntu is a bug in the specific version of GCC that has since been fixed?