Returning a struct containing an array

Posted 2024-12-25 09:31:51, 433 words, 2 views, 0 comments

The following simple code segfaults under gcc 4.4.4:

#include <stdio.h>

typedef struct Foo Foo;
struct Foo {
    char f[25];
};

Foo foo(void) {
    Foo f = {"Hello, World!"};
    return f;
}

int main(void) {
    printf("%s\n", foo().f);
}

Changing the final line to

 Foo f = foo(); printf("%s\n", f.f);

works fine. Both versions work when compiled with -std=c99. Am I simply invoking undefined behavior, or has something in the standard changed that permits the code to work under C99? Why does it crash under C89?

Comments (3)

长梦不多时 2025-01-01 09:31:51

I believe the behavior is undefined both in C89/C90 and in C99.

foo().f is an expression of array type, specifically char[25]. C99 6.3.2.1p3 says:

Except when it is the operand of the sizeof operator or the unary
& operator, or is a string literal used to initialize an array, an
expression that has type "array of type" is converted to an
expression with type "pointer to type" that points to the initial
element of the array object and is not an lvalue. If the array object
has register storage class, the behavior is undefined.
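The sizeof exception in that paragraph can be checked directly. The sketch below reuses the question's struct Foo and foo(); the helper name foo_array_size is hypothetical. As the operand of sizeof, foo().f keeps its array type and is not evaluated, so no array object is needed, and the result should be well defined in every version of the standard.

```c
#include <stddef.h>

typedef struct Foo Foo;
struct Foo {
    char f[25];
};

static Foo foo(void)
{
    Foo f = {"Hello, World!"};
    return f;
}

/* Hypothetical helper. foo().f is the operand of sizeof, so it is not
   converted to char * and foo() is never called: the result is the
   size of the array member itself. */
static size_t foo_array_size(void)
{
    return sizeof foo().f;
}
```

foo_array_size() returns 25, the declared length of f, rather than the size of a pointer.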

The problem in this particular case (an array that's a member of a structure returned by a function) is that there is no "array object". Function results are returned by value, so the result of calling foo() is a value of type struct Foo, and foo().f is a value (not an lvalue) of type char[25].

This is, as far as I know, the only case in C (up to C99) where you can have a non-lvalue expression of array type. I'd say that the behavior of attempting to access it is undefined by omission, likely because the authors of the standard (understandably IMHO) didn't think of this case. You're likely to see different behaviors at different optimization settings.

The new 2011 C standard patches this corner case by introducing a new kind of object lifetime, "temporary lifetime". N1570, a late pre-C11 draft, says in 6.2.4p8:

A non-lvalue expression with structure or union type, where the
structure or union contains a member with array type (including,
recursively, members of all contained structures and unions) refers to
an object with automatic storage duration and temporary lifetime.
Its lifetime begins when the expression is evaluated and its initial
value is the value of the expression. Its lifetime ends when the
evaluation of the containing full expression or full declarator ends.
Any attempt to modify an object with temporary lifetime results in
undefined behavior.

So the program's behavior is well defined in C11. Until you're able to get a C11-conforming compiler, though, your best bet is probably to store the result of the function in a local object (assuming your goal is working code rather than breaking compilers):

[...]
int main(void) {
    struct Foo temp = foo();
    printf("%s\n", temp.f);
}
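To illustrate the C11 rule quoted above, here is a minimal sketch assuming a C11 or later compiler (for example gcc -std=c11); greeting_length is a hypothetical name. The temporary created for foo() lives until the end of the full expression, so passing foo().f to a function that returns within that expression is fine, but keeping the pointer afterwards is not.

```c
#include <string.h>

typedef struct Foo Foo;
struct Foo {
    char f[25];
};

static Foo foo(void)
{
    Foo f = {"Hello, World!"};
    return f;
}

/* Assumes C11 or later: foo().f refers to an array inside an object
   with temporary lifetime, which stays alive until the end of this
   full expression, so strlen can safely read it. */
static size_t greeting_length(void)
{
    return strlen(foo().f);
}

/* Still wrong in C11: the temporary's lifetime ends with the full
   declarator, so p would be left dangling:

       const char *p = foo().f;

   Copy the result into a named object first if the array is needed
   for longer. */
```

With a C11 compiler, greeting_length() returns 13, the length of "Hello, World!".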
离旧人 2025-01-01 09:31:51

printf is a bit funny, because it's one of those functions that take varargs. So let's break it down by writing a helper function bar. We'll return to printf later.

(I'm using "gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3")

void bar(const char *t) {
    printf("bar: %s\n", t);
}

and calling that instead:

bar(foo().f); // error: invalid use of non-lvalue array

OK, that gives an error. In C and C++, you are not allowed to pass an array by value. You can work around this limitation by putting the array inside a struct, for example void bar2(Foo f) {...}
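A minimal sketch of that struct workaround, reusing the question's Foo. The body of bar2 is an assumption, since the answer only gives its signature. Because the whole struct is copied into the parameter, the array travels with it, and changes made inside bar2 do not reach the caller's object.

```c
#include <stdio.h>
#include <string.h>

typedef struct Foo Foo;
struct Foo {
    char f[25];
};

/* Hypothetical body: f is a full copy of the caller's struct,
   array member included. */
static size_t bar2(Foo f)
{
    f.f[0] = 'J';                /* changes only the local copy */
    printf("bar2: %s\n", f.f);   /* f.f is an lvalue here, so it decays */
    return strlen(f.f);
}
```

Called with Foo x = {"Hello, World!"};, bar2(x) prints bar2: Jello, World! and leaves x.f unchanged.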

But we're not using that workaround; we're trying to pass the array itself by value, which isn't allowed. Now, you might think it should decay to a char*, allowing you to pass the array by reference. But decay only works if the array has an address (i.e. is an lvalue), and temporaries, such as the return values of functions, live in a magic land where they don't have an address. Therefore you can't take the address (&) of a temporary. In short, because we're not allowed to take the address of a temporary, it can't decay to a pointer. We can't pass it by value (because it's an array), nor by reference (because it's a temporary).

I found that the following code worked:

bar(&(foo().f[0]));

but to be honest I think that's suspect. Hasn't this broken the rules I just listed?

For completeness, this works exactly as it should:

Foo f = foo();
bar(f.f);

The variable f is not a temporary, and hence we can (implicitly, during decay) take its address.

printf, 32-bit versus 64-bit, and weirdness

I promised to mention printf again. According to the above, the compiler should refuse to pass foo().f to any function (including printf). But printf is funny because it's one of those vararg functions, and gcc allowed itself to pass the array by value to printf.

When I first compiled and ran the code, it was in 64-bit mode, where I got some gibberish output but no segfault. I didn't see confirmation of my theory until I compiled in 32-bit mode (-m32 to gcc). Sure enough, I got a segfault, as in the original question.

I implemented my own my_printf (with the vararg nonsense), which printed the actual value of the char * before trying to print the characters pointed at by that char *. I called it like so:

my_printf("%s\n", f.f);
my_printf("%s\n", foo().f);

and this is the output I got (code on ideone):

arg = 0xffc14eb3        // my_printf("%s\n", f.f); // worked fine
string = Hello, World!
arg = 0x6c6c6548        // my_printf("%s\n", foo().f); // it's about to crash!
Segmentation fault

The first pointer value, 0xffc14eb3, is correct (it points to the characters "Hello, World!"), but look at the second one, 0x6c6c6548. Those are the ASCII codes for "Hell", in reverse order because x86 is little-endian. The array has been copied by value into printf's arguments, and its first four bytes have been interpreted as a 32-bit pointer. That pointer doesn't point anywhere sensible, so the program crashes when it tries to access that location.
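The arithmetic behind 0x6c6c6548 can be reproduced without crashing anything. In this sketch (function names are hypothetical), first_word_le assembles the first four bytes of a string into a 32-bit value with the first byte least significant, which is how a little-endian CPU such as x86 loads them. The bytes of 'H', 'e', 'l', 'l' are 0x48, 0x65, 0x6c, 0x6c.

```c
#include <stdint.h>
#include <string.h>

/* Builds a 32-bit value from the first four bytes of s, treating the
   first byte as least significant (little-endian order), regardless
   of the host's own byte order. */
static uint32_t first_word_le(const char *s)
{
    const unsigned char *b = (const unsigned char *)s;
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

/* The same four bytes reinterpreted in the host's byte order; this
   equals first_word_le(s) only on a little-endian machine. */
static uint32_t first_word_host(const char *s)
{
    uint32_t w;
    memcpy(&w, s, sizeof w);
    return w;
}
```

first_word_le("Hello, World!") yields 0x6c6c6548, the bogus pointer value in the output above.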

I think this is in violation of the standard, simply by virtue of the fact that we're not supposed to be allowed to copy arrays by value.
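The ideone code for my_printf is not reproduced in this answer, so the following is only a hypothetical reconstruction of such a diagnostic helper, using <stdarg.h>. It assumes exactly one char * argument follows the format, prints the raw pointer value, and only then dereferences it. The %p output format is implementation-defined, so it will not necessarily look like 0xffc14eb3.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical diagnostic printf: assumes fmt consumes exactly one
   char * argument (e.g. "%s\n"). Returns the pointer it received. */
static char *my_printf(const char *fmt, ...)
{
    va_list ap;
    char *arg;

    va_start(ap, fmt);
    arg = va_arg(ap, char *);      /* fetch the argument as a pointer */
    va_end(ap);

    printf("arg = %p\n", (void *)arg);   /* print the address first */
    printf("string = ");
    printf(fmt, arg);              /* a bad pointer would crash here */
    return arg;
}
```

Called with a valid array such as char s[] = "Hello, World!";, my_printf("%s\n", s) prints the address of s followed by string = Hello, World!.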

风蛊 2025-01-01 09:31:51

On Mac OS X 10.7.2, both GCC/LLVM 4.2.1 ('i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)') and GCC 4.6.1 (which I built) compile the code without warnings (under -Wall -Wextra), in both 32-bit and 64-bit modes. The programs all run without crashing. This is what I'd expect; the code looks fine to me.

Maybe the problem on Ubuntu is a bug in the specific version of GCC that has since been fixed?
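Since the outcome depends on which C dialect the compiler was using (GCC 4.x defaults to gnu89 unless -std is given), it can help to check what a particular build actually targets. A minimal sketch with hypothetical helper names: __STDC_VERSION__ is not defined in strict C89/C90 mode, is 199409L for C90 with Amendment 1, 199901L for C99 and 201112L for C11.

```c
#include <stdio.h>

/* Returns __STDC_VERSION__, or 0 when the macro is undefined, which is
   the case in strict C89/C90 mode. */
static long stdc_version(void)
{
#ifdef __STDC_VERSION__
    return (long)__STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Maps a __STDC_VERSION__ value to a rough dialect name. */
static const char *dialect_name(long v)
{
    if (v >= 201112L) return "C11 or later";
    if (v >= 199901L) return "C99";
    if (v >= 199409L) return "C90 with Amendment 1";
    return "C89/C90";
}

/* Prints the dialect the current translation unit was compiled as. */
static void report_dialect(void)
{
    long v = stdc_version();
    printf("__STDC_VERSION__ = %ld (%s)\n", v, dialect_name(v));
}
```

Compiling the same file with gcc -std=c89 and gcc -std=c99 and calling report_dialect() shows which rules each build is following.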
