In C99, is f()+g() undefined or merely unspecified?

Posted on 2024-09-27 23:15:21

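I used to think that in C99, even if the side effects of functions f and g interfered, and although the expression f() + g() does not contain a sequence point, f and g would contain some, so the behavior would be unspecified: either f() would be called before g(), or g() before f().

I am no longer so sure. What if the compiler inlines the functions (which it may decide to do even if the functions are not declared inline) and then reorders instructions? Could one get a result different from the two above? In other words, is this undefined behavior?

This is not because I intend to write this kind of thing; it is to choose the best label for such a statement in a static analyzer.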

魄砕の薆 2024-10-04 23:15:21

The expression f() + g() contains a minimum of 4 sequence points; one before the call to f() (after all zero of its arguments are evaluated); one before the call to g() (after all zero of its arguments are evaluated); one as the call to f() returns; and one as the call to g() returns. Further, the two sequence points associated with f() occur either both before or both after the two sequence points associated with g(). What you cannot tell is which order the sequence points will occur in - whether the f-points occur before the g-points or vice versa.

Even if the compiler inlined the code, it has to obey the 'as if' rule - the code must behave the same as if the functions were not interleaved. That limits the scope for damage (assuming a non-buggy compiler).

So the sequence in which f() and g() are evaluated is unspecified. But everything else is pretty clean.
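To make the "unspecified, not undefined" conclusion concrete, here is a minimal sketch. The shared variable `counter` and the bodies of `f` and `g` are assumptions made up for illustration; they are not from the question. Under the reasoning above, the sum can take exactly two values, depending on which call runs first.

```c
#include <assert.h>

/* Hypothetical shared state that both functions modify. */
static int counter;

/* Illustrative bodies: each call runs to completion before or after the other. */
static int f(void) { counter += 1;  return counter; }
static int g(void) { counter *= 10; return counter; }

/* Evaluates f() + g() starting from counter == 1. */
int sum_once(void)
{
    counter = 1;
    return f() + g();
}

/* True if r is one of the two results the sequence-point argument permits. */
int is_allowed_result(int r)
{
    /* f first: counter 1 -> 2 (f returns 2), then 2 -> 20 (g returns 20): 22 */
    /* g first: counter 1 -> 10 (g returns 10), then 10 -> 11 (f returns 11): 21 */
    return r == 22 || r == 21;
}

/* Convenience check that can be called from a test driver. */
void check_sum(void)
{
    assert(is_allowed_result(sum_once()));
}
```

Which of the two values appears may differ between compilers or optimization levels, but an interleaving of the two bodies that yields some third value would not be consistent with the answer's argument.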

In a comment, supercat asks:

I would expect function calls in the source code remain as sequence points even if a compiler decides on its own to inline them. Does that remain true of functions declared "inline", or does the compiler get extra latitude?

I believe the 'as if' rule applies and the compiler doesn't get extra latitude to omit sequence points because it uses an explicitly inline function. The main reason for thinking that (being too lazy to look for the exact wording in the standard) is that the compiler is allowed to inline or not inline a function according to its rules, but the behaviour of the program should not change (except for performance).

Also, what can be said about the sequencing of (a(),b()) + (c(),d())? Is it possible for c() and/or d() to execute between a() and b(), or for a() or b() to execute between c() and d()?

Clearly, a executes before b, and c executes before d. I believe it is possible for c and d to be executed between a and b, though it is fairly unlikely that the compiler would generate code like that; similarly, a and b could be executed between c and d. And although I used 'and' in 'c and d', that could be an 'or'; that is, any of these sequences of operations meets the constraints:

- Definitely allowed
  - abcd
  - cdab
- Possibly allowed (preserves a ≺ b, c ≺ d ordering)
  - acbd
  - acdb
  - cadb
  - cabd

I believe that covers all possible sequences. See also the chat between Jonathan Leffler and AnArrayOfFunctions; the gist is that AnArrayOfFunctions does not think the 'possibly allowed' sequences are allowed at all.

If such a thing would be possible, that would imply a significant difference between inline functions and macros.

There are significant differences between inline functions and macros, but I don't think the ordering in the expression is one of them. That is, any of the functions a, b, c or d could be replaced with a macro, and the same sequencing of the macro bodies could occur. The primary difference, it seems to me, is that with the inline functions, there are guaranteed sequence points at the function calls - as outlined in the main answer - as well as at the comma operators. With macros, you lose the function-related sequence points. (So, maybe that is a significant difference...) However, in so many ways the issue is rather like questions about how many angels can dance on the head of a pin - it isn't very important in practice. If someone presented me with the expression (a(),b()) + (c(),d()) in a code review, I would tell them to rewrite the code to make it clear:

a();
c();
x = b() + d();

And that assumes there is no critical sequencing requirement on b() vs d().
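The ordering constraints discussed above can be observed with a small tracing sketch. The `record` helper, the `trace` buffer and the function bodies are hypothetical, invented only to log the call order; the sketch checks the guarantees the answer treats as certain (a before b, c before d) and shows that the rewritten form pins a and c to the front.

```c
#include <assert.h>
#include <string.h>

static char trace[5];   /* room for four call names plus the terminator */
static int pos;

/* Appends the name of the function being called to the trace. */
static int record(char name)
{
    assert(pos < 4);
    trace[pos++] = name;
    trace[pos] = '\0';
    return 0;
}

static int a(void) { return record('a'); }
static int b(void) { return record('b'); }
static int c(void) { return record('c'); }
static int d(void) { return record('d'); }

static void reset(void) { pos = 0; trace[0] = '\0'; }

/* The expression from the comment; only a-before-b and c-before-d are certain. */
const char *run_original(void)
{
    reset();
    int x = (a(), b()) + (c(), d());
    (void)x;
    return trace;
}

/* The rewritten form: a and c are now sequenced first, in that order. */
const char *run_rewritten(void)
{
    reset();
    a();
    c();
    int x = b() + d();
    (void)x;
    return trace;
}

/* True if t names each call once with a before b and c before d. */
int respects_comma_order(const char *t)
{
    const char *pa = strchr(t, 'a'), *pb = strchr(t, 'b');
    const char *pc = strchr(t, 'c'), *pd = strchr(t, 'd');
    return strlen(t) == 4 && pa && pb && pc && pd && pa < pb && pc < pd;
}
```

Printing `run_original()` on a given compiler shows which of the six orderings it happened to choose. The order of b and d in the rewritten form is still not fixed, which is the caveat stated just above.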

淡看悲欢离合 2024-10-04 23:15:21

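See Annex C for a list of sequence points. Function calls (the point between all arguments having been evaluated and execution passing to the function) are sequence points. As you've said, it is unspecified which function gets called first, but each of the two functions will see either all of the other's side effects or none of them.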
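A small sketch of the "all or none" property, under the assumption that `f` performs two separate writes and `g` reads both; the variable names are made up for illustration. If the calls cannot interleave, `g` observes either the state before both writes or the state after both, never a mix.

```c
#include <assert.h>

/* Hypothetical pair of variables that f updates together. */
static int lo, hi;
static int seen_lo, seen_hi;

/* Two distinct side effects inside one call. */
static int f(void) { lo = 1; hi = 1; return 0; }

/* Takes a snapshot of both variables. */
static int g(void) { seen_lo = lo; seen_hi = hi; return 0; }

/* Returns 1 if g saw both of f's writes or neither of them. */
int g_saw_consistent_state(void)
{
    lo = hi = 0;
    (void)(f() + g());
    assert(seen_lo == 0 || seen_lo == 1);
    return seen_lo == seen_hi;
}
```

Whether the snapshot is (0, 0) or (1, 1) depends on the unspecified call order; a snapshot of (1, 0) would mean the two bodies had been interleaved.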

汹涌人海 2024-10-04 23:15:21

@dmckee

Well, that won't fit inside a comment, but here is the thing:

First, you write a correct static analyzer. "Correct", in this context, means that it won't remain silent if there is anything dubious about the analyzed code, so at this stage you merrily conflate undefined and unspecified behaviors. They are both bad and unacceptable in critical code, and you warn, rightly, for both of them.

But you only want to warn once for one possible bug, and you also know that your analyzer will be judged in benchmarks in terms of "precision" and "recall" against other, possibly not correct, analyzers, so you mustn't warn twice about the same problem, be it a true or a false alarm (you don't know which; you never know which, otherwise it would be too easy).

So you want to emit a single warning for

*p = x;
y = *p;

Because as soon as p is a valid pointer at the first statement, it can be assumed to be a valid pointer at the second statement. And not inferring this will lower your score on the precision metric.

So you teach your analyzer to assume that p is a valid pointer as soon as you have warned about it the first time in the above code, so that you don't warn about it the second time. More generally, you learn to ignore values (and execution paths) that correspond to something you have already warned about.
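The warn-once idea can be sketched as a toy state machine. Everything here (the `ptr_state` enum, the `analysis` struct, `analyze_deref`) is a hypothetical illustration and not the design of any real analyzer: the first dereference of a pointer of unknown validity raises an alarm, after which the analysis continues under the assumption that the pointer was valid.

```c
#include <assert.h>

/* Hypothetical abstract state tracked for one pointer variable. */
typedef enum { PTR_UNKNOWN, PTR_ASSUMED_VALID } ptr_state;

typedef struct {
    ptr_state state;
    int warnings;    /* number of alarms emitted so far */
} analysis;

/* Models the analysis of one dereference of the tracked pointer. */
void analyze_deref(analysis *an)
{
    assert(an);
    if (an->state == PTR_UNKNOWN) {
        an->warnings++;                  /* emit the alarm once */
        an->state = PTR_ASSUMED_VALID;   /* keep only paths where it was valid */
    }
}

/* Analyzes "*p = x; y = *p;" and returns the number of alarms. */
int warnings_for_two_derefs(void)
{
    analysis an = { PTR_UNKNOWN, 0 };
    analyze_deref(&an);   /* *p = x; */
    analyze_deref(&an);   /* y = *p; */
    return an.warnings;
}
```

The second dereference is silent because the execution paths on which `p` was invalid were already dropped at the first alarm, which is the behavior the following paragraphs go on to question.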

Then, you realize that not many people are writing critical code, so you make other, lightweight analyses for the rest of them, based on the results of the initial, correct analysis. Say, a C program slicer.

And you tell "them": You don't have to check all the (possibly, often false) alarms emitted by the first analysis. The sliced program behaves the same as the original program as long as none of them is triggered. The slicer produces programs that are equivalent for the slicing criterion for "defined" execution paths.

And users merrily ignore the alarms and use the slicer.

And then you realize that perhaps there is a misunderstanding. For instance, most implementations of memmove (you know, the one that handles overlapping blocks) actually invoke unspecified behavior when called with pointers that do not point to the same block, because they compare addresses that do not belong to the same block. And your analyzer ignores both execution paths, because both are unspecified, but in reality both execution paths are equivalent and all is well.
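The memmove point can be illustrated with a sketch of a common copying strategy. `my_memmove` is a hypothetical name and this is not the code of any particular C library; it only shows where the pointer comparison sits and why the outcome of that comparison does not matter when the blocks are disjoint.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a typical overlap-safe copy; not any library's actual code. */
void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d && s));

    /* This relational comparison is the step the text refers to: it has a
       portable meaning only when d and s point into the same block. */
    if (d < s) {
        for (size_t i = 0; i < n; i++)    /* copy forward */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)    /* copy backward */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

When the two blocks do not overlap, the forward and backward loops produce the same bytes in `dst`, so whichever branch the comparison selects, the result is identical. That is the sense in which both execution paths are equivalent.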

So there shouldn't be any misunderstanding on the meaning of alarms, and if one intends to ignore them, only unmistakable undefined behaviors should be excluded.

And this is how you end up with a strong interest in distinguishing between unspecified behavior and undefined behavior. No one can blame you for ignoring the latter. But programmers will write the former without even thinking about it, and when you say that your slicer excludes "wrong behaviors" of the program, they will not feel that this concerns them.

And this is the end of a story that definitely did not fit in a comment. Apologies to anyone who read that far.
