What are the common undefined/unspecified behaviors in C that you have run into?

Posted on 2024-07-04 16:17:26

Comments (11)

画中仙 2024-07-11 16:17:26

A language lawyer question. Hmkay.

My personal top3:

  1. violating the strict aliasing rule

  2. violating the strict aliasing rule

  3. violating the strict aliasing rule

    :-)

Edit: Here is a little example that gets it wrong twice:

(assume 32-bit ints and little-endian byte order)

float funky_float_abs (float a)
{
  unsigned int temp = *(unsigned int *)&a;
  temp &= 0x7fffffff;
  return *(float *)&temp;
}

That code tries to compute the absolute value of a float by clearing the sign bit directly in the float's representation.

However, reading an object through a pointer to a different, incompatible type is not valid C (it is undefined behavior). The compiler may assume that pointers to different types don't point to the same chunk of memory. The exception is pointers to character types (char, signed char, unsigned char; signedness does not matter), which may alias anything. A void* may point at anything too, but it cannot be dereferenced, so it only helps for passing an address along.

In the case above I do that twice: once to read the float a through an unsigned int alias, and once to read the unsigned int back as a float.

There are three valid ways to do the same thing.

Use a char pointer for the access. Character pointers may alias anything, so they are safe.

float funky_float_abs (float a)
{
  float temp_float = a;
  // valid, because it's a char pointer. These are special.
  unsigned char * temp = (unsigned char *)&temp_float;
  temp[3] &= 0x7f;
  return temp_float;
}

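Use memcpy. memcpy copies the object representation byte by byte through void pointers, so no object is ever read through an incompatible type.

#include <string.h>

float funky_float_abs (float a)
{
  unsigned int i;
  float result;
  memcpy (&i, &a, sizeof i);            // copy the bits of the float
  i &= 0x7fffffff;                      // clear the sign bit
  memcpy (&result, &i, sizeof result);  // copy the bits back
  return result;
}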

The third valid way: use a union. Reading a member other than the one last written reinterprets the stored bytes, and this has been explicitly not undefined since C99:

float funky_float_abs (float a)
{
  union 
  {
     unsigned int i;
     float f;
  } cast_helper;

  cast_helper.f = a;
  cast_helper.i &= 0x7fffffff;
  return cast_helper.f;
}
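The examples above all rely on the stated assumption of 32-bit ints. As a minimal sketch (not part of the original answer), the memcpy approach can be written with a fixed-width type so that the size assumption is checked at compile time. The name float_abs_bits is hypothetical, and the code still assumes that float is IEEE 754 binary32 with the sign in the top bit and that floats and integers share the same byte order.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float is 32 bits wide (IEEE 754 binary32). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: clears the sign bit without violating strict aliasing. */
float float_abs_bits(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);   /* copy the object representation */
    bits &= UINT32_C(0x7fffffff);     /* clear the sign bit */
    memcpy(&a, &bits, sizeof a);      /* copy the modified bits back */
    return a;
}
```

Because the mask is applied to the integer value rather than to an individual byte, this version does not depend on which byte holds the sign bit, unlike the temp[3] variant.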
魔法少女 2024-07-11 16:17:26

If no function prototype is visible, the compiler doesn't have to tell you that you're calling the function with the wrong number of arguments or with arguments of the wrong type.

A君 2024-07-11 16:17:26

The clang developers posted some great examples a while back, in a post every C programmer should read. Some interesting ones not mentioned before:

  • Signed integer overflow - no, it's not OK to wrap a signed variable past its max.
  • Dereferencing a NULL pointer - yes, this is undefined, and might be ignored by the compiler; see part 2 of the link.
逆光下的微笑 2024-07-11 16:17:26

The EEs here just discovered that a>>-2 is a bit fraught.

I nodded and told them it was not natural.
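For context: in C, shifting by a negative count, or by a count greater than or equal to the width of the promoted left operand, is undefined behavior. A minimal sketch of a guarded shift follows. The helper name shr_checked and its "return 0 for out-of-range counts" policy are assumptions made for illustration, and the width calculation assumes unsigned int has no padding bits.

```c
#include <limits.h>

/* Hypothetical helper: right shift that is defined for every count value. */
unsigned int shr_checked(unsigned int value, int count)
{
    const int width = (int)(sizeof value * CHAR_BIT);

    if (count < 0 || count >= width) {
        return 0u;  /* policy chosen for this sketch, not mandated by C */
    }
    return value >> count;
}
```

With this, shr_checked(a, -2) has a predictable result instead of whatever the compiler and hardware happen to do.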

空‖城人不在 2024-07-11 16:17:26

Be sure to always initialize your variables before you use them! When I had just started with C, that caused me a number of headaches.

感性不性感 2024-07-11 16:17:26

Dividing something by the value a pointer points to. Just won't compile for some reason... :-)

result = x/*y;

(The /* is lexed as the start of a comment. Writing x / *y or x/(*y) fixes it.)
远山浅 2024-07-11 16:17:26

Another issue I encountered (which is defined, but definitely unexpected).

char is evil.

  • whether plain char is signed or unsigned is up to the implementation
  • not mandated to be 8 bits (CHAR_BIT is at least 8, but may be larger)
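Both properties can be queried through <limits.h> rather than guessed. The sketch below is illustrative only; the function names plain_char_is_signed and byte_value are hypothetical. It shows one common way to sidestep the signedness question: convert to unsigned char before using a char as a number.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Maps a plain char to the range 0..UCHAR_MAX regardless of its signedness. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

The same conversion is what the <ctype.h> functions expect: their argument must be representable as unsigned char (or be EOF), so passing a negative plain char directly is a classic mistake.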
花落人断肠 2024-07-11 16:17:26

I can't count the number of times I've corrected printf format specifiers to match their argument. Any mismatch is undefined behavior.

  • No, you must not pass an int (or long) to %x - an unsigned int is required
  • No, you must not pass an unsigned int to %d - an int is required
  • No, you must not pass a size_t to %u or %d - use %zu
  • No, you must not print a pointer with %d or %x - use %p and cast to a void *
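The rules in the list above can be sketched as follows. The function names and parameters are hypothetical; snprintf is used instead of printf only so that the result can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Each argument type matches its conversion specifier exactly. */
int format_counts(char *buf, size_t len, unsigned int flags, int delta, size_t count)
{
    /* %x takes unsigned int, %d takes int, %zu takes size_t */
    return snprintf(buf, len, "%x %d %zu", flags, delta, count);
}

/* %p requires a void *, so other object pointer types are cast explicitly. */
int format_pointer(char *buf, size_t len, int *ptr)
{
    return snprintf(buf, len, "%p", (void *)ptr);
}
```

The text produced by %p is implementation-defined, so only its presence, not its exact form, should be relied on.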
一曲爱恨情仇 2024-07-11 16:17:26

I've seen a lot of relatively inexperienced programmers bitten by multi-character constants.

This:

"x"

is a string literal (which is of type char[2] and decays to char* in most contexts).

This:

'x'

is an ordinary character constant (which, for historical reasons, is of type int).

This:

'xy'

is also a perfectly legal character constant, but its value (which is still of type int) is implementation-defined. It's a nearly useless language feature that serves mostly to cause confusion.
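The difference between the first two forms shows up in sizeof. A small sketch (the function names are hypothetical, and the multi-character form is left out because its value is implementation-defined and most compilers warn about it):

```c
#include <stddef.h>

/* "x" is an array of two chars: 'x' followed by the terminating '\0'. */
size_t size_of_string_literal(void)
{
    return sizeof "x";
}

/* In C, 'x' has type int, so this is sizeof(int), not 1. */
size_t size_of_char_constant(void)
{
    return sizeof 'x';
}
```

This only holds when the file is compiled as C; in C++ a character literal has type char.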

梦在夏天 2024-07-11 16:17:26

My personal favourite undefined behaviour is that if a non-empty source file doesn't end in a newline, behaviour is undefined.

I suspect, though, that no compiler I will ever see treats a source file differently according to whether or not it is newline-terminated, other than to emit a warning. So it's not really something that will surprise unaware programmers, other than that they might be surprised by the warning.

So for genuine portability issues (which mostly are implementation-dependent rather than unspecified or undefined, but I think that falls into the spirit of the question):

  • char is not necessarily signed, and not necessarily unsigned either.
  • int can be any size from 16 bits upward.
  • floats are not necessarily IEEE-formatted or conformant.
  • integer types are not necessarily two's complement, and signed integer overflow causes undefined behaviour. Modern hardware won't crash, but some compiler optimizations produce behaviour different from wraparound even though wraparound is what the hardware does. For example, if (x+1 < x) may be optimized to always false when x has a signed type: see the -fstrict-overflow option in GCC.
  • "/", "." and ".." in a #include have no defined meaning and can be treated differently by different compilers (this does actually vary, and if it goes wrong it will ruin your day).
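Since a test such as x+1 < x may be optimized away for signed x, an overflow check has to be done before the addition, using only operations that cannot overflow. A minimal sketch, with the hypothetical helper name add_checked:

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true if the sum is representable in int;
   returns false and leaves *out untouched otherwise. */
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;  /* a + b would overflow */
    }
    *out = a + b;
    return true;
}
```

The comparisons INT_MAX - b (for positive b) and INT_MIN - b (for negative b) stay within the range of int, so the check itself never triggers the behaviour it is guarding against.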

Really serious ones that can be surprising even on the platform you developed on, because behaviour is only partially undefined / unspecified:

  • POSIX threading and the ANSI memory model. Concurrent access to memory is not as well defined as novices think. volatile doesn't do what novices think. Order of memory accesses is not as well defined as novices think. Accesses can be moved across memory barriers in certain directions. Memory cache coherency is not required.

  • Profiling code is not as easy as you think. If your test loop has no effect, the compiler can remove part or all of it. inline has no defined effect.

And, as I think Nils mentioned in passing:

  • VIOLATING THE STRICT ALIASING RULE.
如梦亦如幻 2024-07-11 16:17:26

My favorite is this:

// what does this do?
x = x++;

To answer some comments: it is undefined behaviour according to the standard. Seeing this, the compiler is allowed to do anything, up to and including formatting your hard drive. See for example this comment here. The point is not whether you can come up with a reasonable expectation of some behaviour. Because of the way the C standard defines sequence points, this line of code is actually undefined behaviour.

For example, if we had x = 1 before the line above, then what would the valid result be afterwards? Someone commented that it should be

x is incremented by 1

so we should see x == 2 afterwards. However, this is not actually guaranteed: you will find some compilers that leave x == 1 afterwards, or maybe even x == 3. You would have to look closely at the generated assembly to see why this might be, but the differences are due to the underlying problem. Essentially, I think this is because x is modified twice with no sequence point in between (once by the ++ and once by the assignment), so the compiler may perform the two stores in any order it likes: it could do the x++ first, or the x = first.
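If the intent was simply to increment x, then x++; or x = x + 1; on its own is well defined. If the old value is also needed, the two steps can be written out so that their order is explicit. A minimal sketch, with the hypothetical helper name post_increment:

```c
/* Spells out post-increment: returns the old value, leaves *x one larger. */
int post_increment(int *x)
{
    int old = *x;   /* read the current value first */
    *x = old + 1;   /* then store the incremented value, exactly once */
    return old;
}
```

Here each statement ends in a sequence point, so x is modified only once between sequence points and the result no longer depends on the compiler.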
