A language lawyer question. Hmkay.
My personal top 3:
violating the strict aliasing rule
violating the strict aliasing rule
violating the strict aliasing rule
:-)
Edit: Here is a little example that does it wrong twice
(assume 32-bit ints and little endian):
float funky_float_abs (float a)
{
    unsigned int temp = *(unsigned int *)&a;
    temp &= 0x7fffffff;
    return *(float *)&temp;
}
That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.
However, accessing an object through a pointer that was cast to a different, incompatible type is not valid C. The compiler may assume that pointers to different types don't point to the same chunk of memory. This holds for all kinds of pointers except character pointers such as char* (signedness does not matter); a void* may hold any object's address, but it cannot be dereferenced itself.
In the case above I do that twice: once to get an int alias for the float a, and once to convert the value back to float.
There are three valid ways to do the same.
Use a char pointer for the access. Character pointers may alias anything, so they are safe.
float funky_float_abs (float a)
{
    float temp_float = a;
    // valid, because it's a char pointer. These are special.
    unsigned char * temp = (unsigned char *)&temp_float;
    temp[3] &= 0x7f;
    return temp_float;
}
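Use memcpy. memcpy takes void pointers and copies the bytes, so no aliasing rule is violated.
#include <string.h>

float funky_float_abs (float a)
{
    unsigned int i;
    float result;
    // copy the bytes of the float into an integer of the same size
    memcpy (&i, &a, sizeof i);
    i &= 0x7fffffff; // clear the sign bit
    memcpy (&result, &i, sizeof result);
    return result;
}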
The third valid way: use unions. This is explicitly not undefined since C99:
float funky_float_abs (float a)
{
    union
    {
        unsigned int i;
        float f;
    } cast_helper;
    cast_helper.f = a;
    cast_helper.i &= 0x7fffffff;
    return cast_helper.f;
}
A compiler doesn't have to tell you that you're calling a function with the wrong number of parameters/wrong parameter types if the function prototype isn't available.
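For what it's worth, here is a minimal sketch of the safe pattern: make the prototype visible before the call, so the compiler can check the argument count and convert the arguments. The names half and call_with_prototype are made up for illustration.

```c
// Prototype visible before the call: the compiler checks the number of
// arguments and converts the int argument 3 to double.
double half(double x);

double call_with_prototype(void)
{
    return half(3);   // 3 is converted to 3.0 because of the prototype
}

double half(double x)
{
    return x / 2.0;
}
```

Without the prototype in scope, the same call would pass an int where the function expects a double, and the compiler would not be required to say anything about it.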
I've seen a lot of relatively inexperienced programmers bitten by multi-character constants.
This:
"x"
is a string literal (which is of type char[2] and decays to char* in most contexts).
This:
'x'
is an ordinary character constant (which, for historical reasons, is of type int).
This:
'xy'
is also a perfectly legal character constant, but its value (which is still of type int) is implementation-defined. It's a nearly useless language feature that serves mostly to cause confusion.
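A small sketch that makes the type difference visible through sizeof. The function names are hypothetical, and the multi-character constant is deliberately left out because its value is implementation-defined and many compilers warn about it.

```c
#include <stddef.h>

// In C, 'x' has type int, so its size matches sizeof(int).
size_t size_of_char_constant(void)
{
    return sizeof 'x';
}

// "x" has type char[2]: the letter plus the terminating '\0'.
size_t size_of_string_literal(void)
{
    return sizeof "x";
}
```

Note that this is specific to C; in C++ an ordinary character constant has type char.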
My personal favourite undefined behaviour is that if a non-empty source file doesn't end in a newline, behaviour is undefined.
I suspect it's true though that no compiler I will ever see has treated a source file differently according to whether or not it is newline terminated, other than to emit a warning. So it's not really something that will surprise unaware programmers, other than that they might be surprised by the warning.
So for genuine portability issues (which mostly are implementation-dependent rather than unspecified or undefined, but I think that falls into the spirit of the question):
char is not necessarily (un)signed.
int can be any size from 16 bits.
floats are not necessarily IEEE-formatted or conformant.
integer types are not necessarily two's complement, and signed integer arithmetic overflow causes undefined behaviour (modern hardware won't crash, but some compiler optimizations will produce behaviour different from wraparound even though that's what the hardware does; for example, if (x+1 < x) may be optimized to always false when x has signed type: see the -fstrict-overflow option in GCC).
"/", "." and ".." in a #include have no defined meaning and can be treated differently by different compilers (this does actually vary, and if it goes wrong it will ruin your day).
Really serious ones that can be surprising even on the platform you developed on, because behaviour is only partially undefined / unspecified:
POSIX threading and the ANSI memory model. Concurrent access to memory is not as well defined as novices think. volatile doesn't do what novices think. Order of memory accesses is not as well defined as novices think. Accesses can be moved across memory barriers in certain directions. Memory cache coherency is not required.
Profiling code is not as easy as you think. If your test loop has no effect, the compiler can remove part or all of it. inline has no defined effect.
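On the signed overflow point above: one way to test for overflow without ever evaluating the overflowing expression is to compare against the limits first. This is only a sketch, and the helper names are made up.

```c
#include <limits.h>
#include <stdbool.h>

// Returns true if x + 1 would overflow. The comparison is made against
// INT_MAX, so no overflowing addition is ever evaluated.
bool increment_would_overflow(int x)
{
    return x == INT_MAX;
}

// General form: would a + b overflow for signed int operands?
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   // INT_MAX - b cannot overflow here
    return a < INT_MIN - b;       // b <= 0, so INT_MIN - b cannot overflow
}
```

Because the check never performs the overflowing addition, the optimizer has no undefined behaviour to reason from, unlike with if (x+1 < x).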
My favorite is this:
// what does this do?
x = x++;
To answer some comments, it is undefined behaviour according to the standard. Seeing this, the compiler is allowed to do anything, up to and including formatting your hard drive. See for example this comment here. The point is not that you can see there is a possible reasonable expectation of some behaviour. Because of the way the C++ standard defines sequence points, this line of code is actually undefined behaviour.
For example, if we had x = 1 before the line above, then what would the valid result be afterwards? Someone commented that it should be
x is incremented by 1
so we should see x == 2 afterwards. However, this is not actually true: you will find some compilers that have x == 1 afterwards, or maybe even x == 3. You would have to look closely at the generated assembly to see why this might be, but the differences are due to the underlying problem. Essentially, I think this is because x is modified twice with no sequence point in between, so the compiler is allowed to perform the two modifications in any order it likes: it could do the x++ first, or the x = first.
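If the intent was simply to increment x, a sketch of the well-defined spellings looks like this. The function names are hypothetical; the point is that each statement modifies x only once.

```c
// x is modified exactly once in the statement, so the result is well defined.
int increment(int x)
{
    x++;            // or: x = x + 1;  or: x += 1;
    return x;
}

// If the old value is wanted as well, read it in a separate statement.
int increment_and_get_old(int *x)
{
    int old = *x;
    *x = old + 1;
    return old;
}
```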
The clang developers posted some great examples a while back, in a post every C programmer should read. Some interesting ones not mentioned before:
The EEs here just discovered that a>>-2 is a bit fraught (shifting by a negative count is undefined behaviour).
I nodded and told them it was not natural.
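A hedged sketch of how one might guard a shift whose count comes from elsewhere. The helper name and the policy for out-of-range counts are my own arbitrary choices, not anything the standard prescribes.

```c
#include <limits.h>

// Shifts a right by n bits. Counts outside [0, width) are undefined for the
// built-in operator, so this sketch picks a policy instead: a negative count
// returns a unchanged, and a count that is too large returns 0.
unsigned int checked_shift_right(unsigned int a, int n)
{
    const int width = (int)(sizeof a * CHAR_BIT);
    if (n < 0)
        return a;
    if (n >= width)
        return 0u;
    return a >> n;
}
```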
Be sure to always initialize your variables before you use them! When I had just started with C, that caused me a number of headaches.
Dividing something by a pointer to something. Just won't compile for some reason... :-)
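Presumably the joke is that a division immediately followed by a dereference puts '/' and '*' next to each other, which the compiler reads as the start of a comment. A sketch of the spelling that works (the function name is made up):

```c
// Divides a by the value p points to. Note the space before '*':
// without it, the two characters '/' and '*' would start a comment.
int divide_by_pointee(int a, const int *p)
{
    return a / *p;      // equivalently: a / (*p)
}
```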
Another issue I encountered (which is defined, but definitely unexpected).
char is evil.
I can't count the number of times I've corrected printf format specifiers to match their arguments. Any mismatch is undefined behavior.
int (or long) to %x - an unsigned int is required
unsigned int to %d - an int is required
size_t to %u or %d - use %zu
printing a pointer with %d or %x - use %p and cast to void *
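A sketch of the matching pairs, using snprintf so the result can be inspected. The function names are hypothetical, and %zu assumes a C99 or later library.

```c
#include <stdio.h>
#include <stddef.h>

// Formats one value of each kind with a matching specifier:
// int with %d, unsigned int with %x and %u, size_t with %zu.
// Returns the number of characters snprintf reports.
int format_examples(char *buf, size_t len, int i, unsigned int u, size_t n)
{
    return snprintf(buf, len, "%d %x %u %zu", i, u, u, n);
}

// Pointers are printed with %p after converting to void *.
// The text produced by %p is implementation-defined.
int format_pointer(char *buf, size_t len, int *p)
{
    return snprintf(buf, len, "%p", (void *)p);
}
```

Compiling with warnings such as GCC's -Wformat enabled catches most of these mismatches at build time.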