When is casting between pointer types not undefined behavior in C?
As a newcomer to C, I'm confused about when casting a pointer is actually OK.
As I understand, you can pretty much cast any pointer type to any other type, and the compiler will let you do it. For example:
```c
int a = 5;
int* intPtr = &a;
char* charPtr = (char*) intPtr;
```
However, in general this invokes undefined behavior (though it happens to work on many platforms).
This said, there seem to be some exceptions:

- you can cast to and from `void*` freely (?)
- you can cast to and from `char*` freely (?)

(at least I've seen it in code...).
So which casts between pointer types are not undefined behaviour in C?
Edit:
I tried looking into the C standard (section "6.3.2.3 Pointers", at http://c0x.coding-guidelines.com/6.3.2.3.html ), but didn't really understand it, apart from the bit about `void*`.
Edit2:
Just for clarification: I'm explicitly only asking about "normal" pointers, i.e. not about function pointers. I realize that the rules for casting function pointers are very restrictive. As a matter of fact, I've already asked about that :-): What happens if I cast a function pointer, changing the number of parameters
Basically:

- a `T *` may be freely converted to a `void *` and back again (where `T *` is not a function pointer), and you will get the original pointer.
- a `T *` may be freely converted to a `U *` and back again (where `T *` and `U *` are not function pointers), and you will get the original pointer if the converted pointer is correctly aligned for `U`. This is always the case when `U`'s alignment requirement is no stricter than `T`'s. If not, the behaviour is undefined.

Note: a `T *` (for non-function-pointers) always satisfies the alignment requirements for `char *`.

Important: None of these rules says anything about what happens if you convert, say, a `T *` to a `U *` and then try to dereference it. That's a whole different area of the standard.
Oli Charlesworth's excellent answer lists all cases where casting a pointer to a pointer of a different type gives a well-defined result.

In addition, there are four cases where casting a pointer gives implementation-defined results:

- You can cast a pointer to a sufficiently large integer type. C99 provides the optional types `intptr_t` and `uintptr_t` for this purpose. The result is implementation-defined. On platforms that address memory as a contiguous stream of bytes (the "linear memory model" used by most modern platforms), it usually returns the numeric value of the memory address the pointer points to, so it is simply a byte count. However, not all platforms use a linear memory model, which is why this is implementation-defined :-).
- Conversely, you can cast an integer back to a pointer. If the integer is large enough for `intptr_t` or `uintptr_t` and was created by casting a pointer, casting it back to the same pointer type will give you back that pointer (which, however, may no longer be valid). Otherwise the result is implementation-defined. Note that actually dereferencing the pointer (as opposed to just reading its value) may still be UB.
- You can cast a pointer to any object to `char*`. The result then points to the lowest-addressed byte of the object, and you can read the remaining bytes of the object by incrementing the pointer, up to the object's size. Of course, which values you actually get is again implementation-defined...

Source: C99 standard, sections 6.3.2.3 "Pointers" and 7.18.1.4 "Integer types capable of holding object pointers".
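The integer round trip from the first two items can be sketched as follows. This assumes the implementation provides the optional `uintptr_t`, which mainstream platforms do. `roundtrip_via_uintptr` is a hypothetical name. The conversion goes through `void *` because that is the form C99 7.18.1.4 actually guarantees. The numeric value of the integer itself is implementation-defined and is not inspected here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts an int * to uintptr_t and back. The guarantee covers a
   void * converted to uintptr_t and back to void *: the result
   compares equal to the original pointer. */
bool roundtrip_via_uintptr(int *p) {
    uintptr_t n = (uintptr_t)(void *)p;  /* pointer -> integer */
    void *v = (void *)n;                 /* integer -> pointer */
    return (int *)v == p;
}
```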
As far as I can tell, all other casts of a pointer to a pointer of a different type are undefined behavior. In particular, if you are not casting to `char *` or to a sufficiently large integer type, casting a pointer to a different pointer type can be UB, even without dereferencing it. This is because the types may have different alignment, and there is no general, portable way to make sure different types have compatible alignment (except for some special cases, such as signed/unsigned integer type pairs).
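Below is a sketch of the alignment problem, using hypothetical helper names. `is_aligned_for_int` is deliberately non-portable. It assumes a linear memory model in which the `uintptr_t` value of a pointer is its byte address, and that mapping is exactly the implementation-defined part mentioned above. `read_int_at` shows the portable alternative for reading an `int` out of a byte buffer: instead of converting `buf + offset` to `int *`, which might produce a misaligned pointer, it copies the bytes with `memcpy`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* NOT portable: assumes (uintptr_t)p is the byte address of p.
   Reports whether p is suitably aligned for an int. */
bool is_aligned_for_int(const void *p) {
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Portable: copies sizeof(int) bytes starting at buf + offset into
   a properly aligned int, so no misaligned int * is ever formed. */
int read_int_at(const unsigned char *buf, size_t offset) {
    int value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```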
Generally, if, as is usual nowadays, the pointers themselves have the same alignment properties, the problem is not the cast itself but whether or not you may access the data through the pointer.
For any object type `T`, casting a `T*` to `void*` and back is guaranteed to give you exactly the same pointer. `void*` is the catch-all object pointer type.

For other casts between object types there is no such guarantee. Accessing an object through such a pointer may cause all sorts of problems, such as misalignment (bus errors) or trap representations of integers. Different pointer types are not even guaranteed to have the same width, so theoretically you might even lose information.
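When you need to reinterpret an object's bytes as another type, one common way to avoid access through a converted pointer is to copy the bytes with `memcpy`. This is a minimal sketch, assuming a 32-bit IEEE 754 `float`. Only the size is checked at compile time. `float_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Returns the object representation of f as a uint32_t.
   Writing *(uint32_t *)&f instead would read a float object through
   an lvalue of an incompatible type, which the effective-type rules
   (C11 6.5p7) make undefined; memcpy copies the bytes instead. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform, `float_bits(1.0f)` yields `0x3F800000`.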
One cast that should always work, though, is to `(unsigned char*)`. Through such a pointer you may then investigate the individual bytes of your object.
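Here is a small sketch of inspecting an object's bytes through `unsigned char *`. `bytes_to_hex` is a hypothetical helper, and the caller must supply an output buffer of at least `2 * n + 1` characters. For multi-byte objects such as an `int`, the byte order you see depends on the platform's endianness. That is one more reason the exact values are implementation-defined.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the n bytes of the object at obj into out as lowercase hex.
   The object is viewed through unsigned char *, which may be used to
   read the bytes of any object, starting at its lowest-addressed byte. */
void bytes_to_hex(const void *obj, size_t n, char *out) {
    const unsigned char *bytes = obj;
    for (size_t i = 0; i < n; i++) {
        sprintf(out + 2 * i, "%02x", (unsigned)bytes[i]);
    }
    out[2 * n] = '\0';  /* also terminates the string when n == 0 */
}
```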
The authors of the Standard made no attempt to weigh the costs and benefits of supporting conversions among most combinations of pointer types on platforms where such support would be expensive, since:

- Most platforms where such conversions would be expensive would likely have been obscure ones the authors of the Standard didn't know about.
- People using such platforms would be better placed than the authors of the Standard to judge the costs and benefits of such support.
If some particular platform uses a different representation for `int*` and `double*`, I think the Standard would deliberately allow for the following possibility. A round-trip conversion from `double*` to `int*` and back to `double*` would work consistently, but a conversion from `int*` to `double*` and back to `int*` might fail.

I don't think the authors of the Standard intended that such operations might fail on platforms where such conversions cost nothing. In the charter and rationale documents, they described the Spirit of C as including the principle "Don't prevent [or needlessly obstruct] the programmer from doing what needs to be done." Given that principle, the Standard had no need to mandate that implementations help programmers in cases where doing so would cost nothing. Implementations that make a bona fide effort to uphold the Spirit of C will behave that way with or without a mandate.
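Under the standard's alignment rule, the direction that stays defined can at least be pinned down at compile time. This is a minimal sketch that assumes `double` is at least as strictly aligned as `int`, as on common ABIs. The static assertion makes that assumption explicit, and `roundtrip_double_via_int` is a hypothetical name. Valid alignments are powers of two, so any address aligned for `double` is then also aligned for `int`. That makes the trip from `double*` to `int*` and back defined. A trip starting from an arbitrary `int*` is not covered.

```c
#include <stdbool.h>

/* Assumption of this sketch: double is at least as aligned as int.
   Alignments are powers of two (C11 6.2.8p4), so a double-aligned
   address is then int-aligned too. */
_Static_assert(_Alignof(double) >= _Alignof(int),
               "this sketch assumes double is at least as aligned as int");

/* double * -> int * -> double *: the intermediate int * is correctly
   aligned, so converting back yields a pointer equal to the original.
   Nothing is dereferenced through the int *. */
bool roundtrip_double_via_int(double *p) {
    int *ip = (int *)p;
    double *back = (double *)ip;
    return back == p;
}
```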