Is it always safe to convert an integer value to void* and back again in POSIX?

Posted 2024-12-10 22:17:31

This question is almost a duplicate of some others I've found, but this specifically concerns POSIX, and a very common example in pthreads that I've encountered several times. I'm mostly concerned with the current state of affairs (i.e., C99 and POSIX.1-2008 or later), but any interesting historical information is of course interesting as well.

The question basically boils down to whether b will always take the same value as a in the following code:

long int a = /* some valid value */;
void *ptr = (void *)a;
long int b = (long int)ptr;

I am aware that this usually works, but the question is whether it is a proper thing to do (i.e., do the C99 and/or POSIX standards guarantee that it will work?).

As far as C99 is concerned, it seems they do not. Section 6.3.2.3 says:

5 An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not be
correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.

6 Any pointer type may be
converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented
in the integer type, the behavior is undefined. The result need not be
in the range of values of any integer type.

Even using intptr_t the standard seems to only guarantee that any valid void* can be converted to intptr_t and back again, but it does not guarantee that any intptr_t can be converted to void* and back again.
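The direction that C99 does guarantee can be sketched as follows. This is only an illustration, assuming the implementation provides the optional intptr_t type; the function name roundtrips_via_intptr is made up for this example.

```c
#include <stdint.h>

/* Returns 1 if converting p to intptr_t and back yields a pointer that
   compares equal to p. C99 7.18.1.4 promises this for any valid void*,
   provided the (optional) intptr_t type exists at all. */
int roundtrips_via_intptr(void *p)
{
    intptr_t as_int = (intptr_t)p;  /* pointer -> integer */
    void *back = (void *)as_int;    /* integer -> pointer */
    return back == p;
}
```

Nothing comparable is promised when the starting point is an arbitrary integer rather than a valid pointer.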

However, it is still possible that the POSIX standard allows this.

I have no great desire to use a void* as a storage space for any variable (I find it pretty ugly even if POSIX should allow it), but I feel I have to ask because of the common example use of the pthread_create function where the argument to start_routine is an integer, and it is passed in as void* and converted to int or long int in the start_routine function. For example, this manpage has such an example (see link for full code):

//Last argument casts int to void *
pthread_create(&tid[i], NULL, sleeping, (void *)SLEEP_TIME);
/* ... */
void * sleeping(void *arg){
    //Casting void * back to int
    int sleep_time = (int)arg;
    /* ... */
}

I've also seen a similar example in a textbook (An Introduction to Parallel Programming by Peter S. Pacheco). Considering that it seems to be a common example used by people who should know this stuff much better than me, I'm wondering if I'm wrong and this is actually a safe and portable thing to be doing.

Comments (5)

笔芯 2024-12-17 22:17:31

As you say, C99 doesn't guarantee that any integer type may be converted to void* and back again without loss of information. It does make a similar guarantee for intptr_t and uintptr_t defined in <stdint.h>, but those types are optional. (The guarantee is that a void* may be converted to {u,}intptr_t and back without loss of information; there's no such guarantee for arbitrary integer values.)

POSIX doesn't appear to make any such guarantee either.

The POSIX description of <limits.h> requires int and unsigned int to be at least 32 bits. This exceeds the C99 requirement that they be at least 16 bits. (Actually, the requirements are in terms of ranges, not sizes, but the effect is that int and unsigned int must be at least 32 (under POSIX) or 16 (under C99) bits, since C99 requires a binary representation.)

The POSIX description of <stdint.h> says that intptr_t and uintptr_t must be at least 16 bits, the same requirement imposed by the C standard. Since void* can be converted to intptr_t and back again without loss of information, this implies that void* may be as small as 16 bits. Combine that with the POSIX requirement that int is at least 32 bits (and the POSIX and C requirement that long is at least 32 bits), and it's possible that a void* just isn't big enough to hold an int or long value without loss of information.

The POSIX description of pthread_create() doesn't contradict this. It merely says that arg (the void* 4th argument to pthread_create()) is passed to start_routine(). Presumably the intent is that arg points to some data that start_routine() can use. POSIX has no examples showing the usage of arg.
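If arg is meant to point at data, the integer never has to travel through a pointer conversion at all. The following is a minimal sketch of that pattern, not code from POSIX or from the manpage in the question: sleeping is the name used in the question, while run_with_pointer_arg and the doubling "work" are hypothetical placeholders.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Thread entry point. arg points at an int that the creating thread keeps
   alive until it has joined this thread. */
static void *sleeping(void *arg)
{
    int *sleep_time = arg;  /* void* converts to int* without a cast in C */
    *sleep_time *= 2;       /* placeholder for real work such as sleep() */
    return NULL;
}

/* Starts one thread with value as its argument and waits for it to finish.
   Returns 0 on success, otherwise the error number reported by
   pthread_create() or pthread_join(). */
int run_with_pointer_arg(int *value)
{
    pthread_t tid;
    int rc;

    assert(value != NULL);
    rc = pthread_create(&tid, NULL, sleeping, value);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

The pointed-to object must outlive the thread's use of it; here that is ensured by joining before returning. When several threads are started in a loop, each one needs its own object (for example, one array element per thread).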

You can see the POSIX standard here; you have to create a free account to access it.

天暗了我发光 2024-12-17 22:17:31

The focus in answers so far seems to be on the width of a pointer, and indeed as @Nico points out (and @Quantumboredom also points out in a comment), there is a possibility that intptr_t may be wider than a pointer. @Kevin's answer hints at the other important issue, but doesn't completely describe it.

Also, though I'm not sure of the exact paragraph in the standard, Harbison & Steele point out that intptr_t and uintptr_t are optional types too and may not even exist in a valid C99 implementation. OpenGroup says that XSI-conformant systems must support both types, but that means plain POSIX does not require them (at least as of the 2003 edition).

The part that's really been missed here though is that pointers need not always have a simple numerical representation that matches the internal representation of an integer. This has always been so (since K&R 1978), and I'm pretty sure POSIX is careful not to overrule this possibility either.

So, C99 does require that, if intptr_t exists, a pointer can be converted to that type and back again such that the new pointer still points at the same object in memory as the old one. If pointers have a non-integer representation, this implies that an algorithm exists which can convert a specific set of integer values into valid pointers. However, this also means that not all integers between INTPTR_MIN and INTPTR_MAX are necessarily valid pointer values, even if the width of intptr_t (and/or uintptr_t) is exactly the same as the width of a pointer.

So, the standards cannot guarantee that any intptr_t or uintptr_t can be converted to a pointer and back to the same integer value, or even which set of integer values can survive such conversion, because they cannot possibly define all of the possible rules and algorithms for converting integer values into pointer values. Doing so even for all known architectures could still prevent the applicability of the standard to novel types of architectures yet to be invented.

秉烛思 2024-12-17 22:17:31

(u)intptr_t are only guaranteed to be large enough to hold a pointer, but they may also be "larger", which is why the C99 standard only guarantees (void*)->(u)intptr_t->(void*), but in the other case loss of data may occur (and is considered undefined).

洒一地阳光 2024-12-17 22:17:31

Not sure what you mean by "always". It's not written anywhere in the standard that this is okay, but there are no systems it fails on.

If your integers are really small (say, limited to 16 bits), you can make it strictly conforming by declaring:

static const char dummy_base[65535];

and then passing dummy_base+i as the argument and recovering it as i=(char *)start_arg-dummy_base;
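Spelled out as a pair of helpers, the trick might look like the sketch below. The names encode_small_int and decode_small_int are invented for this illustration; only dummy_base comes from the answer. The array's contents are never read, only its addresses are used.

```c
#include <assert.h>
#include <stddef.h>

/* Address space for the encoding: offsets 0..65535, where 65535 is the
   one-past-the-end pointer, which is still valid to form and compare. */
static const char dummy_base[65535];

/* Encodes i (0..65535) as a pointer derived from dummy_base. */
void *encode_small_int(size_t i)
{
    assert(i <= 65535);
    /* The cast drops const; the resulting pointer is never dereferenced. */
    return (void *)(dummy_base + i);
}

/* Recovers the integer by pointer subtraction within the same array. */
size_t decode_small_int(void *arg)
{
    return (size_t)((const char *)arg - dummy_base);
}
```

A thread would then be started with encode_small_int(i) as its argument and would call decode_small_int(arg) inside start_routine.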

故事灯 2024-12-17 22:17:31

I think your answer is in the text you quoted:

If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

So, not necessarily. Say you had a 64-bit long and cast it to a void* on a 32-bit machine. The pointer is likely 32 bits, so either you lose the top 32 bits or get INT_MAX back. Or, potentially, something else entirely (undefined, as the standard says).
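The "lose the top 32 bits" outcome can be modelled without a 32-bit machine. The sketch below is only an analogy: it uses uint32_t as a stand-in for a 32-bit pointer, and through_32bit_slot is a hypothetical name. What a real integer-to-void* conversion does on a given platform depends on the implementation.

```c
#include <stdint.h>

/* Stores a 64-bit value in a 32-bit-wide slot and reads it back.
   Conversion to an unsigned type is well defined (reduction modulo 2^32),
   so the upper 32 bits are simply discarded. */
uint64_t through_32bit_slot(uint64_t value)
{
    uint32_t slot = (uint32_t)value;  /* narrowing: keeps the low 32 bits */
    return (uint64_t)slot;            /* widening: upper bits come back as 0 */
}
```

Values that fit in 32 bits survive the trip unchanged; anything larger comes back as a different number.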
