Can a pointer (address) ever be negative?
I have a function that I would like to be able to return special values for failure and uninitialized (it returns a pointer on success).
Currently it returns `NULL` for failure and `-1` for uninitialized, and this seems to work... but I could be cheating the system. IIRC, addresses are always positive, are they not? (Although since the compiler allows me to set an address to -1, this seems strange.)
[update]
Another idea I had (in the event that -1 is risky) is to `malloc` a `char` at global scope and use that address as a sentinel.
Positive or negative is not a meaningful property of a pointer type. Those notions pertain to signed integer types such as signed char, short and int.
People mostly talk about negative pointers when they treat a pointer's machine representation as an integer, e.g. `reinterpret_cast<intptr_t>(ptr)`. In that case they are really talking about the converted integer, not the pointer itself.
In some scenarios a pointer feels inherently unsigned, because we speak of one address being below or above another: 0xFFFF.FFFF is above 0x0AAA.0000, which is the intuitive reading, even though 0xFFFF.FFFF is "negative" as a signed 32-bit integer while 0x0AAA.0000 is positive.
In other scenarios, such as pointer subtraction, `(ptr1 - ptr2)` yields a signed value of type `ptrdiff_t`. That is inconsistent with integer subtraction, where `signed_int_a - signed_int_b` yields a signed int and `unsigned_int_a - unsigned_int_b` yields an unsigned type. Pointer subtraction produces a signed type because its meaning is the distance between two pointers, measured in elements.
In summary, I suggest treating a pointer type as a standalone type with its own set of operations. For pointers (excluding function pointers, member function pointers, and `void *`) these are:
- `+`, `+=`: ptr + any_integer_type
- `-`, `-=`: ptr - any_integer_type, ptr1 - ptr2
- `++`: both prefix and postfix
- `--`: both prefix and postfix
Note that there are no `/`, `*` or `%` operations on pointers. That also supports the view that a pointer should be treated as a standalone type, rather than "a type similar to int" or "a type whose underlying type is int, so it should look like int".
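The distinction between the pointer and its integer view can be sketched in C. This is only an illustration: it assumes an implementation that provides `intptr_t`, it uses a C cast where the answer uses `reinterpret_cast`, and the helper names `element_distance` and `looks_negative_as_integer` are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance between two pointers into the same array, counted in elements.
   The result type is the signed ptrdiff_t, so the distance can be negative
   even though neither pointer has a sign. */
ptrdiff_t element_distance(const int *from, const int *to) {
    return to - from;
}

/* A sign only appears after converting the pointer to a signed integer.
   The pointer-to-integer mapping is implementation-defined, so the answer
   says something about the platform, not about the pointer. */
int looks_negative_as_integer(const void *p) {
    return (intptr_t)p < 0;
}
```

With `int a[8]`, `element_distance(&a[1], &a[6])` is 5 and `element_distance(&a[6], &a[1])` is -5; what `looks_negative_as_integer` reports for a given object depends entirely on where the platform places it.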
`NULL` is the only valid error return in this case; that is true any time an unsigned value such as a pointer is returned. It may be true that in some cases pointers will not be large enough to use the sign bit as a data bit, but since pointers are controlled by the OS, not the program, I would not rely on this behavior.
Remember that on a 32-bit platform a pointer is basically a 32-bit value; whether it is possibly negative or always positive is just a matter of interpretation, i.e. whether the 32nd bit is read as the sign bit or as a data bit. So if you interpret 0xFFFFFFFF as a signed number it is -1, and if you interpret it as an unsigned number it is 4294967295. Technically, it is unlikely that a pointer would ever be this large, but the case should be considered anyway.
As an alternative, you could use an additional out parameter (returning NULL for all failures), but this would require clients to create and pass a value even if they don't need to distinguish between specific errors.
Another alternative would be to use the `GetLastError`/`SetLastError` mechanism to provide additional error information (this would be specific to Windows; I don't know whether that is an issue), or to throw an exception on error instead.
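To make the interpretation point concrete, here is a minimal C sketch that reads the same 32 bits first as an unsigned and then as a signed value. The helper name `as_signed` is hypothetical; the sketch relies only on `int32_t` being a two's-complement type without padding, which `<stdint.h>` requires wherever that type exists.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an unsigned value as a signed value.
   memcpy copies the object representation without any value conversion. */
int32_t as_signed(uint32_t bits) {
    int32_t out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

`as_signed(0xFFFFFFFFu)` yields -1, while the very same bits read as `uint32_t` are 4294967295.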
You don't need to care about the signedness of a pointer, because it's implementation-defined. The real question here is "how do I return special values from a function that returns a pointer?", which I've explained in detail in my answer to the question "Pointer address span on various platforms".
In summary, the all-ones bit pattern (-1) is (almost) always safe, because it is already at the end of the address range and data cannot be stored wrapping around to the first address, and the `malloc` family never returns -1. In fact, this value is even returned by many Linux system calls and Win32 APIs to indicate another state for the pointer. So if you need just failure and uninitialized, it's a good choice.
But you can return far more error states by exploiting the fact that variables must be properly aligned (unless you specified some other options). For example, in a pointer to `int32_t` the low 2 bits are always zero, which means only 1/4 of the possible values are valid addresses, leaving all of the remaining bit patterns for you to use. So a simple solution would be to just check the lowest bit. In this case you can return both a valid pointer and some additional data at the same time.
You can also use the high bits for storing data on 64-bit systems. On ARM there's a flag that tells the CPU to ignore the high bits in addresses. On x86 there isn't a similar thing, but you can still use those bits as long as you make the address canonical before dereferencing. See "Using the extra 16 bits in 64-bit pointers".
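The alignment trick can be sketched as follows. This is a hedged illustration, not the answer's original listing: the helper names are hypothetical, and it assumes that `int32_t` objects are at least 4-byte aligned and that converting a pointer to `uintptr_t` and back preserves the low bits, which holds on mainstream platforms but is implementation-defined.

```c
#include <stdint.h>

/* Two low bits are free in a pointer to a 4-byte-aligned object. */
enum { TAG_MASK = 0x3 };

/* Store a small tag (0..3) in the unused low bits of the address. */
static inline void *tag_pointer(int32_t *p, unsigned tag) {
    return (void *)((uintptr_t)p | (tag & TAG_MASK));
}

/* Read the tag back; an untagged, properly aligned pointer gives 0. */
static inline unsigned pointer_tag(const void *tagged) {
    return (unsigned)((uintptr_t)tagged & TAG_MASK);
}

/* Clear the tag bits so the pointer can be dereferenced again. */
static inline int32_t *strip_tag(void *tagged) {
    return (int32_t *)((uintptr_t)tagged & ~(uintptr_t)TAG_MASK);
}
```

A caller would test `pointer_tag(result)` first and only dereference `strip_tag(result)` when the tag says the pointer is usable; a tagged pointer must never be dereferenced directly.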
Do not use `malloc` for this purpose. It might keep unnecessary memory tied up (for example, if a lot of memory is already in use when `malloc` gets called and the sentinel gets allocated at a high address), and it confuses memory debuggers and leak detectors. Instead, simply return a pointer to a local `static const char` object. This pointer will never compare equal to any pointer the program could obtain in any other way, and it only wastes one byte of bss.
Actually (at least on x86), the NULL-pointer exception is generated not only by dereferencing the NULL pointer itself, but by a larger range of addresses (e.g. the first 64 KB). This helps catch errors in which a NULL pointer is offset before being dereferenced, for example by indexing into it.
So there are more addresses that are guaranteed to generate the NULL-pointer exception when dereferenced.
Now consider this code (made compilable for AndreyT):
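A minimal sketch of the idea, under stated assumptions: small integer error codes are stored in the pointer value itself and recognised by their range. All names here (`ERR_LIMIT`, `error_pointer`, and so on) are hypothetical, the 4096-byte limit is an assumption about the platform's unmapped low region, and integer-to-pointer conversion is implementation-defined, so this is not portable C.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: no object ever lives below this address on the target. */
#define ERR_LIMIT ((uintptr_t)4096)

/* Hypothetical error codes; all must be nonzero and below ERR_LIMIT. */
enum { ERR_NOT_ENOUGH_MEM = 1, ERR_NEGATIVE = 2, ERR_NOT_ALLOWED = 3 };

/* Encode an error code as a pointer value that must never be dereferenced. */
static inline void *error_pointer(unsigned code) {
    return (void *)(uintptr_t)code;
}

/* True for the encoded error values, false for NULL and for real objects. */
static inline int is_error_pointer(const void *p) {
    uintptr_t v = (uintptr_t)p;
    return v != 0 && v < ERR_LIMIT;
}

/* Recover the error code from an encoded pointer. */
static inline unsigned error_code(const void *p) {
    return (unsigned)(uintptr_t)p;
}
```

A pointer-returning function could then return `error_pointer(ERR_NEGATIVE)` on bad input, and the caller would check `is_error_pointer` before using the result.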
This could be useful in some cases.
It won't work on some Harvard architectures, but will work on von Neumann ones.
James's answer is probably correct, but of course it describes an implementation choice, not a choice that you can make.
Personally, I think addresses are "intuitively" unsigned. Finding a pointer that compares less than a null pointer would seem wrong. But `~0` and `-1`, for the same integer type, give the same value. If addresses are intuitively unsigned, `~0` may make a more intuitive special-case value; I use it for error-case unsigned ints quite a lot. It's not really different (zero is an int by default, so `~0` is `-1` until you cast it), but it looks different.
Pointers on 32-bit systems can use all 32 bits, by the way, though `-1` or `~0` is extremely unlikely to occur as a pointer to a genuine allocation in practice. There are also platform-specific rules: for example, on 32-bit Windows a process can by default only have a 2 GB address space, and there's a lot of code around that encodes some kind of flag into the top bit of a pointer (e.g. balancing flags in balanced binary trees).
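The equivalence of `~0` and `-1` can be checked with a short C sketch. The function names are made up for illustration, and the first function assumes a two's-complement `int`, which is what practically every current platform uses.

```c
#include <stdint.h>

/* ~0 has type int; on a two's-complement machine its value is -1. */
int all_ones_int(void) {
    return ~0;
}

/* The unsigned spelling: every bit set in a pointer-sized integer. */
uintptr_t all_ones_uintptr(void) {
    return ~(uintptr_t)0;
}

/* Converting -1 to an unsigned type wraps modulo 2^N, giving the same bits. */
uintptr_t minus_one_as_uintptr(void) {
    return (uintptr_t)-1;
}
```

Both unsigned forms equal `UINTPTR_MAX`, so the choice between writing `~0` and `-1` for a sentinel is a matter of how the intent reads, not of the resulting value.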
@James is correct, of course, but I'd like to add that pointers don't always represent absolute memory addresses, which theoretically would always be positive. Pointers can also represent addresses relative to some point in memory, often a stack or frame pointer, and those offsets can be either positive or negative.
So your best bet is to have your function accept a pointer to a pointer as a parameter and fill that pointer with a valid pointer value on success, while returning a result code from the function itself.
What's the difference between failure and uninitialized? If uninitialized is not another kind of failure, then you probably want to redesign the interface to separate these two conditions.
Probably the best way to do this is to return the result through a parameter, so that the return value only indicates an error. For example, where you would write a function that returns the pointer directly, change it to return an error code and store the pointer through a parameter.
Pointers can be negative in the same way an unsigned integer can be negative. That is, under a two's-complement interpretation, you could read the numerical value as negative because the most significant bit is set.
The C language does not define the notion of "negativity" for pointers. The property of "being negative" is chiefly an arithmetical one, not in any way applicable to values of pointer type.
If you have a pointer-returning function, then you cannot meaningfully return the value `-1` from it. In C, integral values (other than a constant zero) are not implicitly convertible to pointer types. An attempt to return `-1` from a pointer-returning function is an immediate constraint violation that will result in a diagnostic message. In short, it is an error. If your compiler allows it, that simply means it doesn't enforce the constraint too strictly (most of the time compilers do this for compatibility with pre-standard code).
If you force the value `-1` to a pointer type with an explicit cast, the result of the cast is implementation-defined. The language itself makes no guarantees about it. It might easily prove to be the same as some other, valid pointer value.
If you want to create a reserved pointer value, there is no need to `malloc` anything. You can simply declare a global variable of the desired type and use its address as the reserved value. It is guaranteed to be unique.
It's generally bad design to try to multiplex special values onto a return value: you're trying to do too much with a single value. It would be cleaner to return your "success pointer" via an argument rather than the return value. That leaves lots of non-conflicting space in the return value for all of the conditions you want to describe.
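One way this could look in C is sketched below. The status names, the `lookup` function and its backing table are all hypothetical, chosen only to show a status code in the return value and the pointer delivered through the argument `p`.

```c
#include <stddef.h>

/* Hypothetical status codes: the return value has room for every condition. */
typedef enum {
    LOOKUP_OK = 0,
    LOOKUP_FAILED,
    LOOKUP_UNINITIALIZED,
    LOOKUP_BAD_ARGUMENT
} lookup_status;

static int table[4] = {10, 20, 30, 40};  /* stand-in for real data */
static int table_ready = 0;              /* set once initialisation is done */

void lookup_init(void) {
    table_ready = 1;
}

/* Stores the "success pointer" in *p and reports the outcome separately. */
lookup_status lookup(size_t index, int **p) {
    if (p == NULL)
        return LOOKUP_BAD_ARGUMENT;       /* nowhere to store the result */
    *p = NULL;                            /* defined value on every error path */
    if (!table_ready)
        return LOOKUP_UNINITIALIZED;
    if (index >= sizeof table / sizeof table[0])
        return LOOKUP_FAILED;
    *p = &table[index];
    return LOOKUP_OK;
}
```

The caller switches on the returned status and touches `*p` only after `LOOKUP_OK`, so no pointer value has to double as an error marker.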
You should also do typical argument checking (ensure that 'p' isn't NULL).
The valid values for a pointer are entirely implementation-dependent, so, yes, a pointer address could be negative.
More importantly, however, consider (as an example of a possible implementation choice) the case where you are on a 32-bit platform with a 32-bit pointer size. Any value that can be represented by that 32-bit value might be a valid pointer. Other than the null pointer, any pointer value might be a valid pointer to an object.
For your specific use case, you should consider returning a status code and perhaps taking the pointer as a parameter to the function.
No, addresses aren't always positive: on x86_64, pointers are sign-extended and the address space is clustered symmetrically around 0 (though it is usual for the "negative" addresses to be kernel addresses).
However, the point is mostly moot, since C only defines the meaning of `<` and `>` comparisons between pointers into the same object, or one past the end of an array. Pointers to completely different objects cannot be meaningfully compared other than for exact equality, at least in standard C: `if (p < NULL)` has no well-defined semantics.
You should create a dummy object with static storage duration and use its address as your uninitialised value. It's guaranteed to have a single, unique address across your program.
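A dummy-object sentinel might look like the following sketch. The names `uninitialised_marker`, `UNINITIALISED`, `state` and `get_buffer` are invented for the example; only the technique (taking the address of an object with static storage duration) comes from the answer.

```c
#include <stddef.h>

/* Dummy object with static storage duration; only its address matters. */
static char uninitialised_marker;
#define UNINITIALISED ((void *)&uninitialised_marker)

/* Hypothetical states, used here only to drive the three possible results. */
typedef enum { STATE_UNINITIALISED, STATE_BROKEN, STATE_READY } state;

/* Returns a real pointer on success, NULL on failure, and the sentinel
   when nothing has been set up yet. */
void *get_buffer(state s) {
    static char buffer[16];
    switch (s) {
    case STATE_READY:
        return buffer;
    case STATE_BROKEN:
        return NULL;
    default:
        return UNINITIALISED;
    }
}
```

Callers compare the result against `NULL` and `UNINITIALISED` with `==` only, which is well defined for any two pointers, and never dereference the sentinel.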