Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
I have seen it asserted several times now that the following code is not allowed by the C++ Standard:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Is &array[5] legal C++ code in this context?
I would like an answer with a reference to the Standard if possible.
It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?
Yes, it's legal. From the C99 draft standard:
§6.5.2.1, paragraph 2:
§6.5.3.2, paragraph 3 (emphasis mine):
§6.5.6, paragraph 8:
Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.
Your example is legal, but only because you're not actually using an out of bounds pointer.
Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):
In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.
The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.
Here's what the standard has to say on the subject:
5.7:5:
(emphasis mine)
Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:
5.2.1:1:
Of course, there's an obvious caveat: your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:
Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.
Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:
array + 5 doesn't actually dereference anything, it simply creates a pointer to one past the end of array.
&array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
&array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array's element type", as the above said), and then takes the address of that element, which also seems legal enough.
So they don't do quite the same thing, although in this case, the end result is the same.
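To make the comparison concrete, here is a minimal C++ sketch of the two forms that nobody in this thread disputes. The function names are made up for illustration, and the contested form &array[5] is deliberately left out. It only shows that array + 5 and &array[4] + 1 produce the same one-past-the-end pointer, and that the pointer is compared against but never read through.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: builds the end pointer in two ways and checks they agree.
bool end_pointers_agree() {
    int array[5] = {10, 20, 30, 40, 50};

    // Pure pointer arithmetic: no element is designated or read.
    int* end_a = array + 5;

    // array[4] is a real element, so taking its address is fine;
    // adding 1 then yields the one-past-the-end pointer.
    int* end_b = &array[4] + 1;

    return end_a == end_b && (end_a - array) == 5;
}

// Hypothetical helper: uses the end pointer only as a loop bound.
int sum_up_to_end() {
    int array[5] = {10, 20, 30, 40, 50};
    int* end = array + 5;

    int sum = 0;
    for (int* p = array; p != end; ++p) {
        sum += *p;  // only real elements are read; end itself is never dereferenced
    }
    return sum;
}
```

Calling end_pointers_agree() should return true and sum_up_to_end() should return 150 under these assumptions.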
It is legal.
According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.
However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.
In practice, I imagine it would usually not cause a crash, though.
Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.
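As a rough illustration of the end() remark, the sketch below uses std::begin and std::end on a built-in array and the data() plus size() pair on a std::vector. The function names are hypothetical. In both cases the end position is one past the last element, and std::accumulate only compares against it.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Sum a built-in array through the half-open range [begin, end).
int sum_builtin_array() {
    int array[5] = {1, 2, 3, 4, 5};
    // For a built-in array, std::end(array) is the pointer array + 5.
    return std::accumulate(std::begin(array), std::end(array), 0);
}

// The same idea with a vector's contiguous storage.
int sum_vector_storage() {
    std::vector<int> v = {1, 2, 3, 4, 5};
    const int* first = v.data();
    const int* last = v.data() + v.size();  // one past the last element
    return std::accumulate(first, last, 0);
}
```

Both helpers should return 15 for the values used here.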
Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.
I believe that this is legal, and it depends on whether the 'lvalue to rvalue' conversion takes place. The last line of Core issue 232 has the following:
Although this is a slightly different example, what it does show is that the '*' does not result in an lvalue to rvalue conversion. So, given that the expression is the immediate operand of '&', which expects an lvalue, the behaviour is defined.
I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.
5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))
5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.
At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object, and the standard only says what the result should be when it does.
As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.
In addition to the above answers, I'll point out that operator& can be overloaded for classes. So even if it was valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overloading operator&() in the first place).
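The point about operator& can be sketched as follows. Odd is a hypothetical type invented for this example; its overloaded operator& returns a null pointer instead of the real address, and std::addressof from <memory> is the standard way around such an overload. The sketch applies both only to a real element. With a type like this, &items[2] would call a member function on an element that does not exist, which is the risk the answer describes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary & is overloaded and hides the real address.
struct Odd {
    int value = 0;
    Odd* operator&() { return nullptr; }
};

bool overloaded_ampersand_differs() {
    Odd items[2];

    // Calls Odd::operator&, so this is not the element's address.
    Odd* via_operator = &items[0];

    // std::addressof bypasses the overload and yields the real address.
    Odd* via_addressof = std::addressof(items[0]);

    return via_operator == nullptr && via_addressof == items;
}
```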
This is legal:
So by this we can say that array_end is equivalent too:
The important part of the above:
The unary operator '*' returns an lvalue referring to the int (no de-reference). The unary operator '&' then gets the address of the lvalue.
As long as there is no de-referencing of an out-of-bounds pointer, the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.
The fact that a lot of the STL algorithms depend on the behavior being well defined is a sort of hint that the standards committee has already thought of this, and I am sure there is something that covers this explicitly.
The comment section below presents two arguments (please read, but it is long and both of us end up trollish):
Argument 1:
This is illegal because of section 5.7 paragraph 5.
Though the section is relevant, it does not show undefined behavior. All the elements in the array we are talking about are either within the array or one past the end (which is well defined by the above paragraph).
Argument 2:
The second argument presented below is: * is the de-reference operator.
Though this is a common term used to describe the '*' operator, the term is deliberately avoided in the standard, as 'de-reference' is not well defined in terms of the language and what it means for the underlying hardware.
Accessing the memory one beyond the end of the array is definitely undefined behavior, but I am not convinced that the unary * operator accesses the memory (reads or writes it) in this context, at least not in a way the standard defines. In this context (as defined by the standard, see 5.3.1.1) the unary * operator returns an lvalue referring to the object. In my understanding of the language this is not an access to the underlying memory. The result of this expression is then immediately used by the unary & operator, which returns the address of the object referred to by that lvalue.
Many other references to Wikipedia and non-canonical sources are presented, all of which I find irrelevant. C++ is defined by the standard.
Conclusion:
I am willing to concede there are many parts of the standard that I may not have considered and that may prove my above arguments wrong. None are provided below. If you show me a standard reference that shows this is UB, I will
This is not an argument:
Working draft (n2798):
array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier, array[5] is not. It is not an lvalue because "An lvalue refers to an object or function." (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).
Obviously, it may work in certain cases, but it's not valid C++ or safe.
Note: It is legal to add to get one past the array (p. 113):
But it is not legal to do so using &.
Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.
Edit: If you want it to be symmetric you can write
It should be undefined behaviour, for the following reasons:
Trying to access out-of-bounds elements results in undefined behaviour. Hence the standard does not forbid an implementation throwing an exception in that case (i.e. an implementation checking bounds before an element is accessed). If & (array[size]) were defined to be begin (array) + size, an implementation throwing an exception in case of out-of-bound access would not conform to the standard anymore.
It's impossible to make this yield end (array) if array is not an array but rather an arbitrary collection type.
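The bounds-checking argument can be illustrated with a real checked accessor. The sketch below uses std::vector::at as a stand-in for "an implementation checking bounds before an element is accessed"; that substitution and the function names are assumptions for illustration, not part of the answer. Asking at() for element [size] throws std::out_of_range before any address can be taken, while the end position is still reachable through data() plus size().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if asking a bounds-checked accessor for element [size] throws.
bool checked_access_at_size_throws() {
    std::vector<int> v = {1, 2, 3};
    try {
        int* p = &v.at(v.size());  // at() checks the index before returning a reference
        (void)p;
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// The end position is still obtainable without designating an element.
std::ptrdiff_t distance_to_end() {
    std::vector<int> v = {1, 2, 3};
    const int* first = v.data();
    const int* last = v.data() + v.size();
    return last - first;
}
```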
Preamble
Quite a few of the answers here are fairly old, and quote relatively old versions of the C++ standard (or drafts thereof). Others are based on the C standard; C99 was revised specifically to make this legal, with defined behavior, but that doesn't mean a matching change was made in C++. It looks like the text in the C++ standard has changed somewhat over time, so it may be unclear how meaningful some of the older citations are for C++ as currently defined.
Since the wording has changed over time, I'm going to cite a couple of specific drafts of the C++ standard. If later drafts revise the wording again (which wouldn't surprise me) the issue would have to be analyzed again with respect to the revised wording.
N4835
So, array[5] is equivalent to *(array + 5).
We then attempt to take the address of that expression using the & operator. This is defined as follows (§[expr.unary.op]/3):
The first of these three possibilities applies to class members, so it's irrelevant here.
The second applies to an lvalue. So the question is whether array + 5 is an lvalue or not. According to §[basic.lval]/1.1:
While we can form an address one past the end of an array, that address does not determine the identity of an object, bit-field or function. The relevant option would be "object", but there is no object there whose identity it can determine [1]. As such, when array has been defined with N elements, array + N is not an lvalue.
That leaves only the third option: the program is ill-formed.
N4944
N4944 has identical wording for §[expr.sub]/1 as N4835, so I won't quote it again here.
In N4944 the wording with respect to the * operator has changed slightly. It starts with (§[expr.unary.op]/3):
N4944 retains the same definition of an lvalue though:
As such, again, a pointer to one past the end of an array is not an lvalue, so code that attempts to apply the * operator to it is ill-formed.
Conclusion
In recent versions of the C++ standard, code like:
...is ill formed.
1. Well, it could happen that there's some object at that address, but if so it's an accidental coincidence. Nothing in the standard requires there to be an object at that address.
C++ standard, 5.19, paragraph 4:
Looks to me like &array[5] is legal C++, being an address constant expression.