Where exactly does the C++ standard say that dereferencing an uninitialized pointer is undefined behavior?

Posted on 2024-10-04 09:24:44

So far I can't find how to deduce that the following:

int* ptr;
*ptr = 0;

is undefined behavior.

First of all, there's 5.3.1/1 that states that * means indirection which converts T* to T. But this doesn't say anything about UB.

Then there's often quoted 3.7.3.2/4 saying that using deallocation function on a non-null pointer renders the pointer invalid and later usage of the invalid pointer is UB. But in the code above there's nothing about deallocation.

How can UB be deduced in the code above?

Comments (7)

我ぃ本無心為│何有愛 2024-10-11 09:24:44

Section 4.1 looks like a candidate (emphasis mine, on "uninitialized" and "undefined behavior"):

An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.

I'm sure that searching the spec for "uninitial" will turn up more candidates.
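A minimal sketch of what the quoted paragraph implies for the code in the question, assuming a hosted C++11 (or later) compiler. The function name is hypothetical and not from the original post. The undefined pattern is left in comments only, and the well-defined variant shows that the pointer itself is read (lvalue-to-rvalue conversion) before anything is stored through it:

```cpp
#include <cassert>

// Hypothetical helper: stores through a pointer only after the pointer
// variable itself has been initialized.
int store_through_initialized_pointer() {
    int target = 42;
    int* ptr = &target;      // ptr holds a definite value
    assert(ptr != nullptr);  // reading ptr is well-defined here
    *ptr = 0;                // reads ptr (lvalue-to-rvalue), then stores into target
    return target;
}

// The pattern from the question, kept as a comment because it is undefined:
//   int* ptr;   // ptr is uninitialized
//   *ptr = 0;   // evaluating *ptr must first read the uninitialized ptr
```

Calling store_through_initialized_pointer() returns 0, because the store lands in target.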

何其悲哀 2024-10-11 09:24:44

The OP's question is nonsense. There is no requirement that the Standard say certain behaviours are undefined, and indeed I would argue that all such wording be removed from the Standard because it confuses people and makes the Standard more verbose than necessary.

The Standard defines certain behaviour. The question is, does it specify any behaviour in this case? If it does not, the behaviour is undefined whether or not it says so explicitly.

In fact the specification that some things are undefined is left in the Standard primarily as a debugging aid for the Standards writers, the idea being to generate a contradiction if there is a requirement in one place which conflicts with an explicit statement of undefined behaviour in another: that's a way to prove a defect in the Standard. Without the explicit statement of undefined behaviour, the other clause prescribing behaviour would be normative and unchallenged.

染火枫林 2024-10-11 09:24:44

I found that the answer to this question lies in an unexpected corner of the C++ draft standard: section 24.2 Iterator requirements, specifically section 24.2.1 In general, paragraphs 5 and 10, which respectively say (emphasis mine):

[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.

and:

An invalid iterator is an iterator that may be singular. [268]

and footnote 268 says:

This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.

That said, there appears to be some controversy over whether a null pointer is singular, and the term singular value probably needs to be defined in a more general manner.

The intent of singular seems to be summed up well in defect report 278, "What does iterator validity mean?", whose rationale section says:

Why do we say "may be singular", instead of "is singular"? That's because a valid iterator is one that is known to be nonsingular. Invalidating an iterator means changing it in such a way that it's no longer known to be nonsingular. An example: inserting an element into the middle of a vector is correctly said to invalidate all iterators pointing into the vector. That doesn't necessarily mean they all become singular.

So invalidation and being uninitialized may each produce a singular value, and since we cannot prove the value is nonsingular, we must assume it is singular.

Update

An alternative, common-sense approach is to note that section 5.3.1 Unary operators, paragraph 1, of the draft standard says (emphasis mine):

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]

and section 3.10 Lvalues and rvalues, paragraph 1, then says (emphasis mine):

An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]

but ptr will not, except by chance, point to a valid object.
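The vector example from the defect report rationale can be sketched as follows. This is an illustration under assumptions, not text from the standard: the function name is hypothetical, and the code relies only on std::vector::insert returning an iterator to the newly inserted element.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: inserts into the middle of a vector and then uses
// only the iterator returned by insert(), never the invalidated one.
int value_after_insert() {
    std::vector<int> v{1, 2, 4};
    auto it = v.begin() + 2;  // refers to the element 4
    // insert() invalidates `it`: it may now be singular, so it must not be
    // dereferenced until it is given a new, valid value.
    it = v.insert(it, 3);     // the returned iterator refers to the new element
    assert(it != v.end());
    return *it;
}
```

After the call the vector holds 1, 2, 3, 4 and the function returns 3. An uninitialized int* x; is in the same position as the stale iterator: nothing shows it to be nonsingular, so it has to be treated as singular.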

塔塔猫 2024-10-11 09:24:44

Evaluating an uninitialized pointer causes undefined behaviour. Since dereferencing the pointer first requires evaluating it, this implies that dereferencing also causes undefined behaviour.

This was true in both C++11 and C++14, although the wording changed.

In C++14 it is fully covered by [dcl.init]/12:

When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.

If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

where the "following cases" are particular operations on unsigned char.


In C++11, [conv.lval]/2 covered this under the lvalue-to-rvalue conversion procedure (i.e. retrieving the pointer value from the storage area denoted by ptr):

A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

The bolded part was removed for C++14 and replaced with the extra text in [dcl.init]/12.
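As a hedged illustration of the "no initialization is performed" condition in [dcl.init]/12, the sketch below contrasts it with value-initialization, which does give the object a definite value. The function names are hypothetical, and the indeterminate variants appear only in comments:

```cpp
#include <cassert>
#include <memory>

// Value-initialization ({}) makes the pointer a null pointer, so reading
// it is well-defined (dereferencing it would still be undefined).
bool value_initialized_pointer_is_null() {
    int* p{};             // not indeterminate: p == nullptr
    // int* q;            // would be indeterminate: evaluating q is undefined
    return p == nullptr;
}

// `new int()` value-initializes the int to zero, whereas `new int` would
// leave it with an indeterminate value.
int value_initialized_heap_int() {
    std::unique_ptr<int> owner(new int());
    assert(owner != nullptr);
    return *owner;
}
```

Both functions only ever evaluate objects that were initialized, which is what keeps them outside the rule quoted above.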

友谊不毕业 2024-10-11 09:24:44

I'm not going to pretend I know a lot about this, but some compilers would initialize the pointer to NULL and dereferencing a pointer to NULL is UB.

Also, considering that an uninitialized pointer could point to anything (including NULL), you could conclude that dereferencing it is UB.

A note in section 8.3.2 [dcl.ref]

[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bitfield. ]

—ISO/IEC 14882:1998(E), the ISO C++ standard, in section 8.3.2 [dcl.ref]

I think I should have written this as a comment instead; I'm not really that sure.

一曲爱恨情仇 2024-10-11 09:24:44

To dereference the pointer, you need to read from the pointer variable (not talking about the object it points to). Reading from an uninitialized variable is undefined behaviour.

What you do with the pointer's value after you have read it no longer matters at this point, be it writing to (as in your example) or reading from the object it points to.
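The distinction between reading the pointer variable and merely naming it can be sketched like this. The function name is hypothetical. The sketch assumes only that taking an object's address and applying sizeof do not read the object's value, and that assigning to the pointer replaces whatever it held before:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the uninitialized pointer is named several times,
// but its value is first read only after it has been assigned.
int assign_before_first_read() {
    int target = 7;
    int* ptr;                       // uninitialized, not read yet
    int** address = &ptr;           // takes ptr's address, does not read ptr
    std::size_t size = sizeof ptr;  // unevaluated operand, does not read ptr
    ptr = &target;                  // a write, which gives ptr a definite value
    assert(size == sizeof(int*));
    // From here on, reading ptr (directly or via *address) is well-defined.
    return (*address == ptr) ? *ptr : -1;
}
```

The function returns 7. Moving the `return` above the assignment would turn the first read of ptr into the undefined case described in this answer.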

通知家属抬走 2024-10-11 09:24:44

Even if the normal storage of something in memory would have no "room" for any trap bits or trap representations, implementations are not required to store automatic variables the same way as static-duration variables except when there is a possibility that user code might hold a pointer to them somewhere. This behavior is most visible with integer types. On a typical 32-bit system, given the code:

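#include <stdint.h>

uint16_t foo(void);
uint16_t bar(void);

uint16_t blah(uint32_t q)
{
  uint16_t a;            /* deliberately left uninitialized */
  if (q & 1) a=foo();
  if (q & 2) a=bar();
  return a;              /* indeterminate if neither bit 0 nor bit 1 of q is set */
}

unsigned short test(void)
{
  return blah(65540);    /* 65540 == 0x10004: bits 0 and 1 are clear, so a is never assigned */
}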

it would not be particularly surprising for test to yield 65540 even though that value is outside the representable range of uint16_t, a type which has no trap representations. If a local variable of type uint16_t holds an indeterminate value, there is no requirement that reading it yield a value within the range of uint16_t. Since unexpected behavior can result even when unsigned integers are used in this fashion, there's no reason to expect that pointers couldn't behave in an even worse fashion.
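For contrast, here is a sketch of the same function with a definite starting value, written as C++ for consistency with the question. The bodies of foo and bar are hypothetical stubs invented for illustration (the original only declares them), and blah_initialized is likewise an assumed name:

```cpp
#include <cstdint>

// Hypothetical stubs: the original answer only declares foo and bar.
std::uint16_t foo() { return 1; }
std::uint16_t bar() { return 2; }

// Same control flow as blah above, except that `a` starts with a definite
// value, so every path returns something within the range of uint16_t.
std::uint16_t blah_initialized(std::uint32_t q) {
    std::uint16_t a = 0;
    if (q & 1) a = foo();
    if (q & 2) a = bar();
    return a;
}
```

With this change blah_initialized(65540) returns 0, since neither low bit is set and the initial value is never replaced.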
