When does calling a member function through a null pointer result in undefined behavior?

Posted on 2024-08-25 12:39:26

Consider the following code:

#include <iostream>

struct foo
{
    // (a):
    void bar() { std::cout << "gman was here" << std::endl; }

    // (b):
    void baz() { x = 5; }

    int x;
};

int main()
{
    foo* f = 0;

    f->bar(); // (a)
    f->baz(); // (b)
}

We expect (b) to crash, because there is no corresponding member x for the null pointer. In practice, (a) doesn't crash because the this pointer is never used.

Because (b) dereferences the this pointer ((*this).x = 5;), and this is null, the program enters undefined behavior, as dereferencing null is always said to be undefined behavior.

Does (a) result in undefined behavior? What about if both functions (and x) are static?

叹倦 2024-09-01 12:39:26

Both (a) and (b) result in undefined behavior. It's always undefined behavior to call a member function through a null pointer. If the function is static, it's technically undefined as well, but there's some dispute.

The first thing to understand is why it's undefined behavior to dereference a null pointer. In C++03, there's actually a bit of ambiguity here.

Although "dereferencing a null pointer results in undefined behavior" is mentioned in notes in both §1.9/4 and §8.3.2/4, it's never explicitly stated. (Notes are non-normative.)

However, one can try to deduce it from §3.10/2:

An lvalue refers to an object or function.

When dereferencing, the result is an lvalue. A null pointer does not refer to an object, therefore when we use the lvalue we have undefined behavior. The problem is that the previous sentence is never stated, so what does it mean to "use" the lvalue? Merely to generate it at all, or to use it in the more formal sense of performing an lvalue-to-rvalue conversion?

Regardless, it definitely cannot be converted to an rvalue (§4.1/1):

If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

Here it's definitely undefined behavior.

The ambiguity comes from whether or not it's undefined behavior to dereference but not use the value from an invalid pointer (that is, get an lvalue but not convert it to an rvalue). If not, then int *i = 0; *i; &(*i); is well-defined. This is an active issue.

So we have a strict "dereference a null pointer, get undefined behavior" view and a weak "use a dereferenced null pointer, get undefined behavior" view.

Now we consider the question.

Yes, (a) results in undefined behavior. In fact, if this is null then regardless of the contents of the function the result is undefined.

This follows from §5.2.5/3:

If E1 has the type “pointer to class X,” then the expression E1->E2 is converted to the equivalent form (*(E1)).E2;

*(E1) will result in undefined behavior with a strict interpretation, and .E2 converts it to an rvalue, making it undefined behavior for the weak interpretation.

That it is undefined behavior also follows directly from §9.3.1/1:

If a nonstatic member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

With static functions, the strict versus weak interpretation makes the difference. Strictly speaking, it is undefined:

A static member may be referred to using the class member access syntax, in which case the object-expression is evaluated.

That is, it's evaluated just as if it were non-static and we once again dereference a null pointer with (*(E1)).E2.

However, because E1 is not used in a static member-function call, if we use the weak interpretation the call is well-defined. *(E1) results in an lvalue, the static function is resolved, *(E1) is discarded, and the function is called. There is no lvalue-to-rvalue conversion, so there's no undefined behavior.

In C++0x, as of n3126, the ambiguity remains. For now, be safe: use the strict interpretation.
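
Following the strict interpretation is easy for static members, because a static member can be named through its class instead of through a pointer. The sketch below shows that form; `config`, `default_port` and `port_for` are hypothetical names invented for illustration, not part of the question's code.

```cpp
#include <cassert>

// Hypothetical type for illustration only.
struct config
{
    // Static member function: it needs no object at all.
    static int default_port() { return 8080; }
};

// `c` may be null. It is never dereferenced, because the static member
// is named through the class (config::) rather than through `c->`.
int port_for(const config* c)
{
    (void)c; // deliberately unused
    return config::default_port();
}

// Small self-check; both calls are well-defined under either interpretation.
void check_port_for()
{
    config c;
    assert(port_for(&c) == 8080);
    assert(port_for(nullptr) == 8080);
}
```

Writing `c->default_port()` instead would evaluate `*c`, which is exactly the case the strict and weak interpretations disagree about.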

苍景流年 2024-09-01 12:39:26

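Obviously undefined means it's not defined, but sometimes it can be predictable. The information I'm about to provide should never be relied on in working code, since it certainly isn't guaranteed, but it might come in useful when debugging.

You might think that calling a function through an object pointer will dereference the pointer and cause UB. In practice, if the function isn't virtual, the compiler will have converted it to a plain function call that passes the pointer as the first parameter, this, bypassing the dereference and creating a time bomb for the called member function. If the member function doesn't reference any member variables or virtual functions, it might actually succeed without error. Remember that succeeding falls within the universe of "undefined"!

Microsoft's MFC function GetSafeHwnd actually relies on this behavior. I don't know what they were smoking.

If you're calling a virtual function, the pointer must be dereferenced to get to the vtable, and you will certainly get UB (probably a crash, but remember that there are no guarantees).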
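
The lowering described in this answer can be modelled in well-defined code by writing the "member function" as a free function with an explicit pointer parameter. This is only a conceptual sketch, not actual compiler output and not MFC's implementation; `window`, `say_hello` and `safe_handle` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical type standing in for a class with one data member.
struct window
{
    int handle;
};

// Models a non-virtual member function that never touches `this`:
// the pointer is passed along but not dereferenced.
std::string say_hello(const window* self)
{
    (void)self; // deliberately unused
    return "hello";
}

// Models a null-tolerant accessor. Because `self` is an ordinary
// parameter here, the null check is meaningful and well-defined.
int safe_handle(const window* self)
{
    return self ? self->handle : 0;
}

// Small self-check covering the null and non-null cases.
void check_lowering_model()
{
    window w{42};
    assert(say_hello(nullptr) == "hello");
    assert(safe_handle(&w) == 42);
    assert(safe_handle(nullptr) == 0);
}
```

As free functions these calls are well-defined with a null argument. The same bodies written as non-static members and invoked through a null pointer would be undefined, even if a particular compiler happens to generate similar machine code.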

梦亿 2024-09-01 12:39:26

With CWG 2823 for C++26, applied as a defect report, it has now been clarified that dereferencing a null pointer has undefined behavior in itself, even if no lvalue-to-rvalue conversion is applied to the result and the result is not used in any other way, i.e. "empty lvalues" don't exist.

[expr.unary.op]/1 now says about the behavior of the built-in unary * operator:

[...] If the operand points to an object or function, the result denotes that object or function; otherwise, the behavior is undefined except as specified in [expr.typeid].

With CWG 2748 for C++26, also applied as a defect report, it has been clarified that the pointer dereference in a member access expression is evaluated even if the member is static and the result of the dereference isn't actually needed.

[expr.ref]/3 now states

The postfix expression before the dot is evaluated; [...]

after transformation of E1->E2 to (*(E1)).E2 per [expr.ref]/2:

[...] The expression E1->E2 is converted to the equivalent form (*(E1)).E2; the remainder of [expr.ref] will address only the first option (dot).

A note in [expr.ref] further clarifies:

If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.

Therefore there is no longer any ambiguity: a member function call through a null pointer value, whether the function is static or non-static, has undefined behavior.

MSVC currently mistakenly does not consider a call to a non-static member function on a null pointer value UB, while both GCC and Clang correctly do. See https://godbolt.org/z/zeGPzacEE.

For static member functions, currently all three compilers fail to identify the static member function call on a null pointer value as UB, see https://godbolt.org/z/G1bPYPqEP. Generally they do not consider the null pointer dereference itself as UB currently, see https://godbolt.org/z/T7Yo1E1Yh. Presumably the defect reports haven't been implemented yet.

甜味拾荒者 2024-09-01 12:39:26

Treating all calls made using method syntax as invoking Undefined Behavior when the target is null allows a compiler, given something like:

void test()
{
  if (this) doSomething(this); else handleNullCase();
}

to replace that code with an unconditional call to doSomething(this);. Some compilers whose customers happen to like that behavior work that way, and the authors of the Standard don't want to mandate that such compilers behave in a manner their customers wouldn't like as much.

Note that the Standard equally allows a compiler whose customers prefer it to evaluate somePtr->nonVirtualMember by passing the value of somePtr to nonVirtualMember without regard to whether somePtr is null, and to have the called function honor its own check of this.

When the C and C++ Standards were written, the authors often wanted to write them in whatever way they thought would best allow compiler writers to satisfy the desires of a variety of customers.
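
If code genuinely needs the null check from the test() example to survive optimization, one option is to take the pointer as an explicit parameter so that the check is no longer a check of this. The following is a minimal sketch under that assumption; `node` is a hypothetical type, and `do_something` and `handle_null_case` mirror the placeholder names doSomething and handleNullCase above.

```cpp
#include <cassert>

// Hypothetical type for illustration only.
struct node
{
    int value;

    // Static, so calling node::test(p) never forms `*p` on its own.
    // `self` is an ordinary parameter and is allowed to be null.
    static int test(node* self)
    {
        return self ? do_something(self) : handle_null_case();
    }

private:
    static int do_something(node* self) { return self->value; }
    static int handle_null_case() { return -1; }
};

// Small self-check covering the null and non-null cases.
void check_node_test()
{
    node n;
    n.value = 7;
    assert(node::test(&n) == 7);
    assert(node::test(nullptr) == -1);
}
```

Because `self` may legitimately be null, a conforming compiler cannot drop the else branch the way it may for a check of this inside a non-static member function.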
