const member and assignment operator. How to avoid the undefined behavior?

Posted on 2024-10-01 13:18:38, 1319 words, 0 views, 0 comments

I answered the question about std::vector of objects and const-correctness, and received a comment about undefined behavior. I do not agree and therefore I have a question.

Consider a class with a const member:

class A { 
public: 
    const int c; // must not be modified! 
    A(int c) : c(c) {} 
    A(const A& copy) : c(copy.c) { }     
    // No assignment operator
}; 

I want to have an assignment operator but I do not want to use const_cast like in the following code from one of the answers:

A& operator=(const A& assign) 
{ 
    *const_cast<int*> (&c)= assign.c;  // very very bad, IMHO, it is undefined behavior
    return *this; 
} 

My solution is

// Custom-defined assignment operator
A& operator=(const A& right)  
{  
    if (this == &right) return *this;  

    // manually call the destructor of the old left-side object
    // (`this`) in the assignment operation to clean it up
    this->~A(); 
    // use "placement new" syntax to copy-construct a new `A` 
    // object from `right` into left (at address `this`)
    new (this) A(right); 
    return *this;  
}  

Do I have undefined behavior (UB)?

What would be a solution without UB?

Comments (8)

∞梦里开花 2024-10-08 13:18:38

Your code causes undefined behavior.

Not just "undefined if A is used as a base class and this, that or the other". Actually undefined, always. return *this is already UB, because this is not guaranteed to refer to the new object.

Specifically, consider 3.8/7:

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:

...

- the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type,

Now, "after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied" is exactly what you are doing.

Your object is of class type, and it does contain a non-static data member whose type is const-qualified. Therefore, after your assignment operator has run, pointers, references and names referring to the old object are not guaranteed to refer to the new object and to be usable to manipulate it.

As a concrete example of what might go wrong, consider:

A x(1);
A y(2);
std::cout << x.c << "\n";
x = y;
std::cout << x.c << "\n";

Expect this output?

1
2

Wrong! It's plausible you might get that output, but the reason const members are an exception to the rule stated in 3.8/7 is so that the compiler can treat x.c as the const object that it claims to be. In other words, the compiler is allowed to treat this code as if it were:

A x(1);
A y(2);
int tmp = x.c;
std::cout << tmp << "\n";
x = y;
std::cout << tmp << "\n";

Because (informally) const objects do not change their values. The potential value of this guarantee when optimizing code involving const objects should be obvious. For there to be any way to modify x.c without invoking UB, this guarantee would have to be removed. So, as long as the standard writers have done their job without errors, there is no way to do what you want.

[*] In fact I have my doubts about using this as the argument to placement new - possibly you should have copied it to a void* first, and used that. But I'm not bothered whether that specifically is UB, since it wouldn't save the function as a whole.

方觉久 2024-10-08 13:18:38

First: When you make a data member const, you're telling the compiler and all the world that this data member never changes. Of course then you cannot assign to it and you certainly must not trick the compiler into accepting code that does so, no matter how clever the trick.
You can either have a const data member or an assignment operator assigning to all data members. You can't have both.
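
This trade-off can be checked at compile time. Here is a minimal sketch, assuming two hypothetical structs `WithConst` and `WithoutConst` (names not from the question): the standard trait `std::is_copy_assignable` reports that the compiler-generated copy assignment operator disappears as soon as a non-static data member is const, while copy construction is unaffected.

```cpp
#include <type_traits>

// Hypothetical illustration types; only the constness of `c` differs.
struct WithConst {
    const int c;
    explicit WithConst(int v) : c(v) {}
};

struct WithoutConst {
    int c;
    explicit WithoutConst(int v) : c(v) {}
};

// The implicit copy assignment operator is defined as deleted for a class
// with a const non-static data member, so the trait is false.
static_assert(!std::is_copy_assignable<WithConst>::value,
              "const member: no implicit copy assignment");
static_assert(std::is_copy_assignable<WithoutConst>::value,
              "plain member: copy assignment is generated");
// Copy construction is unaffected by the const member.
static_assert(std::is_copy_constructible<WithConst>::value,
              "const member: copy construction still works");
```

If the block compiles, all three claims hold for the compiler in use.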

As for your "solution" to the problem:
I suppose that calling the destructor on an object within a member function invoked for that object would invoke UB right away. Invoking a constructor on uninitialized raw data to create an object from within a member function that's been invoked for an object that resided where now the constructor is invoked on raw data... also very much sounds like UB to me. (Hell, just spelling this out makes my toenails curl.) And, no, I don't have chapter and verse of the standard for that. I hate reading the standard. I think I can't stand its meter.

However, technicalities aside, I admit that you might get away with your "solution" on just about every platform as long as the code stays as simple as in your example. Still, this doesn't make it a good solution. In fact, I'd argue it's not even an acceptable solution, because IME code never stays as simple as that. Over the years it will get extended, changed, mutated, and twisted and then it will silently fail and require a mind-numbing 36hrs shift of debugging in order to find the problem. I don't know about you, but whenever I find a piece of code like this responsible for 36hrs of debugging fun I want to strangle the miserable dumb-wit who did this to me.

Herb Sutter, in his GotW #23, dissects this idea piece by piece and finally concludes that it "is full of pitfalls, it's often wrong, and it makes life a living hell for the authors of derived classes... never use the trick of implementing copy assignment in terms of copy construction by using an explicit destructor followed by placement new, even though this trick crops up every three months on the newsgroups" (emphasis mine).

白馒头 2024-10-08 13:18:38

How can you possibly assign to an A if it has a const member? You're trying to accomplish something that's fundamentally impossible. Your solution has no new behaviour over the original, which is not necessarily UB but yours most definitely is.

The simple fact is, you're changing a const member. You either need to un-const your member, or ditch the assignment operator. There is no solution to your problem- it's a total contradiction.

Edit for more clarity:

Const cast does not always introduce undefined behaviour. You, however, most certainly did. Apart from anything else, it is undefined not to call all destructors- and you didn't even call the right one- before you placed into it unless you knew for certain that T is a POD class. In addition, there's owch-time undefined behaviours involved with various forms of inheritance.

You do invoke undefined behaviour, and you can avoid this by not trying to assign to a const object.

爱她像谁 2024-10-08 13:18:38

First off, the whole motivation for your (quite ingenious I might say) usage of "placement new" as a means of implementing the assignment operator, operator=(), as instigated by this question (std::vector of objects and const-correctness), is now nullified. As of C++11, that question's code now has no errors. See my answer here.

Secondly, C++11's emplace() functions now do pretty much exactly what your usage of placement new was doing, except that they are all virtually guaranteed by the compilers themselves now to be well-defined behavior, per the C++ standard.
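
To illustrate the point about C++11 containers, here is a minimal sketch, assuming a class `A` shaped like the one in the question and a hypothetical helper `sum_of_members`. Appending at the end of a `std::vector` only needs construction (plus copy or move construction on reallocation), not assignment, so it compiles even though `A` has no assignment operator. Operations that shift elements, such as `erase` in the middle, would still require assignment.

```cpp
#include <vector>

// Same shape as the class in the question: const member, no assignment.
class A {
public:
    const int c;
    explicit A(int v) : c(v) {}
    A(const A& other) : c(other.c) {}
};

// Hypothetical helper: builds a vector in place and sums the members.
int sum_of_members() {
    std::vector<A> v;
    v.emplace_back(1);   // constructs A(1) directly in the vector's storage
    v.emplace_back(2);   // reallocation, if any, uses the copy constructor
    v.push_back(A(3));   // also fine: copy-constructs the new element
    int sum = 0;
    for (const A& a : v) sum += a.c;
    return sum;
}
```

Calling `sum_of_members()` adds up the three stored members.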

Third, when the accepted answer states:

because this is not guaranteed to refer to the new object

I wonder if this is because the value contained in the this variable might be changed by the placement new copy-construction operation, NOT because anything using that instance of the class might retain a cached value of it, with the old instance data, rather than read a new value of the object instance from memory. If the former, it seems to me you could ensure this is correct inside the assignment operator function by using a temporary copy of the this pointer, like this:

// Custom-defined assignment operator
A& operator=(const A& right)  
{  
    if (this == &right) return *this;  

    // manually call the destructor of the old left-side object
    // (`this`) in the assignment operation to clean it up
    this->~A(); 

    // Now back up `this` in case it gets corrupted inside this function call
    // only during the placement new copy-construction operation which 
    // overwrites this object:
    A * thisBak = this;

    // use "placement new" syntax to copy-construct a new `A` 
    // object from `right` into left (at address `this`)
    new (this) A(right); 

    // Note: we cannot write to or re-assign `this`. 
    // See here: https://stackoverflow.com/a/18227566/4561887

    // Return using our backup copy of `this` now
    return *thisBak;  
}  

But, if it has to do with an object being cached and not re-read each time it is used, I wonder if volatile would solve this! ie: use volatile const int c; as the class member instead of const int c;.

Fourth, in the rest of my answer I focus on the usage of volatile, as applied to the class members, to see if this might solve the 2nd of these two potential undefined behavior cases:

  1. The potential UB in your own solution:

     // Custom-defined assignment operator
     A& operator=(const A& right)  
     {  
         if (this == &right) return *this;  
    
         // manually call the destructor of the old left-side object
         // (`this`) in the assignment operation to clean it up
         this->~A(); 
         // use "placement new" syntax to copy-construct a new `A` 
         // object from `right` into left (at address `this`)
         new (this) A(right); 
         return *this;  
     }  
    
  2. The potential UB you mention may exist in the other solution.

     // (your words, not mine): "very very bad, IMHO, it is 
     // undefined behavior"
     *const_cast<int*> (&c)= assign.c;
    

Although I think perhaps adding volatile might fix both cases above, my focus in the rest of this answer is on the 2nd case just above.

tldr;

It seems to me this (the 2nd case just above, in particular) becomes valid and well-defined behavior by the standard if you add volatile and make the class member variable volatile const int c; instead of just const int c;. I can't say this is a great idea, but I think casting away const and writing to c then becomes well-defined behavior and perfectly valid. Otherwise, the behavior is undefined only because reads of c may be cached and/or optimized out since it is only const, and not also volatile.

Read below for more details and justification, including a look at some examples and a little assembly.

const member and assignment operator. How to avoid the undefined behavior?

Writing to const members is only undefined behavior...

...because the compiler may optimize out further reads to the variable, since it's const. In other words, even though you've correctly updated the value contained at a given address in memory, the compiler may tell the code to just regurgitate whatever was last in the register holding the value it first read, rather than going back to the memory address and actually checking for a new value each time you read from that variable.

So this:

// class member variable:
const int c;    

// anywhere
*const_cast<int*>(&c) = assign.c;

probably is undefined behavior. It may work in some cases but not others, on some compilers but not others, or in some versions of compilers, but not others. We can't rely on it to have predictable behavior because the language does not specify what should happen each and every time we set a variable as const and then write to and read from it.

This program, for instance (see here: https://godbolt.org/z/EfPPba):

#include <cstdio>
int main() {
  const int i = 5;
  *(int*)(&i) = 8;
  printf("%i\n", i);
  return 0;
}

prints 5 (although we wanted it to print 8) and produces this assembly in main. (Note that I'm no assembly expert). I've marked the printf lines. You can see that even though 8 is written to that location (mov DWORD PTR [rax], 8), the printf lines do NOT read out that new value. They read out the previously-stored 5 because they don't expect it to have changed, even though it did. The behavior is undefined, so the read is omitted in this case.

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     DWORD PTR [rbp-4], 5
lea     rax, [rbp-4]
mov     DWORD PTR [rax], 8

// printf lines
mov     esi, 5
mov     edi, OFFSET FLAT:.LC0
mov     eax, 0
call    printf

mov     eax, 0
leave
ret

Writing to volatile const variables, however, is not undefined behavior...

...because volatile tells the compiler it better read the contents at the actual memory location on every read to that variable, since it might change at any time!

You might think: "Does this even make sense?" (Having a volatile const variable, that is. I mean: "What might change a const variable so that we need to mark it volatile!?") The answer is: "Well, yes! It does make sense!" On microcontrollers and other low-level memory-mapped embedded devices, some registers, which the underlying hardware could change at any moment, are read-only. To mark them read-only in C or C++ we make them const, but to ensure the compiler knows it had better actually read the memory at their address every single time we read the variable, rather than relying on optimizations which retain previously-cached values, we also mark them as volatile. So, to mark address 0xF000 as a read-only 8-bit register named REG1, we'd define it like this in a header file somewhere:

// define a read-only 8-bit register
#define REG1 (*(volatile const uint8_t*)(0xF000))

Now, we can read from it at our whim, and each and every time we ask the code to read the variable, it will. This is well-defined behavior. Now, we can do something like this, and this code will NOT get optimized out, because the compiler knows that this register value actually could change at any given time, since it's volatile:

while (REG1 == 0x12)
{
    // busy wait until REG1 gets changed to a new value
}

And, to mark REG2 as an 8-bit read/write register, of course, we'd just remove const. In both cases, however, volatile is required, as the values could change at any given time by the hardware, so the compiler better not make any assumptions about these variables or try to cache their values and rely on cached readings.

// define a read/write 8-bit register
#define REG2 (*(volatile uint8_t*)(0xF001))

Therefore, the following is not undefined behavior! This is very well-defined behavior as far as I can tell:

// class member variable:
volatile const int c;    

// anywhere
*const_cast<int*>(&c) = assign.c;

Even though the variable is const, we can cast away const and write to it, and the compiler will respect that and actually write to it. And, now that the variable is also marked as volatile, the compiler will read it every single time, and respect that too, the same as reading REG1 or REG2 above.

This program, therefore, now that we added volatile (see it here: https://godbolt.org/z/6K8dcG):

#include <cstdio>
int main() {
  volatile const int i = 5;
  *(int*)(&i) = 8;
  printf("%i\n", i);
  return 0;
}

prints 8, which is now correct, and produces this assembly in main. Again, I've marked the printf lines. Notice the new and different lines I've marked too! These are the only changes to the assembly output! Every other line is exactly identical. The new line, marked below, goes out and actually reads the new value of the variable and stores it into register eax. Next, in preparation for printing, instead of moving a hard-coded 5 into register esi, as was done before, it moves the contents of register eax, which is just read, and which now contains an 8, into register esi. Solved! Adding volatile fixed it!

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     DWORD PTR [rbp-4], 5
lea     rax, [rbp-4]
mov     DWORD PTR [rax], 8

// printf lines
mov     eax, DWORD PTR [rbp-4]  // NEW!
mov     esi, eax                // DIFFERENT! Was `mov     esi, 5`
mov     edi, OFFSET FLAT:.LC0
mov     eax, 0
call    printf

mov     eax, 0
leave
ret

Here's a bigger demo (run it online: https://onlinegdb.com/HyU6fyCNv). You can see that we can write to a variable by casting it to a non-const reference OR a non-const pointer.

In all cases (casting to both non-const references or non-const pointers in order to modify the const value), we can use C++-style casts, OR C-style casts.

In the simple example above, I verified that in all four cases (even using a C-style cast to cast to a reference: (int&)(i) = 8;, oddly enough, since C doesn't have references :)) the assembly output was the same.

#include <stdio.h>

int main()
{
    printf("Hello World\n");

    // This does NOT work!
    const int i1 = 5;
    printf("%d\n", i1);
    *const_cast<int*>(&i1) = 6;
    printf("%d\n\n", i1); // output is 5, when we want it to be 6!
    
    // BUT, if you make the `const` variable also `volatile`, then it *does* work! (just like we do
    // for writing to microcontroller registers--making them `volatile` too). The compiler is making
    // assumptions about that memory address when we make it just `const`, but once you make it
    // `volatile const`, those assumptions go away and it has to actually read that memory address
    // each time you ask it for the value of `i`, since `volatile` tells it that the value at that
    // address could change at any time, thereby making this work.

    // Reference casting: WORKS! (since the `const` variable is now `volatile` too)

    volatile const int i2 = 5;
    printf("%d\n", i2);
    const_cast<int&>(i2) = 7;
    // So, the output of this is 7:
    printf("%d\n\n", i2);
    
    // C-style reference cast (oddly enough, since C doesn't have references :))
    
    volatile const int i3 = 5;
    printf("%d\n", i3);
    (int&)(i3) = 8;
    printf("%d\n\n", i3);
    

    // It works just fine with pointer casting too instead of reference casting, ex:
    
    volatile const int i4 = 5;
    printf("%d\n", i4);
    *(const_cast<int*>(&i4)) = 9;
    printf("%d\n\n", i4);

    // or C-style:
    
    volatile const int i5 = 5;
    printf("%d\n", i5);
    *(int*)(&i5) = 10;
    printf("%d\n\n", i5);


    return 0;
}

Sample output:

Hello World
5
5

5
7

5
8

5
9

5
10

Notes:

  1. I've also noticed that the above works when modifying const class members even when they are NOT volatile. See my "std_optional_copy_test" program! Ex: https://onlinegdb.com/HkyNyTt4D. This, however, is probably undefined behavior. To make it well-defined, make the member variable volatile const instead of just const.
  2. The reason casting to a plain int reference or int pointer (rather than from volatile const int to volatile int) works just fine is that volatile affects the reading of the variable, NOT the writing of it. So long as we read the variable through a volatile-qualified name, which we do, our reads are guaranteed not to be optimized out. That's what gives us the well-defined behavior. The writes always worked, even when the variable wasn't volatile.

References:

  1. [my own answer] What uses are there for "placement new"?
  2. x86 Assembly Guide
  3. Change 'this' pointer of an object to point different object
  4. Compiler Explorer outputs, with assembly, from godbolt.org:
    1. Here: https://godbolt.org/z/EfPPba
    2. And here: https://godbolt.org/z/6K8dcG
  5. [my answer] Register-level GPIO access on STM32 microcontrollers: Programing STM32 like STM8(register level GPIO )

水中月 2024-10-08 13:18:38

If you definitely want to have an immutable (but assignable) member, then without UB you can lay things out like this:

#include <iostream>

class ConstC
{
    int c;
protected:
    ConstC(int n): c(n) {}
    int get() const { return c; }
};

class A: private ConstC
{
public:
    A(int n): ConstC(n) {}
    friend std::ostream& operator<< (std::ostream& os, const A& a)
    {
        return os << a.get();
    }
};

int main()
{
    A first(10);
    A second(20);
    std::cout << first << ' ' << second << '\n';
    first = second;
    std::cout << first << ' ' << second << '\n';
}
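
A variation on the same idea, sketched without the private base class: keep the member non-const but private, and expose only a const accessor. The class name `Tagged`, the accessor `value()` and the helper `assigned_value` are hypothetical. Outside code cannot modify the member, while the compiler-generated assignment operator stays available and well-defined.

```cpp
class Tagged {
    int c_;                              // non-const, but private
public:
    explicit Tagged(int v) : c_(v) {}
    int value() const { return c_; }     // read-only access from outside
    // Copy constructor and copy assignment are implicitly generated.
};

// Hypothetical helper showing that whole-object assignment works.
int assigned_value() {
    Tagged first(10);
    Tagged second(20);
    first = second;                      // plain member-wise assignment, no UB
    return first.value();
}
```

The cost is that member functions of the class itself can still modify `c_`, so the immutability is a convention inside the class rather than something the compiler enforces.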
救赎№ 2024-10-08 13:18:38

According to the newer C++ standard draft version N4861 it seems to be no longer undefined behaviour (link):

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if the original object is transparently replaceable (see below) by the new object.

An object o1 is transparently replaceable by an object o2 if:

  • the storage that o2 occupies exactly overlays the storage that o1 occupied, and
  • o1 and o2 are of the same type (ignoring the top-level cv-qualifiers), and
  • o1 is not a complete const object, and
  • neither o1 nor o2 is a potentially-overlapping subobject ([intro.object]), and
  • either o1 and o2 are both complete objects, or o1 and o2 are direct subobjects of objects p1 and p2, respectively, and p1 is transparently replaceable by p2.

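The only condition in this list that concerns const is "o1 is not a complete const object", and it is satisfied in this case: the const member c is a subobject, not a complete object (provided the A object itself is not declared const). Of course, you still have to make sure that none of the other conditions is violated.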
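
A minimal sketch of the destroy-and-reconstruct assignment from the question, written on the assumption that the C++20 wording quoted above applies. The helper `demo_assign` is a hypothetical name added for illustration. Note that the sketch relies on `A` being used as a non-const complete object and not as a base class, since a base-class subobject is potentially overlapping.

```cpp
#include <cassert>
#include <new>

struct A {
    const int c;  // never modified in place
    explicit A(int v) : c(v) {}
    A(const A& other) : c(other.c) {}

    A& operator=(const A& right) {
        if (this != &right) {
            this->~A();                                  // end the old object's lifetime
            ::new (static_cast<void*>(this)) A(right);   // create a new A in the same storage
        }
        return *this;
    }
};

// Hypothetical helper: assigns to a non-const complete object and reads it back.
inline int demo_assign() {
    A first(10);         // not a const complete object, so it may be replaced
    const A second(20);  // only read from, never replaced
    first = second;
    assert(second.c == 20);
    // Under the C++20 rules quoted above, the name `first` refers to the new object.
    return first.c;
}
```

Under earlier standards the const member prevented the old name from referring to the new object, so this sketch should not be read as portable to pre-C++20 rules.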

杯别 2024-10-08 13:18:38

In the absence of other (non-const) members, this doesn't make any sense at all, whether or not it is undefined behavior.

A& operator=(const A& assign) 
{ 
    *const_cast<int*> (&c)= assign.c;  // very very bad, IMHO, it is UB
    return *this; 
}

AFAIK, no undefined behavior is happening here, because c is not a static const instance (otherwise you couldn't invoke the copy-assignment operator at all). However, the const_cast should ring a bell and tell you that something is wrong. const_cast was primarily designed to work around APIs that are not const-correct, and that doesn't seem to be the case here.

Also, in the following snippet:

A& operator=(const A& right)  
{  
    if (this == &right) return *this;  
    this->~A();
    new (this) A(right); 
    return *this;  
}

You have two major risks, the first of which has already been pointed out.

  1. If the object is really an instance of a class derived from A and the destructor is virtual, the destructor call destroys the whole derived object, but the placement new rebuilds only the A part, so the original instance is only partially reconstructed.
  2. If the constructor call in new(this) A(right); throws an exception, your object will be destroyed twice: once by the explicit destructor call and again when the object's lifetime ends normally. In this particular case, it won't be a problem, but if you happen to have significant cleanup, you're going to regret it.

Edit: if the const member is not considered part of your object's "state" (i.e. it is some sort of ID used for tracking instances and takes no part in comparisons such as operator==), then the following might make sense:

A& operator=(const A& assign) 
{ 
    // Copy all but `const` member `c`.
    // ...

    return *this;
}
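
One way to fill in that outline is sketched below, assuming a class with a const identifier and one piece of ordinary state. The class name `Tracked` and the member `payload` are hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Tracked {
public:
    const int id;         // identity of this instance, never reassigned
    std::string payload;  // the actual state of the object

    Tracked(int id_, std::string p) : id(id_), payload(std::move(p)) {}

    // Copy everything except the const member `id`.
    Tracked& operator=(const Tracked& other) {
        payload = other.payload;  // std::string handles self-assignment
        return *this;
    }

    // The id takes no part in comparisons, only the state does.
    friend bool operator==(const Tracked& a, const Tracked& b) {
        return a.payload == b.payload;
    }
};
```

After `a = b`, the two objects compare equal while each keeps its own `id`, so no const object is ever written to.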

Have a read of this link:

http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=368

In particular...

This trick allegedly prevents code reduplication. However, it has some serious flaws. In order to work, C’s destructor must assign NULLify every pointer that it has deleted because the subsequent copy constructor call might delete the same pointers again when it reassigns a new value to char arrays.
