const member and assignment operator. How to avoid the undefined behavior?
I answered the question about std::vector of objects and const-correctness, and received a comment about undefined behavior. I do not agree, and therefore I have a question.
Consider a class with a const member:
class A {
public:
    const int c; // must not be modified!
    A(int c) : c(c) {}
    A(const A& copy) : c(copy.c) { }
    // No assignment operator
};
I want to have an assignment operator, but I do not want to use const_cast as in the following code from one of the answers:
A& operator=(const A& assign)
{
    *const_cast<int*> (&c)= assign.c; // very very bad, IMHO, it is undefined behavior
    return *this;
}
My solution is
// Custom-defined assignment operator
A& operator=(const A& right)
{
    if (this == &right) return *this;
    // manually call the destructor of the old left-side object
    // (`this`) in the assignment operation to clean it up
    this->~A();
    // use "placement new" syntax to copy-construct a new `A`
    // object from `right` into left (at address `this`)
    new (this) A(right);
    return *this;
}
Do I have undefined behavior (UB)?
What would be a solution without UB?
Your code causes undefined behavior.
Not just "undefined if A is used as a base class and this, that or the other". Actually undefined, always.
`return *this` is already UB, because `this` is not guaranteed to refer to the new object.
Specifically, consider 3.8/7:
Now, "after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied" is exactly what you are doing.
Your object is of class type, and it does contain a non-static data member whose type is const-qualified. Therefore, after your assignment operator has run, pointers, references and names referring to the old object are not guaranteed to refer to the new object and to be usable to manipulate it.
As a concrete example of what might go wrong, consider:
Expect this output?
Wrong! It's plausible you might get that output, but the reason const members are an exception to the rule stated in 3.8/7 is so that the compiler can treat `x.c` as the const object that it claims to be. In other words, the compiler is allowed to treat this code as if it was:
Because (informally) const objects do not change their values. The potential value of this guarantee when optimizing code involving const objects should be obvious. For there to be any way to modify `x.c` without invoking UB, this guarantee would have to be removed. So, as long as the standard writers have done their job without errors, there is no way to do what you want.
[*] In fact I have my doubts about using `this` as the argument to placement new - possibly you should have copied it to a `void*` first, and used that. But I'm not bothered whether that specifically is UB, since it wouldn't save the function as a whole.
First: When you make a data member `const`, you're telling the compiler and all the world that this data member never changes. Of course then you cannot assign to it, and you certainly must not trick the compiler into accepting code that does so, no matter how clever the trick.
You can either have a `const` data member or an assignment operator assigning to all data members. You can't have both.
As for your "solution" to the problem:
I suppose that calling the destructor on an object within a member function invoked for that object would invoke UB right away. Invoking a constructor on uninitialized raw data to create an object from within a member function that's been invoked for an object that resided where now the constructor is invoked on raw data... also very much sounds like UB to me. (Hell, just spelling this out makes my toenails curl.) And, no, I don't have chapter and verse of the standard for that. I hate reading the standard. I think I can't stand its meter.
However, technicalities aside, I admit that you might get away with your "solution" on just about every platform as long as the code stays as simple as in your example. Still, this doesn't make it a good solution. In fact, I'd argue it's not even an acceptable solution, because IME code never stays as simple as that. Over the years it will get extended, changed, mutated, and twisted and then it will silently fail and require a mind-numbing 36hrs shift of debugging in order to find the problem. I don't know about you, but whenever I find a piece of code like this responsible for 36hrs of debugging fun I want to strangle the miserable dumb-wit who did this to me.
Herb Sutter, in his GotW #23, dissects this idea piece by piece and finally concludes that it "is full of pitfalls, it's often wrong, and it makes life a living hell for the authors of derived classes... never use the trick of implementing copy assignment in terms of copy construction by using an explicit destructor followed by placement new, even though this trick crops up every three months on the newsgroups" (emphasis mine).
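The "you can't have both" point in this answer can be checked at compile time. The following is only a minimal sketch: the class names `WithConst` and `WithoutConst` and the helper `assign_and_read` are hypothetical and do not come from the question. It shows that a `const` data member leaves copy construction intact but makes the implicit copy assignment operator deleted.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class mirroring the question's A: one const data member.
struct WithConst {
    const int c;
    explicit WithConst(int value) : c(value) {}
};

// Hypothetical alternative: the same data, but not const.
struct WithoutConst {
    int c;
    explicit WithoutConst(int value) : c(value) {}
};

// Copy construction is still available with a const member ...
static_assert(std::is_copy_constructible<WithConst>::value,
              "a const member does not block copy construction");
// ... but the implicit copy assignment operator is defined as deleted.
static_assert(!std::is_copy_assignable<WithConst>::value,
              "a const member blocks implicit copy assignment");
// Dropping const brings the compiler-generated assignment back.
static_assert(std::is_copy_assignable<WithoutConst>::value,
              "without const, implicit copy assignment exists");

// Assigns one WithoutConst to another and returns the resulting value.
int assign_and_read() {
    WithoutConst a(1);
    WithoutConst b(2);
    a = b;  // uses the implicit copy assignment operator
    return a.c;
}
```

Any hand-written `operator=` for `WithConst` would have to either skip `c` or resort to one of the tricks discussed in this thread.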
How can you possibly assign to an A if it has a const member? You're trying to accomplish something that's fundamentally impossible. Your solution has no new behaviour over the original, which is not necessarily UB but yours most definitely is.
The simple fact is, you're changing a const member. You either need to un-const your member, or ditch the assignment operator. There is no solution to your problem- it's a total contradiction.
Edit for more clarity:
Const cast does not always introduce undefined behaviour. You, however, most certainly did. Apart from anything else, it is undefined not to call all destructors- and you didn't even call the right one- before you placed into it unless you knew for certain that T is a POD class. In addition, there's owch-time undefined behaviours involved with various forms of inheritance.
You do invoke undefined behaviour, and you can avoid this by not trying to assign to a const object.
First off, the whole motivation for your (quite ingenious, I might say) usage of "placement new" as a means of implementing the assignment operator, `operator=()`, as instigated by this question (std::vector of objects and const-correctness), is now nullified. As of C++11, that question's code now has no errors. See my answer here.
Secondly, C++11's `emplace()` functions now do pretty much exactly what your usage of placement new was doing, except that they are all virtually guaranteed by the compilers themselves now to be well-defined behavior, per the C++ standard.
Third, when the accepted answer states:
I wonder if this is because the value contained in the `this` variable might be changed by the placement new copy-construction operation, NOT because anything using that instance of the class might retain a cached value of it, with the old instance data, rather than read a new value of the object instance from memory. If the former, it seems to me you could ensure `this` is correct inside the assignment operator function by using a temporary copy of the `this` pointer, like this:
But, if it has to do with an object being cached and not re-read each time it is used, I wonder if `volatile` would solve this! I.e., use `volatile const int c;` as the class member instead of `const int c;`.
Fourth, in the rest of my answer I focus on the usage of `volatile`, as applied to the class members, to see if this might solve the 2nd of these two potential undefined behavior cases:
The potential UB in your own solution:
The potential UB you mention may exist in the other solution.
Although I think perhaps adding `volatile` might fix both cases above, my focus in the rest of this answer is on the 2nd case just above.
tldr;
It seems to me this (the 2nd case just above, in particular) becomes valid and well-defined behavior by the standard if you add `volatile` and make the class member variable `volatile const int c;` instead of just `const int c;`. I can't say this is a great idea, but I think casting away `const` and writing to `c` then becomes well-defined behavior and perfectly valid. Otherwise, the behavior is undefined only because reads of `c` may be cached and/or optimized out since it is only `const`, and not also `volatile`.
Read below for more details and justification, including a look at some examples and a little assembly.
Writing to `const` members is only undefined behavior...
...because the compiler may optimize out further reads to the variable, since it's `const`. In other words, even though you've correctly updated the value contained at a given address in memory, the compiler may tell the code to just regurgitate whatever was last in the register holding the value it first read, rather than going back to the memory address and actually checking for a new value each time you read from that variable.
So this:
probably is undefined behavior. It may work in some cases but not others, on some compilers but not others, or in some versions of compilers but not others. We can't rely on it to have predictable behavior because the language does not specify what should happen each and every time we set a variable as `const` and then write to and read from it.
This program, for instance (see here: https://godbolt.org/z/EfPPba):
prints `5` (although we wanted it to print `8`) and produces this assembly in `main`. (Note that I'm no assembly expert.) I've marked the `printf` lines. You can see that even though `8` is written to that location (`mov DWORD PTR [rax], 8`), the `printf` lines do NOT read out that new value. They read out the previously-stored `5` because they don't expect it to have changed, even though it did. The behavior is undefined, so the read is omitted in this case.
Writing to `volatile const` variables, however, is not undefined behavior...
...because `volatile` tells the compiler it had better read the contents at the actual memory location on every read of that variable, since it might change at any time!
You might think: "Does this even make sense?" (having a `volatile const` variable). I mean: "What might change a `const` variable to make us need to mark it `volatile`!?" The answer is: "Well, yes! It does make sense!" On microcontrollers and other low-level memory-mapped embedded devices, some registers, which could be changed at any moment by the underlying hardware, are read-only. To mark them read-only in C or C++ we make them `const`, but to ensure the compiler knows it had better actually read the memory at their address location every single time we read the variable, rather than relying on optimizations which retain previously-cached values, we also mark them as `volatile`. So, to mark address `0xF000` as a read-only 8-bit register named `REG1`, we'd define it like this in a header file somewhere:
Now, we can read from it at our whim, and each and every time we ask the code to read the variable, it will. This is well-defined behavior. Now, we can do something like this, and this code will NOT get optimized out, because the compiler knows that this register value actually could change at any given time, since it's `volatile`:
And, to mark `REG2` as an 8-bit read/write register, of course, we'd just remove `const`. In both cases, however, `volatile` is required, as the values could be changed at any given time by the hardware, so the compiler had better not make any assumptions about these variables or try to cache their values and rely on cached readings.
Therefore, the following is not undefined behavior! This is very well-defined behavior as far as I can tell:
Even though the variable is `const`, we can cast away `const` and write to it, and the compiler will respect that and actually write to it. And, now that the variable is also marked as `volatile`, the compiler will read it every single time, and respect that too, the same as reading `REG1` or `REG2` above.
This program, therefore, now that we added `volatile` (see it here: https://godbolt.org/z/6K8dcG):
prints `8`, which is now correct, and produces this assembly in `main`. Again, I've marked the `printf` lines. Notice the new and different lines I've marked too! These are the only changes to the assembly output! Every other line is exactly identical. The new line, marked below, goes out and actually reads the new value of the variable and stores it into register `eax`. Next, in preparation for printing, instead of moving a hard-coded `5` into register `esi`, as was done before, it moves the contents of register `eax`, which was just read, and which now contains an `8`, into register `esi`. Solved! Adding `volatile` fixed it!
Here's a bigger demo (run it online: https://onlinegdb.com/HyU6fyCNv). You can see that we can write to a variable by casting it to a non-const reference OR a non-const pointer.
In all cases (casting to either non-const references or non-const pointers in order to modify the const value), we can use C++-style casts OR C-style casts.
In the simple example above, I verified that in all four cases (even using a C-style cast to cast to a reference: `(int&)(i) = 8;`, oddly enough, since C doesn't have references :)) the assembly output was the same.
Sample output:
Notes:
The above approach also works on `const` class members even when they are NOT `volatile`. See my "std_optional_copy_test" program! Ex: https://onlinegdb.com/HkyNyTt4D. This, however, is probably undefined behavior. To make it well-defined, make the member variable `volatile const` instead of just `const`.
The reason a cast from `volatile const int` to `volatile int` is not needed (i.e. why casting just to an `int` reference or `int` pointer works just fine) is that `volatile` affects the reading of the variable, NOT the writing of the variable. So, so long as we read the variable through a volatile variable means, which we do, our reads are guaranteed not to be optimized out. That's what gives us the well-defined behavior. The writes always worked, even when the variable wasn't `volatile`.
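The read-only register idea from this answer can be sketched without touching real hardware. This is only an illustration under stated assumptions: `fake_register_storage`, `REG1_VIEW`, `hardware_writes` and `read_reg1` are hypothetical names, and an ordinary variable stands in for the memory-mapped address. The sketch deliberately does not cast away `const`; it only shows a `volatile const` view whose reads are never cached while the underlying storage is changed from elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a hardware register. On a real target this would be a fixed
// address from the datasheet, not an ordinary variable (assumption).
static std::uint8_t fake_register_storage = 0;

// Read-only, volatile view of that storage: the program cannot write through
// it, and every read through it must actually access memory.
static const volatile std::uint8_t& REG1_VIEW = fake_register_storage;

// Simulates the "hardware" changing the register behind the program's back.
void hardware_writes(std::uint8_t value) {
    fake_register_storage = value;
}

// Performs a real read through the volatile view on every call.
std::uint8_t read_reg1() {
    return REG1_VIEW;
}
```

Because the underlying object itself is neither `const` nor `volatile`, writing to it directly and reading it through the `const volatile` reference is ordinary, well-defined C++; a statement such as `REG1_VIEW = 1;` would simply fail to compile.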
If you definitely want to have an immutable (but assignable) member, then without UB you can lay things out like this:
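The layout this answer refers to is not reproduced here, so the following is only one possible sketch of the idea, not the answer's own code. It assumes a private, non-const member `c_` exposed through a const accessor `c()`; both names and the helper `read_after_assign` are hypothetical. Client code cannot modify the value in place, yet whole-object assignment stays well-defined because no const object is ever written to.

```cpp
#include <cassert>

// Hypothetical variant of the question's class A: the value is private and
// non-const, and only a const accessor is offered to the outside world.
class A {
public:
    explicit A(int c) : c_(c) {}

    // Read-only access; there is deliberately no setter.
    int c() const { return c_; }

    // Copy constructor and copy assignment are the implicit ones, which is
    // possible because c_ itself is not const.

private:
    int c_;
};

// Assigns one A to another and returns the value the target now holds.
int read_after_assign() {
    A first(10);
    A second(20);
    first = second;  // well-defined: plain member-wise assignment
    return first.c();
}
```

The trade-off is that member functions of `A` itself could still modify `c_`; the immutability is enforced by the class interface rather than by the type of the member.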
According to the newer C++ standard draft version N4861, it seems to be no longer undefined behaviour:
Here you can find only "o1 is not a complete const object" regarding const, which is true in this case. But of course you have to ensure that all other conditions are not violated, too.
In absence of other (non-`const`) members, this doesn't make any sense at all, regardless of undefined behavior or not.
AFAIK, there is no undefined behavior happening here because `c` is not a `static const` instance, or you couldn't invoke the copy-assignment operator. However, `const_cast` should ring a bell and tell you something is wrong. `const_cast` was primarily designed to work around non-`const`-correct APIs, and it doesn't seem to be the case here.
Also, in the following snippet:
You have two major risks, the 1st of which has already been pointed out.
Given an instance of a class derived from `A` and a virtual destructor, this will lead to only partial reconstruction of the original instance.
If the constructor call in `new(this) A(right);` throws an exception, your object will be destroyed twice. In this particular case, it won't be a problem, but if you happen to have significant cleanup, you're going to regret it.
Edit: if your class has this `const` member that is not considered "state" in your object (i.e. it is some sort of ID used for tracking instances and is not part of comparisons in `operator==` and the like), then the following might make sense:
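For the case described in the answer above, where the `const` member is an identifier rather than part of the object's state, a copy assignment that simply skips that member might look like the sketch below. The class `Tracked`, its members and the helper `assignment_keeps_id` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class: `id_` identifies the instance and never changes,
// `payload_` is the actual state.
class Tracked {
public:
    Tracked(int id, std::string payload)
        : id_(id), payload_(std::move(payload)) {}

    Tracked(const Tracked&) = default;

    // Copy assignment copies the state only. The const id is left alone,
    // so nothing const is ever written to.
    Tracked& operator=(const Tracked& other) {
        payload_ = other.payload_;
        return *this;
    }

    int id() const { return id_; }
    const std::string& payload() const { return payload_; }

    // Equality compares state only, not the id.
    friend bool operator==(const Tracked& lhs, const Tracked& rhs) {
        return lhs.payload_ == rhs.payload_;
    }

private:
    const int id_;         // identity, fixed at construction
    std::string payload_;  // state, freely assignable
};

// Assigns one object to another and reports whether the target kept its id
// while taking over the source's payload.
bool assignment_keeps_id() {
    Tracked a(1, "first");
    Tracked b(2, "second");
    a = b;
    return a.id() == 1 && a.payload() == "second" && a == b;
}
```

Whether this is appropriate depends on the id really being outside the object's value; if callers expect `a = b` to make `a` indistinguishable from `b`, this design would surprise them.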
Have a read of this link:
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=368
In particular...