Purpose of Unions in C and C++

Posted 2024-08-23 05:09:46, 1953 words, 6 views, 0 comments

I have used unions comfortably in the past; today I was alarmed when I read this post and learned that this code

union ARGB
{
    uint32_t colour;

    struct componentsTag
    {
        uint8_t b;
        uint8_t g;
        uint8_t r;
        uint8_t a;
    } components;

} pixel;

pixel.colour = 0xff040201;  // ARGB::colour is the active member from now on

// somewhere down the line, without any edit to pixel

if(pixel.components.a)      // accessing the non-active member ARGB::components

is actually undefined behaviour, i.e. reading from a member of the union other than the one most recently written to leads to undefined behaviour. If this isn't the intended usage of unions, what is? Can someone please explain it in detail?

Update:

I wanted to clarify a few things in hindsight.

  • The answer to the question isn't the same for C and C++; my ignorant younger self tagged it as both C and C++.
  • After scouring the C++11 standard, I couldn't conclusively say that it calls accessing/inspecting a non-active union member undefined, unspecified, or implementation-defined. All I could find was §9.5/1:

    If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members.

    §9.2/19: Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

  • In C, by contrast (from C99 TC3, DR 283 onwards), it is legal to do so (thanks to Pascal Cuoq for bringing this up). However, attempting it can still lead to undefined behaviour if the value read happens to be invalid (a so-called "trap representation") for the type it is read through. Otherwise, the value read is implementation-defined.
  • C89/90 called this out under unspecified behaviour (Annex J), and K&R's book says it is implementation-defined. Quote from K&R:

    This is the purpose of a union - a single variable that can legitimately hold any one of several types. [...] so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another.

  • Extract from Stroustrup's TC++PL (emphasis mine)

    Use of unions can be essential for compactness of data [...] sometimes misused for "type conversion".

Above all, this question (whose title has remained unchanged since I asked it) was posed to understand the purpose of unions, not what the standard allows. For example, using inheritance for code reuse is, of course, allowed by the C++ standard, but that wasn't the purpose or the original intention of introducing inheritance as a C++ language feature. This is why Andrey's answer remains the accepted one.

Comments (18)

燕归巢 2024-08-30 05:09:46

The purpose of unions is rather obvious, but for some reason people miss it quite often.

The purpose of a union is to save memory by using the same memory region for storing different objects at different times. That's it.

It is like a room in a hotel. Different people live in it for non-overlapping periods of time. These people never meet, and generally don't know anything about each other. By properly managing the time-sharing of the rooms (i.e. by making sure different people don't get assigned to one room at the same time), a relatively small hotel can provide accommodations to a relatively large number of people, which is what hotels are for.

That's exactly what a union does. If you know that several objects in your program hold values with non-overlapping value-lifetimes, then you can "merge" these objects into a union and thus save memory. Just like a hotel room has at most one "active" tenant at each moment of time, a union has at most one "active" member at each moment of program time. Only the "active" member can be read. By writing to another member you switch the "active" status to that other member.

For some reason, this original purpose of the union got "overridden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation (aka "type punning") is not a valid use of unions. It is described as producing implementation-defined behavior in C89/90.

EDIT: Using unions for the purposes of type punning (i.e. writing one member and then reading another) was given a more detailed definition in one of the Technical Corrigenda to the C99 standard (see DR#257 and DR#283). However, keep in mind that formally this does not protect you from running into undefined behavior by attempting to read a trap representation.
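
The time-sharing idea described in this answer can be sketched in C++ as follows. This is only an illustration, not code from the answer: the names `Scratch` and `run_phases` are assumptions, and the two members are chosen so that their useful lifetimes do not overlap.

```cpp
#include <cstddef>

// Hypothetical two-phase job: it first counts items, later stores an average.
// The two values are never needed at the same time, so they can share storage.
union Scratch {
    long   count;    // used during phase 1
    double average;  // used during phase 2
};

double run_phases(const int* data, std::size_t n) {
    Scratch s;
    s.count = 0;                       // count is the active member
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
        ++s.count;
    }
    const long items = s.count;        // last read of count
    // Writing average switches the active member; count is not read again.
    s.average = items ? static_cast<double>(sum) / items : 0.0;
    return s.average;                  // only the active member is read
}
```

Each read goes through the member that was written last, so the sketch stays within the "one tenant at a time" rule.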

最偏执的依靠 2024-08-30 05:09:46

You could use unions to create structs like the following, which contains a field that tells us which component of the union is actually used:

struct VAROBJECT
{
    enum o_t { Int, Double, String } objectType;

    union
    {
        int intValue;
        double dblValue;
        char *strValue;
    } value;
} object;
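
A minimal sketch of how such a struct might be used, assuming hypothetical helpers (`make_int`, `make_double`, `make_string`, `to_text`) that are not part of the answer. The point is that the tag is always set together with the member and checked before any read.

```cpp
#include <string>

struct VarObject {
    enum Type { Int, Double, String } objectType;
    union {
        int intValue;
        double dblValue;
        const char* strValue;
    } value;
};

// Hypothetical helpers: each one sets the tag together with the matching member.
VarObject make_int(int v) {
    VarObject o;
    o.objectType = VarObject::Int;
    o.value.intValue = v;
    return o;
}

VarObject make_double(double v) {
    VarObject o;
    o.objectType = VarObject::Double;
    o.value.dblValue = v;
    return o;
}

VarObject make_string(const char* v) {
    VarObject o;
    o.objectType = VarObject::String;
    o.value.strValue = v;
    return o;
}

// Reads only the member that the tag says is in use.
std::string to_text(const VarObject& o) {
    switch (o.objectType) {
        case VarObject::Int:    return std::to_string(o.value.intValue);
        case VarObject::Double: return std::to_string(o.value.dblValue);
        case VarObject::String: return o.value.strValue;
    }
    return "";
}
```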
权谋诡计 2024-08-30 05:09:46

The behavior is undefined from the language point of view. Consider that different platforms can have different constraints on memory alignment and endianness. Code on a big-endian machine and on a little-endian machine will update the values in the struct differently. Fixing the behavior in the language would require all implementations to use the same endianness (and memory alignment constraints...), limiting its use.

If you are using C++ (you are using two tags) and you really care about portability, then you can just use the struct and provide a setter that takes the uint32_t and sets the fields appropriately through bitmask operations. The same can be done in C with a function.

Edit: I was expecting AProgrammer to write up an answer that I could vote for, and to close this one. As some comments have pointed out, endianness is dealt with in other parts of the standard by letting each implementation decide what to do, and alignment and padding can also be handled differently. Now, the strict aliasing rules that AProgrammer implicitly refers to are an important point here. The compiler is allowed to make assumptions about the modification (or lack of modification) of variables. In the case of the union, the compiler could reorder instructions and move the read of each colour component across the write to the colour variable.
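
The setter suggested in this answer could look roughly like the sketch below. The names `Components`, `unpack_argb` and `pack_argb` are assumptions made for illustration, and the channel order follows the 0xAARRGGBB layout implied by the question's example value.

```cpp
#include <cstdint>

struct Components {
    std::uint8_t b, g, r, a;
};

// Splits a packed 0xAARRGGBB value into its channels using shifts and masks.
// The result does not depend on the endianness of the machine.
Components unpack_argb(std::uint32_t colour) {
    Components c;
    c.b = static_cast<std::uint8_t>(colour & 0xFFu);
    c.g = static_cast<std::uint8_t>((colour >> 8) & 0xFFu);
    c.r = static_cast<std::uint8_t>((colour >> 16) & 0xFFu);
    c.a = static_cast<std::uint8_t>((colour >> 24) & 0xFFu);
    return c;
}

// Rebuilds the packed value from the individual channels.
std::uint32_t pack_argb(const Components& c) {
    return (static_cast<std::uint32_t>(c.a) << 24) |
           (static_cast<std::uint32_t>(c.r) << 16) |
           (static_cast<std::uint32_t>(c.g) << 8)  |
            static_cast<std::uint32_t>(c.b);
}
```

Because the arithmetic works on values rather than on storage, no union member is ever read while inactive.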

多情出卖 2024-08-30 05:09:46

The most common use of union I regularly come across is aliasing.

Consider the following:

union Vector3f
{
  struct{ float x,y,z ; } ;   // anonymous struct: standard in C11, a common compiler extension in C++
  float elts[3];
};

What does this do? It allows clean, neat access to the members of a Vector3f vec; either by name:

vec.x=vec.y=vec.z=1.f ;

or by integer access into the array

for( int i = 0 ; i < 3 ; i++ )
  vec.elts[i]=1.f;

In some cases, accessing by name is the clearest thing you can do. In other cases, especially when the axis is chosen programmatically, the easier thing to do is to access the axis by numerical index - 0 for x, 1 for y, and 2 for z.
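
For readers who want both access styles without reading an inactive union member, one possible alternative (a swapped-in technique, not what this answer proposes) is a plain struct with an index operator. The names `Vec3` and `fill_and_sum` are hypothetical.

```cpp
#include <cstddef>

// Named fields plus an index operator, instead of a union.
struct Vec3 {
    float x, y, z;

    float& operator[](std::size_t i) {
        switch (i) {
            case 0:  return x;
            case 1:  return y;
            default: return z;   // any other index maps to z in this sketch
        }
    }
};

// Sets every axis through the index operator and sums them through the names.
float fill_and_sum(float value) {
    Vec3 v{};
    for (std::size_t i = 0; i < 3; ++i) {
        v[i] = value;
    }
    return v.x + v.y + v.z;
}
```

The trade-off is a branch per indexed access, which optimizing compilers can often reduce for constant indices.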

浪推晚风 2024-08-30 05:09:46

As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.

union A {
   int i;
   double d;
};

A a[10];    // records in "a" can be either ints or doubles 
a[0].i = 42;
a[1].d = 1.23;

Of course, you also need some sort of discriminator to say what the variant actually contains. Note also that in C++ before C++11, unions were of limited use because they could only contain POD types, effectively those without constructors and destructors.

软甜啾 2024-08-30 05:09:46

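In C it was a nice way to implement something like a variant.

enum possibleTypes {
  eInt,
  eDouble,
  eChar
};

struct Value {
    union {                  /* only one of these members is in use at a time */
      int iVal_;
      double dval;
      char cVal;
    } value_;
    enum possibleTypes discriminator_;   /* says which member is in use */
};

/* given: struct Value val; */
switch(val.discriminator_)
{
  case eInt:    /* use val.value_.iVal_ */ break;
  case eDouble: /* use val.value_.dval */  break;
  case eChar:   /* use val.value_.cVal */  break;
}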

When memory was scarce, this structure used less memory than a struct holding all the members.

By the way, C provides bit-fields such as

    typedef struct {
      unsigned int mantissa_low:32;      //mantissa
      unsigned int mantissa_high:20;
      unsigned int exponent:11;         //exponent
      unsigned int sign:1;
    } realVal;

to access bit values.

小鸟爱天空丶 2024-08-30 05:09:46

Although this is strictly undefined behaviour, in practice it will work with pretty much any compiler. It is such a widely used paradigm that any self-respecting compiler will need to do "the right thing" in cases such as this. It's certainly to be preferred over type-punning, which may well generate broken code with some compilers.

初吻给了烟 2024-08-30 05:09:46

In C++, Boost.Variant implements a safe version of the union, designed to prevent undefined behavior as much as possible.

Its performance is identical to that of the enum + union construct (it is stack-allocated too, etc.), but it uses a template list of types instead of the enum :)

你是暖光i 2024-08-30 05:09:46

The behaviour may be undefined, but that just means there isn't a "standard". All decent compilers offer #pragmas to control packing and alignment, but may have different defaults. The defaults will also change depending on the optimisation settings used.

Also, unions are not just for saving space. They can help modern compilers with type punning. If you reinterpret_cast<> everything the compiler can't make assumptions about what you are doing. It may have to throw away what it knows about your type and start again (forcing a write back to memory, which is very inefficient these days compared to CPU clock speed).

寒冷纷飞旳雪 2024-08-30 05:09:46

Technically it's undefined, but in reality most (all?) compilers treat it exactly the same as using a reinterpret_cast from one type to the other, the result of which is implementation defined. I wouldn't lose sleep over your current code.

ι不睡觉的鱼゛ 2024-08-30 05:09:46

For one more example of the actual use of unions, the CORBA framework serializes objects using the tagged union approach. All user-defined classes are members of one (huge) union, and an integer identifier tells the demarshaller how to interpret the union.

寄意 2024-08-30 05:09:46

Others have mentioned the architecture differences (little - big endian).

As I read the problem, since the memory for the variables is shared, writing to one changes the others and, depending on their type, the resulting value could be meaningless.

e.g.
union{
float f;
int i;
} x;

Writing to x.i would be meaningless if you then read from x.f - unless that is what you intended in order to look at the sign, exponent or mantissa components of the float.

I think there is also an issue of alignment: If some variables must be word aligned then you might not get the expected result.

e.g.
union{
char c[4];
int i;
} x;

If, hypothetically, on some machine a char had to be word aligned then c[0] and c[1] would share storage with i but not c[2] and c[3].
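
As a check on the alignment concern: both C and C++ require every member of a union to start at the address of the union itself, so `c[0]` and `i` always begin at the same byte, and alignment affects the size and placement of the union as a whole. A minimal sketch (the names `CharsAndInt` and `members_share_address` are assumptions):

```cpp
#include <cstddef>

union CharsAndInt {
    char c[4];
    int  i;
};

// Both members start at offset 0, i.e. at the address of the union itself.
static_assert(offsetof(CharsAndInt, c) == 0, "c starts at the beginning");
static_assert(offsetof(CharsAndInt, i) == 0, "i starts at the beginning");

// Compares the addresses of the two members at run time; no member is read.
bool members_share_address(const CharsAndInt& u) {
    return static_cast<const void*>(&u.c[0]) == static_cast<const void*>(&u.i);
}
```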

沫雨熙 2024-08-30 05:09:46

In the C language as it was documented in 1974, all structure members shared a common namespace, and the meaning of "ptr->member" was defined as adding the
member's displacement to "ptr" and accessing the resulting address using the
member's type. This design made it possible to use the same ptr with member
names taken from different structure definitions but with the same offset;
programmers used that ability for a variety of purposes.

When structure members were assigned their own namespaces, it became impossible
to declare two structure members with the same displacement. Adding unions to
the language made it possible to achieve the same semantics that had been
available in earlier versions of the language (though the inability to have
names exported to an enclosing context may have still necessitated using a
find/replace to change foo->member into foo->type1.member). What was
important was not so much that the people who added unions had any particular
target usage in mind, but rather that they provided a means by which programmers
who had relied upon the earlier semantics, for whatever purpose, should still
be able to achieve the same semantics even if they had to use a different
syntax to do it.

夏の忆 2024-08-30 05:09:46

As others mentioned, unions combined with enumerations and wrapped into structs can be used to implement tagged unions. One practical use is to implement Rust's Result<T, E>, which is originally implemented using a pure enum (Rust can hold additional data in enumeration variants). Here is a C++ example:

#include <cstdint>

// Sketch only: assumes T and E are distinct, trivially constructible types.
template <typename T, typename E> struct Result {
    public:
    enum class Success : std::uint8_t { Ok, Err };
    Result(T val) {
        m_success = Success::Ok;
        m_value.ok = val;
    }
    Result(E val) {
        m_success = Success::Err;
        m_value.err = val;
    }
    // Note: equality compares only the Ok/Err state, not the stored values.
    inline bool operator==(const Result& other) const {
        return other.m_success == this->m_success;
    }
    inline bool operator!=(const Result& other) const {
        return other.m_success != this->m_success;
    }
    inline T expect(const char* errorMsg) const {
        if (m_success == Success::Err) throw errorMsg;
        else return m_value.ok;
    }
    inline bool is_ok() const {
        return m_success == Success::Ok;
    }
    inline bool is_err() const {
        return m_success == Success::Err;
    }
    // Returns a pointer to the stored value, or nullptr when holding an error.
    inline const T* ok() const {
        if (is_ok()) return &m_value.ok;
        else return nullptr;
    }
    // Returns a pointer to the stored error, or nullptr when holding a value.
    inline const E* err() const {
        if (is_err()) return &m_value.err;
        else return nullptr;
    }

    // Other methods from https://doc.rust-lang.org/std/result/enum.Result.html

    private:
    Success m_success;
    union _val_t { T ok; E err; } m_value;
};
寒尘 2024-08-30 05:09:46

@bobobobo's code is correct, as @Joshua pointed out (sadly I'm not allowed to add comments, so I'm doing it here; IMO it was a bad decision to disallow that in the first place):

https://en.cppreference.com/w/cpp/language/data_members#Standard_layout says that it is fine to do so, at least since C++14:

In a standard-layout union with an active member of non-union class type T1, it is permitted to read a non-static data member m of another union member of non-union class type T2 provided m is part of the common initial sequence of T1 and T2 (except that reading a volatile member through non-volatile glvalue is undefined).

since in the current case T1 and T2 denote the same type anyway.
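
The common-initial-sequence rule quoted above applies when both union members are standard-layout structs. A minimal sketch of a case it covers, using hypothetical types `IntMsg`, `FloatMsg` and `Message` that do not appear in the thread:

```cpp
// Two hypothetical message types that begin with the same member.
struct IntMsg   { int kind; int   payload; };
struct FloatMsg { int kind; float payload; };

union Message {
    IntMsg   as_int;
    FloatMsg as_float;
};

// Reads the shared leading member through as_float even when as_int is the
// active member; `kind` lies in the common initial sequence of both structs.
int kind_of(const Message& m) {
    return m.as_float.kind;
}

int demo_kind() {
    Message m;
    m.as_int = IntMsg{7, 42};   // as_int becomes the active member
    return kind_of(m);
}
```

Only `kind` is covered here: reading `as_float.payload` while `as_int` is active would fall outside the common initial sequence.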

淡紫姑娘! 2024-08-30 05:09:46

You can use a union for two main reasons:

  1. A handy way to access the same data in different ways, like in your example
  2. A way to save space when there are different data members of which only one can ever be 'active'

The first is really more of a C-style hack: a shortcut for writing code on the basis that you know how the target system's memory architecture works. As already said, you can normally get away with it if you don't actually target lots of different platforms. I believe some compilers might also let you use packing directives (I know they do on structs)?

A good example of the second can be found in the VARIANT type used extensively in COM.

扛刀软妹 2024-08-30 05:09:46

In a recent C++ conference video, Bjarne Stroustrup answered a question about type punning with unions, saying that the contradiction between what C allows and what C++ allows is a real problem for C++ that must be resolved. (Link to come if I can find it again.)

Even though it's officially undefined behavior, the major C++ compilers will, in most cases, produce code that does what you'd expect because (1) they are expected to compile C code as well as C++, and (2) there's a lot of existing code that depends on it.

So, yeah, in C++ it's undefined behavior and you should use std::bit_cast or another approved method for type punning. That said, if you use a union, you're in very good company.
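
One widely used alternative to union-based punning is copying the bytes with `std::memcpy`, sketched below. The function names `float_bits` and `bits_to_float` are assumptions, and the sketch assumes `float` is 32 bits wide; `std::bit_cast` (C++20) expresses the same idea more directly where it is available.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Copies the object representation of a float into an integer of equal size.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Copies the bits back into a float.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Compilers commonly turn such fixed-size copies into a plain register move, so this usually costs nothing compared with the union version.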


The accepted answer emphatically claims that the sole purpose of a union is to save memory. That's certainly the point of view of today's C++ standard, but I think it too readily dismisses the historical context.

  • When introducing unions in The C Programming Language, Kernighan and Ritchie say, "[Unions] are analogous to variant records in Pascal." Niklaus Wirth's Pascal came out about two years before C, and Wirth illustrated how variant records work with an example of type punning.

  • In the second edition of The C Programming Language, which was updated to reflect the ANSI-standardized version, K&R say, "It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another." Admittedly, the K&R books are not standards documents, but they are how many, many programmers learned the language.

  • K&R show how to use a union to ensure two types are treated with the same alignment requirements. Why would you need that? Type punning.

  • C has long been regarded as "portable assembly language" because it gives programmers almost as much control as an assembly programmer without tying them to a particular instruction set. At the CPU level, register values don't have types. When the type does matter, the instruction specifies whether to treat the register as a signed or unsigned integer, a bitmask, a pointer, etc. When programming close to the metal, it's natural to be able to treat a blob of bits as different types from instruction to instruction.

    C gives you some of the benefits of type checking without completely sacrificing the type fluidity of the fundamental types. (You can do bitwise operations in the same expression as arithmetic operations, cast pointers to integral types and back, perform pointer arithmetic, etc.). It's rather natural for programmers to expect to be able to treat a blob of bits as different types at different times, and a union is a natural fit for that.

  • Unions for type punning were common practice before Bjarne Stroustrup started working on Cfront.

  • C++ got itself into a quandary by trying to be compatible with C while simultaneously giving more guarantees about object lifetimes. (I think those were both well intended goals, but perhaps not realistic, although that was probably hard to anticipate.) Construction and destruction of individual members of a union is a hard problem. Every solution has to sacrifice something.

  • Meanwhile, things largely worked in C. Technically, you probably shouldn't read a union member that's not the active one, but if it worked with your implementation, it wasn't actually forbidden. And most implementations worked in most cases. The caveat about trap values came later, as hardware grew more sophisticated.

  • Only relatively recently did C++ provide std::variant, a discriminated union. (Before then, I guess we were all supposed to write our own.) If unions were always solely about saving memory, then shouldn't both languages have offered a discriminated union from the beginning and just done away with the other?

    Note that putting a union with a tag in a struct can, in some cases, nullify the memory-saving value of using a union in the first place. Even if the tag is a small value, the overall alignment requirement of the union may require padding between the union and the tag.
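
The padding effect mentioned in the last bullet can be observed with `sizeof`. The types below (`Payload`, `Tagged`, `AllMembers`) are hypothetical; on many 64-bit platforms where `double` is 8-byte aligned, both structs typically come out at 16 bytes, but the exact numbers are implementation-defined.

```cpp
#include <cstddef>

union Payload {
    char   c;
    double d;
};

// Tag plus union, as in a typical tagged union.
struct Tagged {
    unsigned char tag;
    Payload value;
};

// The non-union alternative: every member stored side by side.
struct AllMembers {
    char   c;
    double d;
};

constexpr std::size_t tagged_size()      { return sizeof(Tagged); }
constexpr std::size_t all_members_size() { return sizeof(AllMembers); }
```

Comparing `tagged_size()` with `all_members_size()` on a given compiler shows whether the union actually saves anything for that particular set of members.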

Again, I don't disagree with the accepted answer. But the solely-to-save-memory claim ignores a bunch of history that got us to this point. Unions are still often used for type punning because there aren't a lot of better choices. So the compiler vendors keep this working because they have to, despite the fact that the standards have given them an out. (And because the compilers themselves depend on it.) I don't know how it will play out, but the standards are going to have to evolve to either find a way to bless the most common abuses or to provide a realistic alternative.

枫以 2024-08-30 05:09:46
The main purpose of using a union in C/C++ is to provide a data type that can store a value of any one of several types.

union a {
    char a;
    int b;
    float c;
};

std::variant<int, char, float, double, long> v; // std::variant in C++17

For instance, union a can store either a char, an int, or a float.
An example is a container that holds heterogeneous data types, or a case where we want to handle different data types through a single interface.

Modern C++ has a type-safe option, std::variant, in C++17. std::variant is most useful in scenarios where data comes from a source and we do not know its type in advance (it could be any of the listed types, and we then have to act on the type received). The visitor pattern is one of the classic use cases for std::variant, and std::visit in C++17 can be used for it.
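
A minimal sketch of the `std::variant` plus `std::visit` combination mentioned above, assuming a C++17 compiler. The alias `Value`, the visitor `Describe` and the function `describe` are hypothetical names chosen for illustration.

```cpp
#include <string>
#include <variant>

using Value = std::variant<int, char, float>;

// Visitor with one overload per alternative; std::visit calls the one that
// matches the type currently stored in the variant.
struct Describe {
    std::string operator()(int)   const { return "int"; }
    std::string operator()(char)  const { return "char"; }
    std::string operator()(float) const { return "float"; }
};

std::string describe(const Value& v) {
    return std::visit(Describe{}, v);
}
```

Unlike a raw union, the variant tracks which alternative is active, so there is no separate tag to keep in sync by hand.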