Why aren't variable-length arrays part of the C++ standard?

Posted 2024-08-14 13:14:53

I haven't used C very much in the last few years. When I read this question today I came across some C syntax which I wasn't familiar with.

Apparently in C99 the following syntax is valid:

void foo(int n) {
    int values[n]; //Declare a variable length array
}

This seems like a pretty useful feature. Was there ever a discussion about adding it to the C++ standard, and if so, why was it omitted?

Some potential reasons:

Hairy for compiler vendors to implement
Incompatible with some other part of the standard
Functionality can be emulated with other C++ constructs

The C++ standard states that array size must be a constant expression (8.3.4.1).

Yes, of course I realize that in the toy example one could use std::vector<int> values(n);, but this allocates memory from the heap and not the stack. And if I want a multidimensional array like:

void foo(int x, int y, int z) {
    int values[x][y][z]; // Declare a variable length array
}

the vector version becomes pretty clumsy:

void foo(int x, int y, int z) {
    vector< vector< vector<int> > > values( /* Really painful expression here. */);
}

The slices, rows and columns will also potentially be spread all over memory.
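One common way around the nested-vector problem is a single flat std::vector with manual row-major indexing. The sketch below is only an illustration: Grid3D, at, and fill_and_sum are hypothetical names, not anything from the standard library. It keeps all elements contiguous, but it still allocates from the heap, so it addresses the clumsiness and locality concerns rather than the stack-versus-heap one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): a 3D grid whose extents
// are chosen at runtime but whose elements live in one contiguous block.
class Grid3D {
public:
    Grid3D(std::size_t x, std::size_t y, std::size_t z)
        : y_(y), z_(z), data_(x * y * z) {}  // ints are zero-initialised

    // Row-major indexing, the same layout a C array int[x][y][z] would use.
    int& at(std::size_t i, std::size_t j, std::size_t k) {
        assert(j < y_ && k < z_);
        const std::size_t index = (i * y_ + j) * z_ + k;
        assert(index < data_.size());
        return data_[index];
    }

    std::size_t size() const { return data_.size(); }
    const int* data() const { return data_.data(); }

private:
    std::size_t y_, z_;
    std::vector<int> data_;
};

// Fills a grid with i * j * k and returns the sum of all elements.
int fill_and_sum(std::size_t x, std::size_t y, std::size_t z) {
    Grid3D values(x, y, z);
    int sum = 0;
    for (std::size_t i = 0; i < x; ++i)
        for (std::size_t j = 0; j < y; ++j)
            for (std::size_t k = 0; k < z; ++k) {
                values.at(i, j, k) = static_cast<int>(i * j * k);
                sum += values.at(i, j, k);
            }
    return sum;
}
```

With this layout a single allocation backs the whole grid, and slices along the last dimension are adjacent in memory.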

Looking at the discussion at comp.std.c++ it's clear that this question is pretty controversial with some very heavyweight names on both sides of the argument. It's certainly not obvious that a std::vector is always a better solution.

从来不烧饼 2024-08-21 13:14:53

(Background: I have some experience implementing C and C++ compilers.)

Variable-length arrays in C99 were basically a misstep. In order to support VLAs, C99 had to make the following concessions to common sense:

sizeof x is no longer always a compile-time constant; the compiler must sometimes generate code to evaluate a sizeof-expression at runtime.
Allowing two-dimensional VLAs (int A[x][y]) required a new syntax for declaring functions that take 2D VLAs as parameters: void foo(int n, int A[][*]).
Less importantly in the C++ world, but extremely important for C's target audience of embedded-systems programmers, declaring a VLA means chomping an arbitrarily large chunk of your stack. This is a guaranteed stack-overflow and crash. (Anytime you declare int A[n], you're implicitly asserting that you have 2GB of stack to spare. After all, if you know "n is definitely less than 1000 here", then you would just declare int A[1000]. Substituting the 32-bit integer n for 1000 is an admission that you have no idea what the behavior of your program ought to be.)

Okay, so let's move to talking about C++ now. In C++, we have the same strong distinction between "type system" and "value system" that C89 does… but we've really started to rely on it in ways that C has not. For example:

template<typename T> struct S { ... };
int A[n];
S<decltype(A)> s;  // equivalently, S<int[n]> s;

If n weren't a compile-time constant (i.e., if A were of variably modified type), then what on earth would be the type of S? Would S's type also be determined only at runtime?

What about this:

template<typename T> bool myfunc(T& t1, T& t2) { ... };
int A1[n1], A2[n2];
myfunc(A1, A2);

The compiler must generate code for some instantiation of myfunc. What should that code look like? How can we statically generate that code, if we don't know the type of A1 at compile time?

Worse, what if it turns out at runtime that n1 != n2, so that !std::is_same<decltype(A1), decltype(A2)>()? In that case, the call to myfunc shouldn't even compile, because template type deduction should fail! How could we possibly emulate that behavior at runtime?
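To make the point concrete, here is a small sketch of how standard C++ treats array bounds today. It assumes nothing beyond the standard library; same_size, bound_of, and demo are hypothetical names chosen for illustration. The bound is part of the type, so both the type comparison and the template deduction are settled entirely at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Arrays of different lengths are distinct types, decided at compile time.
static_assert(!std::is_same<int[3], int[4]>::value, "different bounds, different types");
static_assert(std::is_same<int[3], int[3]>::value, "same bound, same type");

// Same shape as myfunc in the answer: both arguments must deduce to one T.
template <typename T>
bool same_size(T& t1, T& t2) {
    return sizeof(t1) == sizeof(t2);  // always true, since both are T
}

// Hypothetical helper: returns the bound the compiler deduced for an array.
template <typename T, std::size_t N>
constexpr std::size_t bound_of(T (&)[N]) {
    return N;
}

bool demo() {
    int a1[3] = {1, 2, 3};
    int a2[3] = {4, 5, 6};
    // int a3[4] = {};
    // same_size(a1, a3);  // would not compile: T deduces as int[3] and int[4]
    assert(bound_of(a1) == 3);
    return same_size(a1, a2);
}
```

With a runtime bound, neither the static_assert lines nor the rejected call could be evaluated by the compiler, which is the conflict the answer describes.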

Basically, C++ is moving in the direction of pushing more and more decisions into compile-time: template code generation, constexpr function evaluation, and so on. Meanwhile, C99 was busy pushing traditionally compile-time decisions (e.g. sizeof) into the runtime. With this in mind, does it really even make sense to expend any effort trying to integrate C99-style VLAs into C++?

As every other answerer has already pointed out, C++ provides lots of heap-allocation mechanisms (std::unique_ptr<int[]> A(new int[n]); or std::vector<int> A(n); being the obvious ones) when you really want to convey the idea "I have no idea how much RAM I might need." And C++ provides a nifty exception-handling model for dealing with the inevitable situation that the amount of RAM you need is greater than the amount of RAM you have. But hopefully this answer gives you a good idea of why C99-style VLAs were not a good fit for C++, and not really even a good fit for C99. ;)
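As a rough illustration of that exception-handling point, the sketch below wraps a heap allocation in a try block. try_allocate is a hypothetical helper name; the only library behaviour relied on is that std::vector reports an impossible request by throwing std::bad_alloc or std::length_error, whereas an oversized VLA has no comparable failure channel.

```cpp
#include <cassert>
#include <cstddef>
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::length_error
#include <vector>

// Hypothetical helper: try to get n ints from the heap and report whether
// the request could be satisfied.
bool try_allocate(std::size_t n) {
    try {
        std::vector<int> a(n);  // heap allocation, elements set to 0
        assert(a.size() == n);
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // the allocator could not provide the memory
    } catch (const std::length_error&) {
        return false;  // n exceeds what std::vector can ever hold
    }
}
```

A caller can then degrade gracefully, for example by processing the data in smaller chunks, instead of crashing.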

For more on the topic, see N3810 "Alternatives for Array Extensions", Bjarne Stroustrup's October 2013 paper on VLAs. Bjarne's POV is very different from mine; N3810 focuses more on finding a good C++ish syntax for the things, and on discouraging the use of raw arrays in C++, whereas I focused more on the implications for metaprogramming and the typesystem. I don't know if he considers the metaprogramming/typesystem implications solved, solvable, or merely uninteresting.

A good blog post that hits many of these same points is "Legitimate Use of Variable Length Arrays" (Chris Wellons, 2019-10-27).

从此见与不见 2024-08-21 13:14:53

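A discussion about this was recently kicked off on Usenet: Why no VLAs in C++0x.

I agree with those who seem to agree that having to create a potentially large array on the stack, which usually has only little space available, isn't good. The argument is that if you know the size beforehand, you can use a static array, and if you don't know the size beforehand, you will write unsafe code.

C99 VLAs could provide a small benefit of being able to create small arrays without wasting space or calling constructors for unused elements, but they would introduce rather large changes to the type system. You would need to be able to specify types depending on runtime values. This does not yet exist in current C++, except for the type specifiers of the new operator, and those are treated specially so that the runtime-ness doesn't escape the scope of the new operator.

You can use std::vector, but it is not quite the same, as it uses dynamic memory, and making it use one's own stack allocator isn't exactly easy (alignment is an issue, too). It also doesn't solve the same problem, because a vector is a resizable container, whereas VLAs are fixed-size. The C++ Dynamic Array proposal is intended to introduce a library-based solution as an alternative to a language-based VLA. However, it's not going to be part of C++0x, as far as I know.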

看海 2024-08-21 13:14:53

You could always use alloca() to allocate memory on the stack at runtime, if you wished:

void foo (int n)
{
    int *values = (int *)alloca(sizeof(int) * n);
}

Being allocated on the stack implies that it will automatically be freed when the stack unwinds.

Quick note: As mentioned in the Mac OS X man page for alloca(3), "The alloca() function is machine and compiler dependent; its use is discouraged." Just so you know.

云仙小弟 2024-08-21 13:14:53

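In my own work, I've realized that every time I've wanted something like variable-length automatic arrays or alloca(), I didn't really care that the memory was physically located on the CPU stack, just that it came from some stack allocator that didn't incur slow trips to the general heap. So I have a per-thread object that owns some memory from which it can push and pop variable-sized buffers. On some platforms I allow this to grow via the MMU. Other platforms have a fixed size (usually accompanied by a fixed-size CPU stack as well, because there is no MMU). One platform I work with (a handheld game console) has precious little CPU stack anyway, because it resides in scarce, fast memory.

I'm not saying that pushing variable-sized buffers onto the CPU stack is never needed. Honestly, I was surprised when I discovered this wasn't standard, as it certainly seems like the concept fits into the language well enough. For me, though, the requirements "variable size" and "must be physically located on the CPU stack" have never come up together. It's been about speed, so I made my own sort of "parallel stack for data buffers".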
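A minimal sketch of what such a "parallel stack for data buffers" might look like follows. It is not the answerer's actual implementation: ScratchStack, scratch, sum_of_squares, and the 64 KiB capacity are all assumptions made for illustration, and the fixed-size variant is shown rather than one that grows via the MMU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: a fixed block of memory with LIFO push/pop.
// The names and the capacity are assumptions, not from the original answer.
class ScratchStack {
public:
    // Reserves `bytes` (rounded up for alignment); returns nullptr if full.
    void* push(std::size_t bytes) {
        if (bytes > kCapacity) return nullptr;  // also guards the rounding below
        const std::size_t a = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + a - 1) / a * a;
        if (rounded > kCapacity - top_) return nullptr;
        void* p = buffer_ + top_;
        top_ += rounded;
        return p;
    }

    // Current top of the stack, to be handed back to release() later.
    std::size_t mark() const { return top_; }

    // Pops everything pushed since the mark `m` was taken.
    void release(std::size_t m) {
        assert(m <= top_);
        top_ = m;
    }

private:
    static constexpr std::size_t kCapacity = 64 * 1024;
    alignas(std::max_align_t) unsigned char buffer_[kCapacity];
    std::size_t top_ = 0;
};

// One stack per thread, so no locking is needed.
inline ScratchStack& scratch() {
    thread_local ScratchStack s;
    return s;
}

// Usage: a runtime-sized temporary without touching the general heap.
// Returns -1 if n is negative or the scratch space is exhausted.
int sum_of_squares(int n) {
    if (n < 0) return -1;
    ScratchStack& s = scratch();
    const std::size_t m = s.mark();
    int* values = static_cast<int*>(s.push(sizeof(int) * static_cast<std::size_t>(n)));
    if (values == nullptr) return -1;
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        values[i] = i * i;
        sum += values[i];
    }
    s.release(m);
    return sum;
}
```

Unlike a VLA or alloca(), exhaustion here is detectable: push returns nullptr and the caller decides what to do.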

香橙ぽ 2024-08-21 13:14:53

It seems it will be available in C++14:

https://en.wikipedia.org/wiki/C%2B%2B14#Runtime-sized_one_Dimension_arrays

Update: it did not make it into C++14.

翻了热茶 2024-08-21 13:14:53

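There are situations where allocating heap memory is very expensive compared to the operations performed. An example is matrix math. If you work with smallish matrices, say 5 to 10 elements, and do a lot of arithmetic, the malloc overhead will be really significant. At the same time, making the size a compile-time constant seems very wasteful and inflexible.

I think that C++ is so unsafe in itself that the argument to "try not to add more unsafe features" is not very strong. On the other hand, as C++ is arguably the most runtime-efficient programming language, features that make it more so are always useful: people who write performance-critical programs will to a large extent use C++, and they need as much performance as possible. Moving stuff from the heap to the stack is one such possibility. Reducing the number of heap blocks is another. Allowing VLAs as object members would be one way to achieve this. I'm working on such a suggestion. It is a bit complicated to implement, admittedly, but it seems quite doable.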
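Without language support, one way to approximate this today is a small-buffer approach: keep storage for a handful of elements inside the object and fall back to the heap only for larger sizes. The sketch below is an illustration under that assumption, not the proposal the answer mentions; SmallBuffer, on_heap, and dot_iota are hypothetical names, and InlineCap is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical sketch: a buffer of doubles whose length is fixed at
// construction. It lives inside the object (typically on the stack) when
// n <= InlineCap and uses a single heap block otherwise.
template <std::size_t InlineCap>
class SmallBuffer {
public:
    explicit SmallBuffer(std::size_t n)
        : size_(n),
          heap_(n > InlineCap ? new double[n]() : nullptr) {
        for (std::size_t i = 0; i < InlineCap; ++i) inline_[i] = 0.0;
    }

    // Copying is disabled to keep the sketch short.
    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;

    double& operator[](std::size_t i) {
        assert(i < size_);
        return data()[i];
    }

    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

private:
    double* data() { return heap_ ? heap_.get() : inline_; }

    std::size_t size_;
    std::unique_ptr<double[]> heap_;
    double inline_[InlineCap];
};

// Dot product of (0, 1, ..., n-1) with (2, 2, ..., 2), length chosen at runtime.
double dot_iota(std::size_t n) {
    SmallBuffer<16> a(n), b(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<double>(i);
        b[i] = 2.0;
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

The trade-off is the one the answer points at: the inline capacity is still a compile-time constant, so small sizes waste some space and large sizes still pay for one heap block.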

So尛奶瓶 2024-08-21 13:14:53

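This was considered for inclusion in C++/1x, but was dropped (this is a correction to what I said earlier).

It would be less useful in C++ anyway, since we already have std::vector to fill this role.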

踏月而来 2024-08-21 13:14:53

VLAs are a part of a larger family of Variably Modified types.
This family of types is very special because they have runtime components.

The code:

int A[n];

is seen by the compiler as:

typedef int T[n];
T A;

Note that the runtime size of the array is bound not to the variable A but to the type of the variable.

Nothing prevents one from making new variables of this type:

T B,C,D;

or pointers to it, or arrays of it:

T *p, Z[10];

Moreover, pointers allow one to create VLAs with dynamic storage.

T *p = malloc(sizeof(T));
...
free(p);

This dispels the popular myth that VLAs can only be allocated on the stack.

Back to the question.

This runtime component does not work well with type deduction, which is one of the foundations of the C++ type system. It would not be possible to use VM types with templates, deduction and overloading.

The C++ type system is static: all types must be fully defined or deduced during compilation.
VM types are completed only during program execution.
The additional complexity of introducing VM types to the already hellishly complex C++ was simply considered unjustified, mainly because their main practical application
is automatic VLAs (int A[n];), which have an alternative in the form of std::vector.

It is a bit sad, because VM types provide very elegant and efficient solutions for programs that handle multidimensional arrays.

In C one can simply write:

void foo(int n, int A[n][n][n]) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < n; ++k)
        A[i][j][k] = i * j * k;
}

...

int A[5][5][5], B[10][10][10];
foo(5, A);
foo(10, B);

Now try to provide an equally efficient and elegant solution in C++.
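For comparison, the closest standard C++ counterpart is probably a function template that deduces the bound from the argument's type. This is only a sketch: it works because 5 and 10 are compile-time constants, and demo_fixed_bounds is a hypothetical name. With a bound known only at runtime there is no direct equivalent, and one would have to fall back to flat storage with manual index arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Sketch: the bound N is deduced from the argument's type at compile time,
// so each distinct size gets its own instantiation of foo.
template <std::size_t N>
void foo(int (&A)[N][N][N]) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                A[i][j][k] = static_cast<int>(i * j * k);
}

// Mirrors the C usage above; returns one element from each array.
int demo_fixed_bounds() {
    int A[5][5][5], B[10][10][10];
    foo(A);  // instantiates foo<5>
    foo(B);  // instantiates foo<10>
    assert(A[1][2][3] == 6);
    return A[4][4][4] + B[9][9][9];  // 64 + 729
}
```

The C version compiles one function body that serves every n, while this version generates separate code per size, which illustrates the static-versus-runtime split the answer describes.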

微凉 2024-08-21 13:14:53

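Arrays like this are part of C99, but not part of standard C++. As others have said, a vector is always a much better solution, which is probably why variable-sized arrays are not in the C++ standard (or in the proposed C++0x standard).

By the way, for questions on "why" the C++ standard is the way it is, the moderated Usenet newsgroup comp.std.c++ is the place to go.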

疾风者 2024-08-21 13:14:53

Use std::vector for this. For example:

std::vector<int> values;
values.resize(n);

The memory will be allocated on the heap, but this carries only a small performance cost. Furthermore, it is wise not to allocate large data blocks on the stack, as it is rather limited in size.
