How do I use arrays in C++?
C++ inherited arrays from C where they are used virtually everywhere. C++ provides abstractions that are easier to use and less error-prone (std::vector<T> since C++98 and std::array<T, n> since C++11), so the need for arrays does not arise quite as often as it does in C. However, when you read legacy code or interact with a library written in C, you should have a firm grasp on how arrays work.
This FAQ is split into five parts:
- arrays on the type level and accessing elements
- array creation and initialization
- assignment and parameter passing
- multidimensional arrays and arrays of pointers
- common pitfalls when using arrays
If you feel something important is missing in this FAQ, write an answer and link it here as an additional part.
In the following text, "array" means "C array", not the class template std::array. Basic knowledge of the C declarator syntax is assumed. Note that the manual usage of new and delete as demonstrated below is extremely dangerous in the face of exceptions, but that is the topic of another FAQ.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
Arrays on the type level
An array type is denoted as T[n] where T is the element type and n is a positive size, the number of elements in the array. The array type is a product type of the element type and the size. If one or both of those ingredients differ, you get a distinct type. Note that the size is part of the type, that is, array types of different size are incompatible types that have absolutely nothing to do with each other. sizeof(T[n]) is equivalent to n * sizeof(T).
Array-to-pointer decay
The only "connection" between T[n] and T[m] is that both types can implicitly be converted to T*, and the result of this conversion is a pointer to the first element of the array. That is, anywhere a T* is required, you can provide a T[n], and the compiler will silently provide that pointer. This conversion is known as "array-to-pointer decay", and it is a major source of confusion. The size of the array is lost in this process, since it is no longer part of the type (T*). Pro: Forgetting the size of an array on the type level allows a pointer to point to the first element of an array of any size. Con: Given a pointer to the first (or any other) element of an array, there is no way to detect how large that array is or where exactly the pointer points to relative to the bounds of the array. Pointers are extremely stupid.
Arrays are not pointers
The compiler will silently generate a pointer to the first element of an array whenever it is deemed useful, that is, whenever an operation would fail on an array but succeed on a pointer. This conversion from array to pointer is trivial, since the resulting pointer value is simply the address of the array. Note that the pointer is not stored as part of the array itself (or anywhere else in memory). An array is not a pointer.
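The points made so far (distinct array types, sizeof, and the loss of size information during decay) can be checked with a few assertions. This is only a minimal sketch; the helper names bytes_seen_through_pointer and decay_demo are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Element type and size are both part of the array type.
static_assert(!std::is_same<int[8], float[8]>::value, "different element type");
static_assert(!std::is_same<int[8], int[9]>::value, "different size");
static_assert(sizeof(int[8]) == 8 * sizeof(int), "n * sizeof(T)");

// Hypothetical helper: it receives only a pointer, so the array size is gone.
inline std::size_t bytes_seen_through_pointer(const int* p) {
    return sizeof(p);  // size of a pointer, not of the array
}

// Returns true if the decayed pointer refers to the first element.
inline bool decay_demo() {
    int a[8] = {};
    const int* p = a;  // array-to-pointer decay
    assert(sizeof(a) == 8 * sizeof(int));
    assert(bytes_seen_through_pointer(a) == sizeof(const int*));
    return p == &a[0];
}
```

Calling decay_demo() from any function should return true on a conforming compiler.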
One important context in which an array does not decay into a pointer to its first element is when the & operator is applied to it. In that case, the & operator yields a pointer to the entire array, not just a pointer to its first element. Although in that case the values (the addresses) are the same, a pointer to the first element of an array and a pointer to the entire array are completely distinct types. The following ASCII art explains this distinction:
Note how the pointer to the first element only points to a single integer (depicted as a small box), whereas the pointer to the entire array points to an array of 8 integers (depicted as a large box).
The same situation arises in classes and is maybe more obvious. A pointer to an object and a pointer to its first data member have the same value (the same address), yet they are completely distinct types.
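The difference between the two pointer types can also be expressed in code. The sketch below assumes an array of 8 integers, as in the text; the function name same_address_different_type is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Returns true if both pointers hold the same address.
inline bool same_address_different_type() {
    int a[8] = {};
    int* first = a;        // decay: pointer to the first element
    int (*whole)[8] = &a;  // no decay: pointer to the entire array

    static_assert(std::is_same<decltype(&a), int (*)[8]>::value, "&a is int(*)[8]");
    static_assert(std::is_same<decltype(&a[0]), int*>::value, "&a[0] is int*");
    static_assert(sizeof(*whole) == 8 * sizeof(int), "points to 8 ints");
    static_assert(sizeof(*first) == sizeof(int), "points to one int");

    // Same address, but the types differ, so compare via void*.
    return static_cast<void*>(first) == static_cast<void*>(whole);
}
```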
If you are unfamiliar with the C declarator syntax, the parentheses in the type int(*)[8] are essential:
- int(*)[8] is a pointer to an array of 8 integers.
- int*[8] is an array of 8 pointers, each element of type int*.
Accessing elements
C++ provides two syntactic variations to access individual elements of an array.
Neither of them is superior to the other, and you should familiarize yourself with both.
Pointer arithmetic
Given a pointer p to the first element of an array, the expression p+i yields a pointer to the i-th element of the array. By dereferencing that pointer afterwards, one can access individual elements.
If x denotes an array, then array-to-pointer decay will kick in, because adding an array and an integer is meaningless (there is no plus operation on arrays), but adding a pointer and an integer makes sense. (Note that the implicitly generated pointer has no name, so I wrote x+0 in order to identify it.)
If, on the other hand, x denotes a pointer to the first (or any other) element of an array, then array-to-pointer decay is not necessary, because the pointer to which i is going to be added already exists. Note that in the depicted case, x is a pointer variable (discernible by the small box next to x), but it could just as well be the result of a function returning a pointer (or any other expression of type T*).
Indexing operator
Since the syntax *(x+i) is a bit clumsy, C++ provides the alternative syntax x[i]. Because addition is commutative, the swapped form i[x] does exactly the same.
The definition of the indexing operator leads to the following interesting equivalence:
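As a rough illustration of these spellings and of the equivalence between &x[i] and x+i, consider the following sketch (the array contents and the name indexing_demo are arbitrary choices):

```cpp
#include <cassert>

// Returns true if the address-of forms agree with plain pointer arithmetic.
inline bool indexing_demo() {
    int x[4] = {10, 20, 30, 40};
    int i = 2;

    assert(*(x + i) == 30);  // pointer arithmetic after decay
    assert(x[i] == 30);      // indexing operator
    assert(i[x] == 30);      // same thing, because i + x == x + i

    // &x[i] is &*(x+i), which is simply x+i.
    return &x[i] == x + i && &x[0] == x + 0;
}
```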
However, &x[0] is generally not equivalent to x. The former is a pointer, the latter an array. Only when the context triggers array-to-pointer decay can x and &x[0] be used interchangeably. For example, when a pointer is initialized from &x[0], the compiler detects an assignment from a pointer to a pointer, which trivially succeeds. When a pointer is initialized from x, it detects an assignment from an array to a pointer. Since this is meaningless (but pointer to pointer assignment makes sense), array-to-pointer decay kicks in as usual.
Ranges
An array of type T[n] has n elements, indexed from 0 to n-1; there is no element n. And yet, to support half-open ranges (where the beginning is inclusive and the end is exclusive), C++ allows the computation of a pointer to the (non-existent) n-th element, but it is illegal to dereference that pointer.
For example, if you want to sort an array, both of the following would work equally well:
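A small sketch of the two spellings, assuming an int array of five elements (sort_demo is a hypothetical name; std::is_sorted requires C++11):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sorts a small array with both spellings and reports whether it is ascending.
inline bool sort_demo() {
    const std::size_t n = 5;
    int x[n] = {4, 1, 5, 3, 2};

    std::sort(x + 0, x + n);      // [x+0, x+n) is a half-open range
    std::reverse(x + 0, x + n);   // disturb the order for the second variant
    std::sort(&x[0], &x[0] + n);  // same range, spelled via indexing

    // x + n may be computed and compared, but never dereferenced.
    return std::is_sorted(x + 0, x + n);
}
```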
Note that it is illegal to provide &x[n] as the second argument since this is equivalent to &*(x+n), and the sub-expression *(x+n) technically invokes undefined behavior in C++ (but not in C99).
Also note that you could simply provide x as the first argument. That is a little too terse for my taste, and it also makes template argument deduction a bit harder for the compiler, because in that case the first argument is an array but the second argument is a pointer. (Again, array-to-pointer decay kicks in.)
Programmers often confuse multidimensional arrays with arrays of pointers.
Multidimensional arrays
Most programmers are familiar with named multidimensional arrays, but many are unaware of the fact that multidimensional arrays can also be created anonymously. Multidimensional arrays are often referred to as "arrays of arrays" or "true multidimensional arrays".
Named multidimensional arrays
When using named multidimensional arrays, all dimensions must be known at compile time:
This is what a named multidimensional array looks like in memory:
Note that 2D grids such as the above are merely helpful visualizations. From the point of view of C++, memory is a "flat" sequence of bytes. The elements of a multidimensional array are stored in row-major order. That is, connect_four[0][6] and connect_four[1][0] are neighbors in memory. In fact, connect_four[0][7] and connect_four[1][0] denote the same location in memory (although, strictly speaking, index 7 is already past the end of row 0)! This means that you can take multi-dimensional arrays and treat them as large, one-dimensional arrays.
Anonymous multidimensional arrays
With anonymous multidimensional arrays, all dimensions except the first must be known at compile time:
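A minimal sketch of such an allocation, assuming a fixed width of 7 and a row count that is only known at runtime (the function name anonymous_2d_demo is hypothetical, and rows is assumed to be at least 1):

```cpp
#include <cassert>
#include <cstddef>

// Allocates rows x 7 ints as a single block; rows must be at least 1 here.
inline bool anonymous_2d_demo(std::size_t rows) {
    int (*grid)[7] = new int[rows][7]();  // () value-initializes to zero

    grid[rows - 1][6] = 42;               // last element of the last row
    bool ok = grid[0][0] == 0 && grid[rows - 1][6] == 42;

    // Each row is a full int[7], so the row stride is fixed by the type.
    static_assert(sizeof(grid[0]) == 7 * sizeof(int), "row type is int[7]");

    delete[] grid;                        // array form of delete
    return ok;
}
```

The result of the new expression is a pointer to the first row, which is why its type is int(*)[7] rather than int**.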
This is what an anonymous multidimensional array looks like in memory:
Note that the array itself is still allocated as a single block in memory.
Arrays of pointers
You can overcome the restriction of fixed width by introducing another level of indirection.
Named arrays of pointers
Here is a named array of five pointers which are initialized with anonymous arrays of different lengths:
And here is what it looks like in memory:
Since each line is allocated individually now, viewing 2D arrays as 1D arrays does not work anymore.
Anonymous arrays of pointers
Here is an anonymous array of 5 (or any other number of) pointers which are initialized with anonymous arrays of different lengths:
And here is what it looks like in memory:
Conversions
Array-to-pointer decay naturally extends to arrays of arrays and arrays of pointers:
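The following sketch shows both decays, and also checks at compile time that a two-dimensional array does not convert to a pointer to pointer. The variable names and the 6 by 7 dimensions are illustrative assumptions.

```cpp
#include <cassert>
#include <type_traits>

inline bool nested_decay_demo() {
    int array_of_arrays[6][7] = {};
    int (*pointer_to_array)[7] = array_of_arrays;  // decays to int(*)[7]

    int* array_of_pointers[6] = {};
    int** pointer_to_pointer = array_of_pointers;  // decays to int**

    // Only the outermost dimension decays; the row type int[7] survives.
    static_assert(std::is_same<std::decay<int[6][7]>::type, int (*)[7]>::value,
                  "int[6][7] decays to int(*)[7]");
    static_assert(std::is_same<std::decay<int*[6]>::type, int**>::value,
                  "int*[6] decays to int**");
    static_assert(!std::is_convertible<int (&)[6][7], int**>::value,
                  "no conversion from T[h][w] to T**");

    return pointer_to_array == &array_of_arrays[0]
        && pointer_to_pointer == &array_of_pointers[0];
}
```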
However, there is no implicit conversion from T[h][w] to T**. If such an implicit conversion did exist, the result would be a pointer to the first element of an array of h pointers to T (each pointing to the first element of a line in the original 2D array), but that pointer array does not exist anywhere in memory yet. If you want such a conversion, you must create and fill the required pointer array manually. Note that this generates a view of the original multidimensional array. If you need a copy instead, you must create extra arrays and copy the data yourself.
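Building such a view by hand might look like the sketch below. The consumer sum_all and the 2 by 3 grid are hypothetical; the point is only that writes through the view reach the original array.

```cpp
#include <cassert>

// Hypothetical consumer that expects a pointer-to-pointer interface.
inline int sum_all(int** rows, int h, int w) {
    int total = 0;
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
            total += rows[r][c];
    return total;
}

inline int view_demo() {
    int grid[2][3] = {{1, 2, 3}, {4, 5, 6}};

    // Build the pointer array by hand; each entry points at one row.
    int* view[2];
    for (int r = 0; r < 2; ++r)
        view[r] = grid[r];       // grid[r] is int[3] and decays to int*

    view[1][2] = 60;             // writes through to grid[1][2]
    assert(grid[1][2] == 60);

    return sum_all(view, 2, 3);  // view decays to int**
}
```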
Assignment
For no particular reason, arrays cannot be assigned to one another. Use std::copy instead. This is more flexible than what true array assignment could provide because it is possible to copy slices of larger arrays into smaller arrays. std::copy is usually specialized for primitive types to give maximum performance. It is unlikely that std::memcpy performs better. If in doubt, measure.
Although you cannot assign arrays directly, you can assign structs and classes which contain array members. That is because array members are copied memberwise by the assignment operator which is provided as a default by the compiler. If you define the assignment operator manually for your own struct or class types, you must fall back to manual copying for the array members.
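Both techniques can be sketched as follows; the struct Wrapper and the function copy_demo are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical wrapper: the implicitly generated copy assignment
// copies the array member element by element.
struct Wrapper {
    int data[4];
};

inline bool copy_demo() {
    int big[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int slice[4] = {};

    // slice = big;  // would not compile: arrays are not assignable
    std::copy(big + 2, big + 6, slice);  // copy the elements at indexes 2 to 5

    Wrapper a = {{9, 9, 9, 9}};
    Wrapper b = {{0, 0, 0, 0}};
    b = a;                               // memberwise copy of data

    return slice[0] == 2 && slice[3] == 5 && b.data[3] == 9;
}
```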
Parameter passing
Arrays cannot be passed by value. You can either pass them by pointer or by reference.
Pass by pointer
Since arrays themselves cannot be passed by value, usually a pointer to their first element is passed by value instead. This is often called "pass by pointer". Since the size of the array is not retrievable via that pointer, you have to pass a second parameter indicating the size of the array (the classic C solution) or a second pointer pointing after the last element of the array (the C++ iterator solution):
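The two conventions could look like this. The function names sum_n and sum_range are assumptions made for this sketch; they are deliberately distinct to avoid an ambiguous overload when a literal 0 is passed as the second argument.

```cpp
#include <cassert>
#include <cstddef>

// Classic C style: pointer to the first element plus element count.
inline int sum_n(const int* p, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

// Iterator style: half-open range [first, last).
inline int sum_range(const int* first, const int* last) {
    int total = 0;
    for (; first != last; ++first)
        total += *first;
    return total;
}

inline bool pass_by_pointer_demo() {
    int x[4] = {1, 2, 3, 4};
    // In both calls x decays to a pointer to its first element.
    return sum_n(x, 4) == 10 && sum_range(x, x + 4) == 10;
}
```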
As a syntactic alternative, you can also declare parameters as T p[], and it means the exact same thing as T* p in the context of parameter lists only. You can think of the compiler as rewriting T p[] to T *p in the context of parameter lists only. This special rule is partly responsible for the whole confusion about arrays and pointers. In every other context, declaring something as an array or as a pointer makes a huge difference.
Unfortunately, you can also provide a size in an array parameter which is silently ignored by the compiler. That is, the parameter declarations T* p, T p[] and T p[8] yield exactly the same function signature, so defining a function once with each of them makes the compiler report redefinition errors.
Pass by reference
Arrays can also be passed by reference:
In this case, the array size is significant. Since writing a function that only accepts arrays of exactly 8 elements is of little use, programmers usually write such functions as templates:
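A sketch of both variants, assuming int elements; the names sum8, sum and pass_by_reference_demo are chosen for this example only.

```cpp
#include <cassert>
#include <cstddef>

// Accepts only arrays of exactly 8 ints.
inline int sum8(const int (&a)[8]) {
    int total = 0;
    for (std::size_t i = 0; i < 8; ++i)
        total += a[i];
    return total;
}

// n is deduced from the argument; one instantiation per distinct size.
template <std::size_t n>
int sum(const int (&a)[n]) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

inline bool pass_by_reference_demo() {
    int eight[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    int three[3] = {1, 2, 3};
    // int* p = three; sum(p);  // would not compile: p is not an array
    return sum8(eight) == 8 && sum(eight) == 8 && sum(three) == 6;
}
```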
Note that you can only call such a function template with an actual array of integers, not with a pointer to an integer. The size of the array is automatically inferred, and for every size n, a different function is instantiated from the template. You can also write quite useful function templates that abstract from both the element type and from the size.
5. Common pitfalls when using arrays.
5.1 Pitfall: Trusting type-unsafe linking.
OK, you’ve been told, or have found out yourself, that globals (namespace
scope variables that can be accessed outside the translation unit) are
Evil™. But did you know how truly Evil™ they are? Consider the
program below, consisting of two files [main.cpp] and [numbers.cpp]:
In Windows 7 this compiles and links fine with both MinGW g++ 4.4.1 and
Visual C++ 10.0.
Since the types don't match, the program crashes when you run it.
In-the-formal explanation: the program has Undefined Behavior (UB), and instead
of crashing it can therefore just hang, or perhaps do nothing, or it
can send threatening e-mails to the presidents of the USA, Russia, India,
China and Switzerland, and make Nasal Daemons fly out of your nose.
In-practice explanation: in main.cpp the array is treated as a pointer, placed at the same address as the array. For a 32-bit executable this means that the first int value in the array is treated as a pointer. I.e., in main.cpp the numbers variable contains, or appears to contain, (int*)1. This causes the program to access memory down at the very bottom of the address space, which is conventionally reserved and trap-causing. Result: you get a crash.
The compilers are fully within their rights to not diagnose this error,
because C++11 §3.5/10 says, about the requirement of compatible types
for the declarations,
The same paragraph details the variation that is allowed:
This allowed variation does not include declaring a name as an array in one
translation unit, and as a pointer in another translation unit.
5.2 Pitfall: Doing premature optimization (memset & friends).
Not written yet
5.3 Pitfall: Using the C idiom to get number of elements.
With deep C experience it’s natural to write …
Since an array decays to pointer to first element where needed, the expression sizeof(a)/sizeof(a[0]) can also be written as sizeof(a)/sizeof(*a). It means the same, and no matter how it’s written it is the C idiom for finding the number of elements of an array.
Main pitfall: the C idiom is not typesafe. For example, the code … passes a pointer to N_ITEMS, and therefore most likely produces a wrong result. Compiled as a 32-bit executable in Windows 7 it produces …
- int const a[7] is rewritten to just int const a[].
- int const a[] is rewritten to int const* a.
- N_ITEMS is therefore invoked with a pointer.
- sizeof(array) (size of a pointer) is then 4.
- sizeof(*array) is equivalent to sizeof(int), which for a 32-bit executable is also 4.
In order to detect this error at run time you can do …
The runtime error detection is better than no detection, but it wastes a little
processor time, and perhaps much more programmer time. Better with detection at
compile time! And if you're happy to not support arrays of local types with C++98,
then you can do that:
Compiling this definition substituted into the first complete program, with g++,
I got …
How it works: the array is passed by reference to n_items, and so it does not decay to pointer to first element, and the function can just return the number of elements specified by the type.
With C++11 you can use this also for arrays of local type, and it's the type safe
C++ idiom for finding the number of elements of an array.
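One possible shape of such a function is sketched below. It is not necessarily the exact definition used in the original answer; in particular the use of std::size_t as the size type and the name n_items_demo are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// The parameter is a reference to an array, so no decay takes place
// and N is deduced from the argument's type.
template <typename T, std::size_t N>
std::size_t n_items(T (&)[N]) {
    return N;
}

inline bool n_items_demo() {
    int const a[7] = {3, 5, 8, 13, 21, 34, 55};
    // int const* p = a; n_items(p);  // would not compile: a pointer is not an array
    return n_items(a) == 7;
}
```

Passing a pointer now fails at compile time instead of silently producing a wrong count.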
5.4 C++11 - C++20 pitfall: Using a constexpr array size function.
With C++11 and later, it's natural to implement an array size function as follows:
This yields the number of elements in an array as a compile time constant. This function has even been standardized as std::size in C++17.
For example, size() can be used to declare an array of the same size as another:
But consider this code using the constexpr version:
The pitfall: until C++23, using the reference c in a constant expression is not allowed, and all major compilers reject this code. From the C++20 standard, [expr.const] p5.12:
c is neither usable in a constant expression nor did its lifetime begin within constexpr int n = ..., so evaluating c is not a core constant expression. These restrictions have been lifted for C++23 by P2280: Using unknown pointers and references in constant expressions. c is treated as a reference binding to an unspecified object ([expr.const] p8).
5.4.1 Workaround: C++20-compatible constexpr size function
std::extent< decltype( c ) >::value; is not a viable workaround because it would fail if Collection was not an array.
To deal with collections that can be non-arrays one needs the overloadability of a size function, but also, for compile time use one needs a compile time representation of the array size. And the classic C++03 solution, which works fine also in C++11 and C++14, is to let the function report its result not as a value but via its function result type. For example like this:
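The following is a sketch of that technique rather than the original listing. The names static_n_items and STATIC_N_ITEMS come from the text; the helper struct Sizer and the demo functions are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Carries the size in its type: sizeof(Sizer<n>().elems) == n.
template <std::size_t n>
struct Sizer { char elems[n]; };

// Declared but never defined; only its return type is inspected.
template <typename T, std::size_t n>
Sizer<n> static_n_items(T (&)[n]);

// sizeof does not evaluate its operand, so the reference c is never read.
#define STATIC_N_ITEMS(c) (sizeof(static_n_items(c).elems))

template <typename Collection>
std::size_t copy_size_demo(Collection const& c) {
    // A compile time constant even though c is a reference parameter.
    int same_size[STATIC_N_ITEMS(c)] = {};
    return sizeof(same_size) / sizeof(same_size[0]);
}

inline bool static_n_items_demo() {
    int x[42] = {};
    return copy_size_demo(x) == 42;
}
```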
About the choice of return type for static_n_items: this code doesn't use std::integral_constant because with std::integral_constant the result is represented directly as a constexpr value, reintroducing the original problem.
About the naming: part of this solution to the constexpr-invalid-due-to-reference problem is to make the choice of compile time constant explicit.
Until C++23, a macro like the STATIC_N_ITEMS above yields portability, e.g. to the clang and Visual C++ compilers, retaining type safety.
Related: macros do not respect scopes, so to avoid name collisions it can be a good idea to use a name prefix, e.g. MYLIB_STATIC_N_ITEMS.
Array creation and initialization
As with any other kind of C++ object, arrays can be stored either directly in named variables (then the size must be a compile-time constant; C++ does not support VLAs), or they can be stored anonymously on the heap and accessed indirectly via pointers (only then can the size be computed at runtime).
Automatic arrays
Automatic arrays (arrays living "on the stack") are created each time the flow of control passes through the definition of a non-static local array variable:
Initialization is performed in ascending order. Note that the initial values depend on the element type T:
- If T is a POD (like int in the above example), no initialization takes place.
- Otherwise, the default-constructor of T initializes all the elements.
- If T provides no accessible default-constructor, the program does not compile.
Alternatively, the initial values can be explicitly specified in the array initializer, a comma-separated list surrounded by curly brackets:
Since in this case the number of elements in the array initializer is equal to the size of the array, specifying the size manually is redundant. It can automatically be deduced by the compiler:
It is also possible to specify the size and provide a shorter array initializer:
In that case, the remaining elements are zero-initialized. Note that C++ allows an empty array initializer (all elements are zero-initialized), whereas C89 does not (at least one value is required). Also note that array initializers can only be used to initialize arrays; they cannot later be used in assignments.
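The initializer forms described above can be summarized in a short sketch (the array names and values are arbitrary, and initializer_demo is a hypothetical name):

```cpp
#include <cassert>

inline bool initializer_demo() {
    int primes[8] = {2, 3, 5, 7, 11, 13, 17, 19};  // explicit size
    int deduced[] = {2, 3, 5, 7, 11, 13, 17, 19};  // size deduced as 8
    int partial[8] = {2, 3, 5};                    // remaining 5 elements are 0
    int zeros[8] = {};                             // all elements are 0

    static_assert(sizeof(deduced) == sizeof(primes), "both are int[8]");

    // primes = {1, 2, 3};  // would not compile: initializers are not assignable
    return partial[2] == 5 && partial[3] == 0 && partial[7] == 0 && zeros[0] == 0;
}
```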
Static arrays
Static arrays (arrays living "in the data segment") are local array variables defined with the static keyword and array variables at namespace scope ("global variables"):
(Note that variables at namespace scope are implicitly static. Adding the static keyword to their definition has a completely different meaning, namely internal linkage; that usage was deprecated in C++03 but is no longer deprecated as of C++11.)
Here is how static arrays behave differently from automatic arrays:
(None of the above is specific to arrays. These rules apply equally well to other kinds of static objects.)
Array data members
Array data members are created when their owning object is created. Unfortunately, C++03 provides no means to initialize arrays in the member initializer list, so initialization must be faked with assignments:
Alternatively, you can define an automatic array in the constructor body and copy the elements over:
In C++11, arrays can be initialized in the member initializer list thanks to uniform initialization:
This is the only solution that works with element types that have no default constructor.
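A sketch of the C++11 form, including an element type without a default constructor. The types Id and Foo and the function member_array_demo are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Hypothetical element type without a default constructor.
struct Id {
    explicit Id(int v) : value(v) {}
    int value;
};

class Foo {
    int primes[4];
    Id ids[2];  // cannot be default-constructed and assigned later

public:
    // C++11: array members are initialized directly in the initializer list.
    Foo() : primes{2, 3, 5, 7}, ids{Id(1), Id(2)} {}

    int last_prime() const { return primes[3]; }
    int second_id() const { return ids[1].value; }
};

inline bool member_array_demo() {
    Foo f;
    return f.last_prime() == 7 && f.second_id() == 2;
}
```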
Dynamic arrays
Dynamic arrays have no names, hence the only means of accessing them is via pointers. Because they have no names, I will refer to them as "anonymous arrays" from now on.
In C, anonymous arrays are created via malloc and friends. In C++, anonymous arrays are created using the new T[size] syntax which returns a pointer to the first element of an anonymous array:
The following ASCII art depicts the memory layout if the size is computed as 8 at runtime:
Obviously, anonymous arrays require more memory than named arrays due to the extra pointer that must be stored separately. (There is also some additional overhead on the free store.)
Note that there is no array-to-pointer decay going on here. Although evaluating new int[size] does in fact create an array of integers, the result of the expression new int[size] is already a pointer to a single integer (the first element), not an array of integers or a pointer to an array of integers of unknown size. That would be impossible, because the static type system requires array sizes to be compile-time constants. (Hence, I did not annotate the anonymous array with static type information in the picture.)
Concerning default values for elements, anonymous arrays behave similarly to automatic arrays. Normally, anonymous POD arrays are not initialized, but there is a special syntax that triggers value-initialization:
(Note the trailing pair of parentheses right before the semicolon.) Again, C++11 simplifies the rules and allows specifying initial values for anonymous arrays thanks to uniform initialization:
If you are done using an anonymous array, you have to release it back to the system:
You must release each anonymous array exactly once and then never touch it again afterwards. Not releasing it at all results in a memory leak (or more generally, depending on the element type, a resource leak), and trying to release it multiple times results in undefined behavior. Using the non-array form delete (or free) instead of delete[] to release the array is also undefined behavior.
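The whole life cycle of an anonymous array can be sketched as follows. The function name dynamic_array_demo is hypothetical, size is assumed to be at least 1, and the sketch ignores exception safety, which the introduction already flags as a separate topic.

```cpp
#include <cassert>
#include <cstddef>

// size must be at least 1 for the element accesses below.
inline bool dynamic_array_demo(std::size_t size) {
    int* raw = new int[size];              // elements are uninitialized
    int* zeroed = new int[size]();         // value-initialized: all zero
    int* listed = new int[4]{2, 3, 5, 7};  // C++11 list-initialization

    raw[0] = 1;                            // write before any read
    bool ok = raw[0] == 1 && zeroed[size - 1] == 0 && listed[3] == 7;

    delete[] listed;                       // array form, exactly once each
    delete[] zeroed;
    delete[] raw;
    return ok;
}
```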