Is a compiler allowed to store function-scoped, non-static const arrays in constant data and avoid per-call initialization?
In reading How are char arrays / strings stored in binary files (C/C++)?, I was thinking about the various ways in which the raw string involved, "Nancy", would appear intact in the resulting binary. That post's case was:
#include <stdio.h>

int main()
{
    char temp[6] = "Nancy";
    printf("%s", temp);
    return 0;
}
and obviously, in the general case (where the compiler can't confirm that temp is never mutated), it must actually initialize a stack-local array to allow for future mutations; the array itself must have space allocated (on the stack, or maybe in registers on truly weird architectures), and it must be populated on each call to the function (let's pretend this isn't main, which is called only once in C++ and typically only once in C) to avoid reentrancy issues and the like. Whether the compiler hardcodes the initialization into the assembly or does a memcpy from the program's constant data section is irrelevant; there is definitely something that must be initialized per call.
By contrast, if char temp[6] = "Nancy"; was replaced with any of:
1. const char *temp = "Nancy";
2. char *temp = "Nancy"; (C only; in C++ the literals are const char[], though in practice they're not mutable in C either)
3. static const char temp[6] = "Nancy";
4. static char temp[6] = "Nancy";
then the program need not allocate any array-length-based resources per call (just a pointer variable in cases #1 & #2), and in all but case #4, it can put the data in read-only memory baked into the binary's constant data (#4 would put it in the section for read-write memory, but it could still be baked into the binary and loaded copy-on-write).
My question: Does the standard provide leeway for const char temp[6] = "Nancy"; to behave equivalently to static const char temp[6] = "Nancy";? Both are immutable, and modifying them is against the rules. The only differences I'm aware of would be:
- Without static, you'd expect the array's address to be colocated with other locals, not in some other part of program memory (could have effects on cache performance)
- Without static, you're technically saying the variable is created and destroyed on each call
I don't see anything obviously broken in terms of observable behavior by the standard:
- You can't watch the array exist and cease to exist except in terms of undefined behavior, e.g. returning a pointer to temp, where there are no guarantees
- You can't legally compute ptrdiff_t for unrelated variables (only within a given array, plus the one-past-the-end virtual element of said array)
so I'd think the compiler could safely "treat as static" for this case by as-if rules; there's no way to observe the difference, so it can do whatever it feels best.
Is there anything I'm missing where either the C or C++ standard would require some sort of per-call initialization of the const but non-static function-scoped array? If the C and C++ standards disagree, I'd like to know that too.
Edit: As Barmar points out in the comments, there are standards-legal ways to detect this behavior in a particular compiler, e.g.:
int myfunc() {
    const char temp[6] = "Nancy";
    const char temp2[6] = "Nancy";
    return temp == temp2; // true if compiler implicitly made them static or combined them, false if not
}
or:
int otherfunc(const char *s) {
    const char temp[6] = "Nancy";
    return s == temp;
}
int myfunc() {
    const char temp[6] = "Nancy";
    return otherfunc(temp); // true if compiler implicitly made them shared statics, false if not
}
Comments (3)
The standard does not prescribe how local variables are implemented. A stack is a common choice, because it makes recursive functions easy. But leaf functions are easy to detect, and the example is almost a leaf function except for the side-effect-carrying printf.
For such leaf functions, a compiler might choose to implement local variables using statically allocated memory. As the question correctly states, the local variables still need to be constructed and destructed, since they're not static.
In this question, however, char temp[6] has no constructors or destructors. So a compiler which implements local variables in leaf functions as described would have a memcpy to initialize temp.
This memcpy would be visible to the optimizer - it would see the global address, the only use of the same address in printf, and it could then deduce that each memcpy can be moved to program startup. Repeated calls of that same memcpy are idempotent and can be optimized out.
This would cause the generated assembly to be identical to the static case. So the answer to the question is yes. A compiler can indeed generate the same code, and there's even a somewhat plausible way in which it could end up doing so.
Per C11, 6.2.2/6, temp has no linkage, and per C11, 6.2.2/2, a declaration of an identifier with no linkage denotes a unique entity.
The "unique entity" implies (I guess) "unique address". Hence, the compiler is required to provide the uniqueness property.
However (speculating), if an optimizer proved that the uniqueness property is not used AND estimated that reading from memory is faster than writing & reading registers (generated code for = "Nancy"), then (I guess) it can make temp have static storage duration. Note that usually writing & reading registers is much faster than reading from memory.
Extra: temp has block scope, not function scope.
Below the initial answer (which is "out of scope").
C11, 6.8 Statements and blocks, Semantics, 3 (emphasis added):
For C++, although I would expect the answer for C to be equivalent:
If the function with the declaration
const char temp[6] = "Nancy";
is entered recursively, then, in contrast to the variant with static, the declaration will cause multiple complete const char[6] objects with overlapping lifetimes to exist.
Applying [intro.object]/9, these objects may then not have overlapping memory and their addresses, as well as the addresses of their array elements, must be distinct. On the other hand with static, there would only be one instance of the array and so taking its address in multiple recursions must yield the same value. This is an observable difference between the version with and without static.
So, if the address of the array or one of its elements is taken, or a reference to either is formed, and it escapes the function body, and there are function calls which may potentially be recursive, then the compiler cannot generally treat the declaration as if it had an additional static modifier.
If the compiler can be sure that either e.g. no pointer/reference to the array or its elements escapes the function or that the function cannot possibly be called recursively or that the behavior of the function doesn't depend on the addresses of the array copies, then it could under the as-if rule treat the array as static.
Because the array is a const-qualified automatic storage duration variable, it is impossible to modify values in it or to place new objects into its storage. As long as the addresses are not relevant to the behavior, there is therefore nothing else that could cause an observable difference in behavior.
I don't think anything here is specific to const char arrays. This applies to all const automatic storage duration constant-initialized variables with trivial destruction. constexpr instead of const would not change anything here either, since that doesn't affect the object identity.
Because of [intro.object]/9, both functions myfunc in your edit are also guaranteed to return 0. The two arrays have overlapping lifetimes and therefore may not share the same address. This is therefore not a method to "detect" this optimization. It causes it to become impossible.