Can a compiler store function-scoped, non-static const arrays in constant data and avoid per-call initialization?

Posted 2025-01-23 05:18:51

In reading How are char arrays / strings stored in binary files (C/C++)?, I was thinking about the various ways in which the raw string involved, "Nancy", would appear intact in the resulting binary. That post's case was:

#include <stdio.h>

int main()
{
    char temp[6] = "Nancy";
    printf("%s", temp);

    return 0;
}

and obviously, in the general case (where the compiler can't confirm if temp is unmutated), it must actually initialize a stack local array to allow for mutations in the future; the array itself must have space allocated (on the stack, or maybe using registers for truly weird architectures), and it must be populated on each call to the function (let's pretend this isn't main, which is called only once in C++ and typically only once in C), to avoid reentrancy issues and the like. Whether it hardcodes the initialization into the assembly, or does a memcpy from the program's constant data section is irrelevant; there is definitely something that must be initialized per-call.
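To make the per-call requirement concrete, here is a minimal C sketch (the function name is hypothetical). Because temp is an automatic, non-const array, its initializer runs on every call, so a write made during one call is never visible in the next one.

```c
/* Returns the first character seen on entry, then overwrites it.
   temp has automatic storage duration, so its initializer runs on
   every call and each call starts from a fresh "Nancy". */
static char first_char_then_mutate(void) {
    char temp[6] = "Nancy";
    char seen = temp[0];
    temp[0] = 'F';   /* legal write: temp is not const */
    return seen;
}
```

Calling it twice returns 'N' both times, which is exactly the state the compiler has to re-establish on each call.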

By contrast, if char temp[6] = "Nancy"; was replaced with any of:

  1. const char *temp = "Nancy";
  2. char *temp = "Nancy"; (C only; in C++ the literals are const char[], though in practice they're not mutable in C either)
  3. static const char temp[6] = "Nancy";
  4. static char temp[6] = "Nancy";

then the program need not allocate any array-length-based resources per call (just a pointer variable in cases #1 & #2), and in all but case #4, it can put the data in read-only memory baked into the binary's data constants (#4 would put it in the section for read-write memory, but it could still be baked into the binary and loaded copy-on-write).
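A minimal sketch of cases #1, #3 and #4, using hypothetical helper names. The static arrays exist once for the whole program, so writes to #4 persist between calls and #3 yields the same address on every call, while #1 creates only a pointer per call.

```c
/* Case #4: one array for the whole program; writes persist across calls. */
static char bump_static(void) {
    static char temp[6] = "Nancy";   /* initialized once, before program startup */
    char seen = temp[0];
    temp[0]++;                       /* 'N' becomes 'O', then 'P', ... */
    return seen;
}

/* Case #3: one read-only array; every call yields the same address, and
   returning it is safe because it has static storage duration. */
static const char *name_static_const(void) {
    static const char temp[6] = "Nancy";
    return temp;
}

/* Case #1: only the pointer variable is created per call; the literal's
   array itself has static storage duration. */
static const char *name_literal(void) {
    const char *temp = "Nancy";
    return temp;
}
```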

My question: Does the standard provide leeway for const char temp[6] = "Nancy"; to behave equivalently to static const char temp[6] = "Nancy";? Both are immutable, and modifying them is against the rules. The only differences I'm aware of would be:

  1. Without static, you'd expect the array's address to be colocated with other locals, not in some other part of program memory (which could affect cache performance)
  2. Without static, you're technically saying the variable is created and destroyed on each call

I don't see anything obviously broken in terms of observable behavior by the standard:

  • You can't watch the array exist and cease to exist except in terms of undefined behavior, e.g. returning a pointer to temp, where there are no guarantees
  • You can't legally compute ptrdiff_t for unrelated variables (only within a given array, plus the one-past-the-end virtual element of said array)

so I'd think the compiler could safely "treat as static" for this case by as-if rules; there's no way to observe the difference, so it can do whatever it feels best.
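A hypothetical example of the kind of function this reasoning targets: temp's address never leaves the function and only its element values are read. Assuming nothing else observes the array, a compiler could keep one shared read-only copy instead of a per-call one without any visible change.

```c
/* Counts occurrences of c in a local const array. The array's address
   never escapes and only element values are read, so a per-call copy and
   a single shared read-only copy are indistinguishable (as-if rule). */
static int count_char(char c) {
    const char temp[6] = "Nancy";
    int n = 0;
    for (int i = 0; temp[i] != '\0'; i++) {
        if (temp[i] == c)
            n++;
    }
    return n;
}
```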

Is there anything I'm missing where either the C or C++ standard would require some sort of per-call initialization of the const but non-static function scoped array? If the C and C++ standards disagree, I'd like to know that too.

Edit: As Barmar points out in the comments, there are standards-legal ways to detect this behavior in a particular compiler, e.g.:

int myfunc() {
    const char temp[6] = "Nancy";
    const char temp2[6] = "Nancy";
    return temp == temp2;  // true if compiler implicitly made them static or combined them, false if not
}

or:

int otherfunc(const char *s) {
    const char temp[6] = "Nancy";
    return s == temp;
}

int myfunc() {
    const char temp[6] = "Nancy";
    return otherfunc(temp); // true if compiler implicitly made them shared statics, false if not
}

Comments (3)

﹎☆浅夏丿初晴 2025-01-30 05:18:52

The standard does not prescribe how local variables are implemented. A stack is a common choice because it makes recursive functions easy. But leaf functions are easy to detect, and the example is almost a leaf function, except for the side-effect-carrying printf.

For such leaf functions, a compiler might choose to implement local variables using statically allocated memory. As the question correctly states, the local variables still need to be constructed and destructed, since they're not static.

In this question, however, char temp[6] has no constructors or destructors. So a compiler which implements local variables in leaf functions as described would have a memcpy to initialize temp.

This memcpy would be visible to the optimizer - it would see the global address, the only use of the same address in printf, and it could then deduce that each memcpy can be moved to program startup. Repeated calls of that same memcpy are idempotent and can be optimized out.

This would cause the generated assembly to be identical to the static case. So the answer to the question is yes. A compiler can indeed generate the same code, and there's even a somewhat plausible way in which it could end up doing so.
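The steps this answer describes can be written out by hand as a sketch. The names temp_image and temp_slot are hypothetical stand-ins for compiler-internal storage, the function is assumed never to be re-entered, and printf is replaced by returning temp[0] so the result is easy to check.

```c
#include <string.h>

/* As written in source: an automatic array, initialized on every call. */
static int first_char_auto(void) {
    char temp[6] = "Nancy";
    return temp[0];
}

/* Step 1 (hypothetical lowering): assuming the function is never
   re-entered, the "local" gets a static home and is filled by a memcpy
   from a constant image on every call. */
static const char temp_image[6] = "Nancy";  /* hypothetical constant-data copy */
static char temp_slot[6];                   /* hypothetical static home for temp */

static int first_char_lowered(void) {
    memcpy(temp_slot, temp_image, sizeof temp_image);  /* same bytes every time */
    return temp_slot[0];
}

/* Step 2: nothing else writes temp_slot, so every memcpy after the first
   is redundant; once it is hoisted to startup the function can read the
   constant image directly, just like the static const case. */
static int first_char_hoisted(void) {
    return temp_image[0];
}
```

All three versions return the same value; the point of the sketch is that each rewrite step preserves observable behavior for a non-reentrant function.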

ペ泪落弦音 2025-01-30 05:18:52

Per C11 6.2.2/6, temp has no linkage because it is:

a block scope identifier for an object declared without the storage-class specifier extern

and per C11, 6.2.2/2:

each declaration of an identifier with no linkage denotes a unique entity

The "unique entity" implies (I guess) "unique address". Hence, the compiler is required to provide the uniqueness property.

However (speculating), if an optimizer proved that the uniqueness property is not used AND estimated that reading from memory is faster than writing & reading registers (the code generated for = "Nancy"), then (I guess) it could give temp static storage duration. Note that writing & reading registers is usually much faster than reading from memory.

Extra: temp has block scope, not function scope.


Below is the initial answer (which is "out of scope"):

C11, 6.8 Statements and blocks, Semantics, 3 (emphasis added):

The initializers of objects that have automatic storage duration, and the variable length array declarators of ordinary identifiers with block scope, are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution, as if it were a statement, and within each declaration in the order that declarators appear.
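A small sketch of what 6.8p3 implies (function names are hypothetical): an automatic array is re-initialized each time its declaration is reached, even several times within a single call, whereas a static one is initialized only once.

```c
/* Automatic: the initializer is evaluated each time the declaration is
   reached, i.e. on every loop iteration, so the write never carries over. */
static int fresh_count_auto(void) {
    int fresh = 0;
    for (int i = 0; i < 3; i++) {
        char temp[6] = "Nancy";
        if (temp[0] == 'N')
            fresh++;
        temp[0] = 'X';
    }
    return fresh;   /* 3 */
}

/* Static: initialized once for the whole program, so the first write
   is still visible on later iterations. */
static int fresh_count_static(void) {
    int fresh = 0;
    for (int i = 0; i < 3; i++) {
        static char temp[6] = "Nancy";
        if (temp[0] == 'N')
            fresh++;
        temp[0] = 'X';
    }
    return fresh;   /* 1 on the first call, 0 afterwards */
}
```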

单挑你×的.吻 2025-01-30 05:18:52

For C++, although I would expect the answer for C to be equivalent:

If the function with the declaration

const char temp[6] = "Nancy";

is entered recursively, then, in contrast to the variant with static, the declaration will cause multiple complete const char[6] objects with overlapping lifetimes to exist.

Applying [intro.object]/9, these objects may then not have overlapping memory and their addresses, as well as the addresses of their array elements, must be distinct. On the other hand with static, there would only be one instance of the array and so taking its address in multiple recursions must yield the same value. This is an observable difference between the version with and without static.
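The observable difference can be sketched in C, assuming the C rules match the C++ ones cited here (two objects that are alive at the same time have distinct addresses); the function names are hypothetical. Each recursion level passes its array's address down and compares it with its own.

```c
#include <stddef.h>

/* Automatic array: every active call has its own temp, so the caller's
   array (outer, still alive) never has the same address as this one. */
static int distinct_in_recursion(const char *outer, int depth) {
    const char temp[6] = "Nancy";
    if (outer != NULL && outer == temp)
        return 0;   /* would mean two live activations share storage */
    if (depth == 0)
        return 1;
    return distinct_in_recursion(temp, depth - 1);
}

/* Static array: there is exactly one temp, so every level sees the
   same address as its caller. */
static int same_with_static(const char *outer, int depth) {
    static const char temp[6] = "Nancy";
    if (outer != NULL && outer != temp)
        return 0;
    if (depth == 0)
        return 1;
    return same_with_static(temp, depth - 1);
}
```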

So, if the address of the array or of one of its elements is taken, or a reference to either is formed, and it escapes the function body, and there are function calls which may potentially be recursive, then the compiler cannot in general treat the declaration as if it had an additional static modifier.

If the compiler can be sure that no pointer or reference to the array or its elements escapes the function, or that the function cannot possibly be called recursively, or that the behavior of the function doesn't depend on the addresses of the array copies, then under the as-if rule it could treat the array as static.

Because the array is a const-qualified automatic storage duration variable, it is impossible to modify values in it or to place new objects into its storage. As long as the addresses are not relevant to the behavior, there is therefore nothing else that could cause an observable difference in behavior.

I don't think anything here is specific to const char arrays. This applies to all const automatic storage duration constant-initialized variables with trivial destruction. constexpr instead of const would not change anything here either, since that doesn't affect the object identity.


Because of [intro.object]/9, both functions myfunc in your edit are also guaranteed to return 0. The two arrays have overlapping lifetimes and therefore may not share the same address. This is therefore not a method to "detect" this optimization. It causes it to become impossible.
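A sketch of the two functions from the question's edit, renamed here (hypothetically) so they can coexist in one file. In each pair both arrays are alive at the same time, so each comparison must yield 0; C gives the same guarantee, since pointers to the first elements of two distinct objects never compare equal.

```c
/* The caller's array is still alive while this runs, so s != temp. */
static int otherfunc(const char *s) {
    const char temp[6] = "Nancy";
    return s == temp;
}

/* temp and temp2 are distinct objects with overlapping lifetimes. */
static int myfunc_two_arrays(void) {
    const char temp[6] = "Nancy";
    const char temp2[6] = "Nancy";
    return temp == temp2;
}

static int myfunc_via_call(void) {
    const char temp[6] = "Nancy";
    return otherfunc(temp);
}
```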
