C/C++: Optimization of pointers to string constants

Posted on 2024-07-16 03:14:21

Have a look at this code:

#include <iostream>
using namespace std;

int main()
{
    const char* str0 = "Watchmen";
    const char* str1 = "Watchmen";
    char* str2 = "Watchmen";
    char* str3 = "Watchmen";

    cerr << static_cast<void*>( const_cast<char*>( str0 ) ) << endl;
    cerr << static_cast<void*>( const_cast<char*>( str1 ) ) << endl;
    cerr << static_cast<void*>( str2 ) << endl;
    cerr << static_cast<void*>( str3 ) << endl;

    return 0;
}

When I run it, all four lines of output show the same address.

This was on the g++ compiler running under Cygwin. The pointers all point to the same location even with no optimization turned on (-O0).

Does the compiler always optimize so much that it searches all the string constants to see if they are equal? Can this behaviour be relied on?

烟雨凡馨 2024-07-23 03:14:21

It can't be relied on; it is an optimization that no standard requires. Both the C and C++ standards leave it unspecified whether identical string literals share storage.

I changed the corresponding lines of your code to:

const char* str0 = "Watchmen";
const char* str1 = "atchmen";
char* str2 = "tchmen";
char* str3 = "chmen";

The output for the -O0 optimization level is:

But for the -O1 it's:

0x80487c0
0x80487c1
0x80487c2
0x80487c3

As you can see, GCC (v4.1.2) reused the tail of the first string for each of the shorter strings. How string constants are arranged in memory is the compiler's choice.

时光清浅 2024-07-23 03:14:21

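This is a very easy optimization, probably so easy that most compiler writers don't consider it an optimization at all. After all, setting the optimization flag to its lowest level doesn't mean "be completely naive."

Compilers vary in how aggressively they merge duplicate string literals. Some might limit themselves to a single subroutine: put those four declarations in different functions instead of one, and you might see different results. Others might cover an entire compilation unit. Others might rely on the linker to do further merging across multiple compilation units.

You can't rely on this behavior unless your particular compiler's documentation says you can. The language itself makes no demands in this regard. Even if portability weren't a concern, I'd be wary of relying on it in my own code, because the behavior is liable to change even between versions of a single vendor's compiler.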

杯别 2024-07-23 03:14:21

You should not rely on that behavior, although most compilers do it. Any literal value ("Hello", 42, etc.) will typically be stored once, and any pointer to it will naturally resolve to that single copy.

If you find that you need to rely on that, then be safe and recode as follows:

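const char *watchmen = "Watchmen";  // the only place the literal appears
const char *foo = watchmen;         // copies of one pointer value,
const char *bar = watchmen;         // so foo == bar is guaranteed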

娜些时光，永不杰束 2024-07-23 03:14:21

You shouldn't count on that of course. An optimizer might do something tricky on you, and it should be allowed to do so.

It is however very common. I remember back in '87 a classmate was using the DEC C compiler and had this weird bug where all his literal 3's got turned into 11's (numbers may have changed to protect the innocent). He even did a printf ("%d\n", 3) and it printed 11.

He called me over because it was so weird (why does that make people think of me?), and after about 30 minutes of head scratching we found the cause. It was a line roughly like this:

if (3 = x) break;

Note the single "=" character. Yes, that was a typo. The compiler had a wee bug and allowed this. The effect was to turn all his literal 3's in the entire program into whatever happened to be in x at the time.

Anyway, it's clear the C compiler was putting all literal 3's in the same place. If a C compiler back in the 80's was capable of doing this, it can't be too tough to do. I'd expect it to be very common.

ゝ偶尔ゞ 2024-07-23 03:14:21

I would not rely on the behavior, because I doubt the C or C++ standards guarantee it, but it makes sense that the compiler does it. It also makes sense that it happens even when no optimization is requested; there is no trade-off in it.

All string literals in C or C++ (e.g. "string literal") are read-only, and thus constant. When you say:

char *s = "literal";

In a sense you are discarding the string's const qualification. That does not make the string writable: modifying it is undefined behavior, so the mistake shows up at run time (typically as a crash), if at all, rather than at compile time. (Which is actually a good reason to use const char * when you point a variable of yours at a string literal.)
