Duplicate literals and hard-coding

Posted 2024-09-07 11:00:46

I see the following pattern occurring quite frequently:

 b->last = ngx_cpymem(b->last, "</pre><hr>", sizeof("</pre><hr>") - 1);

Notice that the string literal is used twice. The extract is from the nginx source code.

The compiler should be able to merge these literals when they are encountered within a compilation unit.

My questions are:

  1. Do commercial-grade compilers (VC++, GCC, LLVM/Clang) remove this redundancy when it is encountered within a compilation unit?
  2. Does the (static) linker remove such redundancies when linking object files?
  3. If 2 applies, would this optimization also occur during dynamic linking?
  4. If 1 and 2 apply, do they apply to all literals?

These questions are important because the answers determine whether a programmer can be verbose without losing efficiency; think, for example, of enormous static data models hard-wired into a program (such as the rules of a Decision Support System used in some low-level scenario).

Edit

2 points / clarifications

  1. The code above was written by a recognised "master" programmer. The guy wrote nginx single-handedly.

  2. I have not asked which of the possible mechanisms of literal hard-coding is better. Therefore don't go off-topic.

Edit 2

My original example was quite contrived and restrictive. The following snippet shows the usage of string literals being embedded into internal hard-coded knowledge. The first snippet is meant for the config parser telling it what enum values to set for which string, and the second to be used more generally as a string in the program. Personally I am happy with this as long as the compiler uses one copy of the string literal, and since the elements are static, they don't enter the global symbol tables.

static ngx_conf_bitmask_t  ngx_http_gzip_proxied_mask[] = {
   { ngx_string("off"), NGX_HTTP_GZIP_PROXIED_OFF },
   { ngx_string("expired"), NGX_HTTP_GZIP_PROXIED_EXPIRED },
   { ngx_string("no-cache"), NGX_HTTP_GZIP_PROXIED_NO_CACHE },
   { ngx_string("no-store"), NGX_HTTP_GZIP_PROXIED_NO_STORE },
   { ngx_string("private"), NGX_HTTP_GZIP_PROXIED_PRIVATE },
   { ngx_string("no_last_modified"), NGX_HTTP_GZIP_PROXIED_NO_LM },
   { ngx_string("no_etag"), NGX_HTTP_GZIP_PROXIED_NO_ETAG },
   { ngx_string("auth"), NGX_HTTP_GZIP_PROXIED_AUTH },
   { ngx_string("any"), NGX_HTTP_GZIP_PROXIED_ANY },
   { ngx_null_string, 0 }
};

followed closely by:

static ngx_str_t  ngx_http_gzip_no_cache = ngx_string("no-cache");
static ngx_str_t  ngx_http_gzip_no_store = ngx_string("no-store");
static ngx_str_t  ngx_http_gzip_private = ngx_string("private");
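
As a rough illustration of why one textual literal can end up being used twice, here is a minimal sketch of a length-plus-pointer initializer macro. The names `demo_str_t`, `DEMO_STRING`, `demo_no_cache` and `demo_str_equals` are hypothetical stand-ins and not nginx's definitions; the assumption is only that such a macro uses its argument once inside `sizeof` and once as the pointer value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type (an assumption, not nginx's definition). */
typedef struct {
    size_t      len;   /* length without the terminating NUL */
    const char *data;  /* points at the literal's storage */
} demo_str_t;

/* The literal appears twice in the expansion: once inside sizeof, which is
   computed at compile time, and once as the pointer value. */
#define DEMO_STRING(s) { sizeof(s) - 1, s }

static const demo_str_t demo_no_cache = DEMO_STRING("no-cache");

/* Returns 1 if the counted string equals the given C string, else 0. */
static int demo_str_equals(const demo_str_t *a, const char *b)
{
    return a->len == strlen(b) && memcmp(a->data, b, a->len) == 0;
}
```

The block defines helpers only, so it needs a caller to do anything visible. Only the pointer use requires the literal to exist in the object file; the `sizeof` use contributes a constant.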

To those who stayed on topic, bravo!

Comments (5)

小情绪 2024-09-14 11:00:46

Note that for the specific case of sizeof("</pre><hr>"), it is virtually certain that the string literal will never appear in the output file - the entire sizeof expression can be evaluated to the integer constant 11 at compile-time.

Notwithstanding, it is still a very common optimisation for compilers to merge identical string literals.
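
The compile-time claim can be checked directly. The following is a small sketch assuming a C11 compiler (for `_Static_assert`); the helper name `pre_hr_copy_len` is hypothetical and only there to make the value observable.

```c
#include <stddef.h>

/* sizeof on a string literal is an integer constant expression: the operand is
   not evaluated, so this use alone does not require the literal to be stored. */
_Static_assert(sizeof("</pre><hr>") == 11,
               "10 characters plus the terminating NUL");

/* Hypothetical helper: the byte count passed to the copy in the question. */
static size_t pre_hr_copy_len(void)
{
    return sizeof("</pre><hr>") - 1;
}
```

If the assertion did not hold, the file would fail to compile, which is the point: the value is known before any code runs.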

妞丶爷亲个 2024-09-14 11:00:46

I can't answer your questions, but always try to use a const string (or even a #define) in such circumstances. The problem comes when you refactor the code and change the value of one literal while forgetting the other (not so likely in your example, as they are right next to each other, but I have seen it before).

Whatever optimisations the compiler can do, humans can still bugger it up :)
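
A sketch of what the named-constant variant might look like. `PRE_HR`, `copy_bytes` and `append_pre_hr` are hypothetical names; `copy_bytes` assumes a helper that copies `n` bytes and returns the position just past them, which is how the call in the question appears to be used.

```c
#include <stddef.h>
#include <string.h>

/* One definition of the text; sizeof on the array is still a compile-time length. */
static const char PRE_HR[] = "</pre><hr>";

/* Hypothetical copy helper returning the position just past the copied bytes
   (an assumption about the helper used in the question). */
static char *copy_bytes(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    return dst + n;
}

/* Appends the marker to a buffer and returns the new end position. */
static char *append_pre_hr(char *last)
{
    return copy_bytes(last, PRE_HR, sizeof(PRE_HR) - 1);
}
```

With this shape the text is spelled once, so a later edit cannot leave the copied bytes and the length out of sync.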

指尖凝香 2024-09-14 11:00:46

  1. Yes for GCC; this should also be true for the others.
  2. Possibly yes for the GNU linker (see GCC's -fmerge-constants and -fmerge-all-constants options).
  3. No.
  4. Not sure.
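
One quick way to observe what a given compiler does within a single translation unit is to compare the addresses of two identical literals. This is a sketch with hypothetical function names; the C standard leaves it unspecified whether identical literals share storage, so either outcome is conforming and the result can change with compiler and flags.

```c
#include <stdio.h>

/* Returns 1 if two identical literals in this translation unit share storage,
   0 otherwise. Both results are allowed by the language. */
static int literals_share_storage(void)
{
    const char *a = "string_is_the_same";
    const char *b = "string_is_the_same";
    return a == b;
}

/* Prints the observation for the current compiler and flags. */
static void report_literal_merging(void)
{
    printf("identical literals %s storage\n",
           literals_share_storage() ? "share" : "do not share");
}
```

Calling `report_literal_merging()` from a small driver under different optimisation levels gives a per-compiler answer to question 1; it says nothing about merging across object files.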

寄风 2024-09-14 11:00:46

I would be very unhappy to see that pattern - what if someone changes one literal without changing the other? It should be pulled out; make a pretty little named constant.

Assuming you can't for some reason, or just to actually answer the question: (At least, anecdotally.)

I wrote a similar program in C and compiled it with GCC 4.4.3; the constant string appeared only once in the resulting executable.

Edit: Since it might be useful as an easy test, here is the code I tested it with...

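#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    /* sizeof includes the terminating NUL, so the copy is a complete C string. */
    char *n = malloc(sizeof("teststring"));
    if (n == NULL) {
        return 1;
    }
    memcpy(n, "teststring", sizeof("teststring"));
    printf("%s\n", n);
    free(n);
    return 0;
}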

And here is the command I used to check how many times the string appeared...

strings a.out|grep teststring

But please consider using less error-prone coding practices where possible.

找个人就嫁了吧 2024-09-14 11:00:46

I wrote a small sample program and compiled it:

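#include <stdio.h>
#include <string.h>

void func (void)
{
    char ps1[128];
    char ps2[128];

    /* The same literal is used twice on purpose. */
    strcpy(ps1, "string_is_the_same");
    strcpy(ps2, "string_is_the_same");

    /* Print both copies so that they count as used. */
    printf("%s %s\n", ps1, ps2);
}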

As a result, the assembler file contains only one instance of the literal "string_is_the_same", even without optimization. However, I am not sure whether such strings would be de-duplicated when they are placed in different source files, and therefore in different object files.
