Why do C compilers prepend underscores to external names?
I've been working in C for so long that the fact that compilers typically add an underscore to the start of an extern is just understood... However, another SO question today got me wondering about the real reason why the underscore is added. A Wikipedia article claims that a reason is:
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
I think there's at least a kernel of truth to this, but it also seems to not really answer the question, since if the underscore is added to all externs it won't help much with preventing clashes.
Does anyone have good information on the rationale for the leading underscore?
Is the added underscore part of the reason that the Unix creat() system call doesn't end with an 'e'? I've heard that early linkers on some platforms had a limit of 6 characters for names. If that's the case, then prepending an underscore to external names would seem to be a downright crazy idea (now I only have 5 characters to play with...).
If the runtime support is provided by the compiler, you would think it would make more sense to prepend an underscore to the few external identifiers in the runtime support instead!
When C compilers first appeared, the basic alternative to programming in C on those platforms was programming in assembly language, and it was (and occasionally still is) useful to link together object files written in assembler and C. So really (IMHO) the leading underscore added to external C identifiers was to avoid clashes with the identifiers in your own assembly code.
(See also GCC's asm label extension; and note that this prepended underscore can be considered a simple form of name mangling. More complicated languages like C++ use more complicated name mangling, but this is where it started.)
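As a rough illustration of the asm label extension mentioned in this answer, the sketch below pins the linker-visible name of a C function. It assumes GCC or Clang (the extension is not standard C), and the names add_ints and raw_add are made up for the example. It uses the __asm__ spelling so that it still compiles in strict standard modes where the plain asm keyword is unavailable.

```c
/* Sketch only: requires GCC or Clang (asm labels are a GNU extension).
 * The names add_ints and raw_add are hypothetical. */

/* The asm label is attached to a declaration, not to the definition.
 * It asks the compiler to emit exactly "raw_add" as the symbol name,
 * with no underscore prepended even on targets that normally add one. */
int add_ints(int a, int b) __asm__("raw_add");

/* C code keeps calling add_ints(); only the object-file name changes. */
int add_ints(int a, int b)
{
    return a + b;
}
```

C callers still write add_ints(2, 3), while hand-written assembly could refer to the same routine as raw_add. Running nm on the resulting object file should list raw_add rather than add_ints or _add_ints.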
If the C compiler always prepends an underscore to every symbol, then the startup/C-runtime code (which is usually written in assembly) can safely use labels and symbols that do not start with an underscore, such as the symbol 'start'.
Even if you write a start() function in the C code, it gets generated as _start in the object/asm output. (Note that in this case there is no way for the C code to generate a symbol that does not start with an underscore.) So the startup coder doesn't have to worry about inventing obscure, improbable symbols (like $_dontuse42%$) for each of his/her global variables/labels.
So the linker won't complain about a name clash, and the programmer is happy. :)
The following is different from the compiler's practice of prepending an underscore in its output formats: source-level names that already begin with an underscore follow a convention used by the C system libraries and other system components (and for things such as __FILE__).
(Note that such a symbol, e.g. _time, may end up with two leading underscores, __time, in the generated output.)
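To make the start() to _start mapping concrete, here is a minimal sketch that builds the object-file name the compiler would use for a given C identifier. It relies on the __USER_LABEL_PREFIX__ macro that GCC and Clang predefine; the fallback to an empty prefix on other compilers is an assumption, and the helper names user_label_prefix and linker_name are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* GCC and Clang predefine __USER_LABEL_PREFIX__ as the text they prepend
 * to C identifiers: "_" on Mach-O (macOS) targets, nothing on typical
 * ELF/Linux targets. Assumption: other compilers get an empty prefix. */
#ifndef __USER_LABEL_PREFIX__
#define __USER_LABEL_PREFIX__
#endif

/* Two-step stringification so the macro is expanded before quoting. */
#define STRINGIZE_RAW(x) #x
#define STRINGIZE(x) STRINGIZE_RAW(x)

/* Hypothetical helper: the prefix as a string (possibly empty). */
const char *user_label_prefix(void)
{
    return STRINGIZE(__USER_LABEL_PREFIX__);
}

/* Hypothetical helper: write the object-file name of a C identifier
 * into out. Returns 0 on success, -1 if out is too small. */
int linker_name(const char *c_name, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s%s", user_label_prefix(), c_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

On a target where the prefix is "_", linker_name would turn "start" into "_start" and "_time" into "__time", matching the two cases described in this answer. On a target with an empty prefix the names come back unchanged.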
From what I have always heard, it is to avoid naming conflicts: not so much with other extern variables, but so that a library you use will hopefully not conflict with the variable names in user code.
The main function is not the real entry point of an executable. Some statically linked files have the real entry point that eventually calls main, and those statically linked files own the namespace that does not start with an underscore. On my system, in /usr/lib, there are gcrt1.o, crt1.o and dylib1.o among others. Each of those has a "start" function without an underscore that will eventually call the "_main" entry point. Everything else besides those files has external scope. The history has to do with mixing assembler and C in a project, where all C was considered external.
From Wikipedia: