At what phase of compilation are reserved identifiers reserved?
Just a little curiosity at work, here. While working on something dangerous, I got to thinking about the implementations of various compilers and their associated standard libraries. Here's the progression of my thoughts:
1. Some classes of identifiers are reserved for implementation use in C++ and C.
2. A compiler must perform the stages of compilation (preprocessing, compilation, linking) as though they were performed in sequence.
3. The C preprocessor is not aware of the reserved status of identifiers.
4. Therefore, a program may use reserved identifiers if and only if:
   1. The reserved identifiers used are all preprocessor symbols.
   2. The preprocessing result does not include reserved identifiers.
   3. The identifiers do not conflict with symbols predefined by the compiler (__GNUC__ et al.).
Is this valid? I'm uncertain on points 3 and 4.3. Moreover, is there a way to test it?
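One way to probe points 3 and 4 empirically is to hand the compiler a reserved name that exists only as a macro. This is a minimal sketch, assuming a hypothetical macro name __demo_width (reserved because of its leading double underscore) that no real implementation is expected to define. Strictly speaking it is still undefined behaviour under C99 7.1.3, so a clean compile only shows that nothing diagnoses it, not that it is permitted.

```c
/* Hypothetical reserved name, used only as a preprocessor symbol (point 4.1). */
#define __demo_width 8

/* After preprocessing, the body is just "return 8 * 2;", so no reserved
   identifier reaches the compiler proper (point 4.2). */
static int demo_doubled_width(void)
{
    return __demo_width * 2;
}
```

Compilers typically accept this without comment. Some offer opt-in warnings for reserved names (recent Clang versions have -Wreserved-identifier and -Wreserved-macro-identifier, as far as I know), which is probably the closest thing to a test.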
(The comments on the question explain that we're talking about reserved identifiers in the sense of C99 section 7.1.3, i.e., identifiers matching /^_[A-Z_]/ anywhere, /^_/ in file scope, /^str[a-z]/ with external linkage, etc. So here's my guess at at least a part of what you're asking...)
They're not reserved in the sense that (any particular phase of) the compiler is expected to diagnose their misuse. Rather, they're reserved in that if you're foolish enough to (mis)use them yourself, you don't get to complain if your program stops working or stops compiling at a later date.
We've all seen what happens when people with only a dangerous amount of knowledge look inside system headers and then write their own header guards:
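A sketch of the kind of guard meant here, using a hypothetical header mylib.h. The name _MYLIB_H starts with an underscore followed by a capital letter, which puts it in the reserved namespace, whereas plain MYLIB_H would not be.

```c
/* mylib.h (hypothetical user header) */
#ifndef _MYLIB_H          /* reserved: underscore followed by an uppercase letter */
#define _MYLIB_H

/* Hypothetical API, defined inline so the sketch is self-contained. */
static inline int mylib_answer(void)
{
    return 42;
}

#endif /* _MYLIB_H */
```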
They're invoking undefined behaviour, but nothing diagnoses this as "error: reserved identifier used by end-user code". Instead mostly they're lucky and all is well; but occasionally they collide with an identifier of interest to the implementation, and confusing things happen.
Similarly, I often have an externally-visible function named strip() or so. By my reading of C99's 7.1.3, 7.26, and 7.26.11, this invokes undefined behaviour. However, I have decided not to care about this. The identifier is reserved not because anything bad is expected to happen today, but because the Standard reserves to itself the right to invent a new standard str-ip() routine in a future revision. And I reckon string-ip, whatever that might be, is an unlikely name for a string operation to be added in the future, so in the unlikely event that happens, I'll cross that bridge when I get to it. Technically I'm invoking undefined behaviour, but I don't expect to get bitten.
Finally, a counter-example to your point 4:
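A sketch of such a counter-example, assuming a hypothetical user function my_crazy_function and a hypothetical caller copy_twice. The #undef is there only so the sketch does not trip over an implementation that already provides memcpy as a macro.

```c
#include <string.h>

/* Hypothetical user function that the macro below redirects to. */
static int crazy_calls = 0;

static void *my_crazy_function(void *dst, const void *src, size_t n)
{
    (void)src;
    (void)n;
    ++crazy_calls;
    return dst;
}

/* User code hijacks a library name, but only as a preprocessor symbol. */
#undef memcpy
#define memcpy(d, s, n) my_crazy_function((d), (s), (n))

static void copy_twice(char *a, const char *b)
{
    memcpy(a, b, 5);    /* expands to my_crazy_function(...) */
    memmove(a, b, 5);   /* meant to be the ordinary library memmove */
}
```

On an implementation where memmove is a genuine function, the second call behaves normally. The trouble only starts if memmove itself expands to something that mentions memcpy.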
This complies with your 4.1, 4.2, 4.3 (if I understand your intention on that last one). However, if memmove is additionally implemented as a macro (via 7.1.4/1) that is written in terms of memcpy, then you're going to be in trouble.
The story is more complicated than that, I think, at least for the if and only if. What I recall from C99:
E.g. your point 3 is false: the defined token is reserved even in the preprocessing phase, and pseudo-macros like __LINE__, __func__ etc. may not be redefined either.
Then, the reservation of identifiers depends on the scope:
- Some identifiers are reserved for external symbols, e.g. setjmp.
- Identifiers starting with an underscore and then another underscore or a capital letter are reserved everywhere in C. You should never touch them, even with the preprocessor.
- Identifiers starting with an underscore and then a lowercase letter are forbidden in file scope since they may refer to external symbols. They can be used freely in function scope.
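The scope dependence described above can be illustrated with a small sketch. The names _total and sum_to are hypothetical, chosen only for illustration.

```c
/* At file scope a name such as _total would be reserved (C99 7.1.3):
 *
 *     static int _total;
 *
 * Inside a function, an underscore followed by a lowercase letter is not
 * reserved, so the local variable below is fine. */
static int sum_to(int n)
{
    int _total = 0;              /* block scope: allowed */
    for (int i = 1; i <= n; ++i)
        _total += i;
    return _total;
}
```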
4.2 is not completely correct either. First it is only undefined behavior (aka very evil) to have a macro defined that has a keyword as its name under the following condition:
Then, a macro that contains its own name in its expansion is "safe", since the expansion is guaranteed not to be recursive. Something like the following would be valid, though not recommended:
(BTW, don't anyone use that macro, it compiles and does about what it looks like, but has corner cases that let it explode.)
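The non-recursion guarantee can be sketched with a simpler macro than the one referred to above. This is an illustration with hypothetical names (square, square_calls, sum_of_squares), not that macro.

```c
static int square_calls = 0;

static int square(int x)
{
    return x * x;
}

/* The macro's own name appears in its expansion. While the macro is being
   expanded, the inner "square" is not expanded again, so it refers to the
   function defined above. */
#define square(x) (++square_calls, square(x))

static int sum_of_squares(int a, int b)
{
    return square(a) + square(b);   /* both calls go through the macro */
}
```

Writing (square)(5) bypasses the macro, because a function-like macro is only expanded when its name is followed by an opening parenthesis.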
I'm not sure what you mean by "aware", but I don't think you can necessarily assume this - 7.1.3 says
The preprocessor (or compiler) implementation can use these reserved identifiers for whatever purposes suit it - it doesn't need to warn you if you're misusing these identifiers.
I'd suggest that "a program may use reserved identifiers if and only if" the standard (for example the set of pre-defined macros) or the implementation says so in its documentation.
Of course, I think you'll get away with using identifiers that are reserved in quite a few cases - implementations don't go out of their way to cause you problems. An awful lot of code uses names that are reserved, and I'd guess that implementations would rather not break that code without good enough reason. However, it would be best if you avoided that namespace altogether if you're not implementing a compiler toolchain.
Identifiers like _UNDERSCORE_CAP and double__underscore are reserved for use by the implementation as it sees fit. It's not a problem if the implementation uses them, such as having, say, a _File identifier or macro in <stdio.h>; that's what the reservation is for. It is a potential problem if the user uses one.
Therefore, in order to diagnose this, the compiler would have to keep track of where identifiers came from. It wouldn't be sufficient to just check code not in <angle_bracket_files.h>, since those can define macros that might be used and are likely to expand to something using implementation-reserved words. For example, isupper might be defined in <ctype.h> as a macro whose expansion uses reserved names, or some such. (It's been a long time since I saw the definition I based the above on.)
Therefore, to keep track of this, the preprocessor would have to maintain records on which macro came from where, among other things. Tracking that would complicate the preprocessor considerably, for what compiler writers appear to consider no corresponding gain.
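To make the tracking problem concrete, here is a sketch in which the first half pretends to be an implementation header. All the underscore-led names and sketch_isupper are hypothetical; a real <ctype.h> will look different.

```c
/* Pretend this part lives in an implementation header. */
#define _SKETCH_UPPER 0x01

static int _Sketch_classify(int c)
{
    return (c >= 'A' && c <= 'Z') ? _SKETCH_UPPER : 0;
}

#define sketch_isupper(c) (_Sketch_classify(c) & _SKETCH_UPPER)

/* User code: no reserved name is written here, yet after preprocessing the
   function body mentions _Sketch_classify. */
static int starts_with_capital(const char *s)
{
    return sketch_isupper((unsigned char)s[0]) != 0;
}
```

A checker that only looked at the preprocessed text of the user's function would see a reserved identifier and have no way of telling that the user never typed it.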
以及类似的预处理器符号(b/c 它们会自动更改)。If you're asking if you can
#define if while
and make your code unreadable, then yes. This was a common practice in the obfuscated C competition. This would actually go against your 4.2, though.For things like GNUC, these are predefined, but you can usually redefine them and undef them. It is not really a good idea to do this, but you can. More interesting would be redefining or undefining
__LINE__
,__FILE__
, and preprocessor symbols like that (b/c they change automatically).
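A minimal sketch of what redefining a keyword does. The function names are hypothetical, no standard header is included while the macro is active, and the macro is undefined again straight away.

```c
/* With "if" left alone, the body runs at most once. */
static int steps_with_if(int n)
{
    int steps = 0;
    if (n > 0) { --n; ++steps; }
    return steps;
}

#define if while   /* every later "if" now becomes a loop */

static int steps_with_redefined_if(int n)
{
    int steps = 0;
    if (n > 0) { --n; ++steps; }   /* really: while (n > 0) { ... } */
    return steps;
}

#undef if          /* restore the keyword for the rest of the file */
```

Two functions with identical source text end up behaving differently, which is exactly why this is only popular in obfuscation contests.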