At what stage of compilation are reserved identifiers reserved?

Posted 2024-09-25 10:49:48

Just a little curiosity at work, here. While working on something dangerous, I got to thinking about the implementations of various compilers and their associated standard libraries. Here's the progression of my thoughts:

  1. Some classes of identifiers are reserved for implementation use in C++ and C.

  2. A compiler must perform the stages of compilation (preprocessing, compilation, linking) as though they were performed in sequence.

  3. The C preprocessor is not aware of the reserved status of identifiers.

  4. Therefore, a program may use reserved identifiers if and only if:

    1. The reserved identifiers used are all preprocessor symbols.

    2. The preprocessing result does not include reserved identifiers.

    3. The identifiers do not conflict with symbols predefined by the compiler (__GNUC__ et al.).

Is this valid? I'm uncertain on points 3 and 4.3. Moreover, is there a way to test it?

Answers (5)

×眷恋的温暖 2024-10-02 10:49:48

(The comments on the question explain that we're talking about reserved identifiers in the sense of C99 section 7.1.3, i.e., identifiers matching /^_[A-Z_]/ anywhere, /^_/ at file scope, /^str[a-z]/ with external linkage, etc. So here's my guess at least at part of what you're asking...)

They're not reserved in the sense that (any particular phase of) the compiler is expected to diagnose their misuse. Rather, they're reserved in that if you're foolish enough to (mis)use them yourself, you don't get to complain if your program stops working or stops compiling at a later date.

We've all seen what happens when people with only a dangerous amount of knowledge look inside system headers and then write their own header guards:

#ifndef _MYHEADER_H
#define _MYHEADER_H
// ...
#endif

They're invoking undefined behaviour, but nothing diagnoses this as "error: reserved identifier used by end-user code". Instead mostly they're lucky and all is well; but occasionally they collide with an identifier of interest to the implementation, and confusing things happen.
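A single-file sketch of how such a collision can play out. The first #define stands in for an implementation-internal header that happens to use the same guard; that any real implementation uses _MYHEADER_H is purely an assumption for illustration, as are the names MYHEADER_LOADED and user_header_was_loaded. Nothing warns by default; the user's header body is simply skipped.

```c
/* HYPOTHETICAL: pretend an implementation header already used
   _MYHEADER_H as its own include guard and was included first. */
#define _MYHEADER_H 1

/* The user's header, guarded with the same reserved name. */
#ifndef _MYHEADER_H
#define _MYHEADER_H
#define MYHEADER_LOADED 1
#endif

/* Returns 1 if the user's header body was processed, 0 if the
   preprocessor silently skipped it because the guard was already set. */
int user_header_was_loaded(void) {
#ifdef MYHEADER_LOADED
    return 1;
#else
    return 0;
#endif
}
```

Here user_header_was_loaded() returns 0: the declarations the user expected are just gone, and the resulting errors (if any) show up far from the real cause.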

Similarly, I often have an externally-visible function named strip() or so:

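#include <ctype.h>
char *strip(char *s) {
  // remove leading whitespace (cast: isspace() needs an unsigned char value)
  while (isspace((unsigned char)*s)) s++;
  return s;
}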

By my reading of C99's 7.1.3, 7.26, and 7.26.11, this invokes undefined behaviour. However, I have decided not to care about this. The identifier is reserved not because anything bad is expected to happen today, but because the Standard reserves to itself the right to invent a new standard str-ip() routine in a future revision. And I reckon that string-ip, whatever that might be, is an unlikely name for a string operation to be added in the future; so in the unlikely event that it happens, I'll cross that bridge when I get to it. Technically I'm invoking undefined behaviour, but I don't expect to get bitten.
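If you want to check a name against the pattern this relies on, here is a rough sketch. It covers only the <string.h> rule of C99 7.26.11 (str, mem, or wcs followed by a lowercase letter); other headers have their own future-direction patterns, such as is/to for <ctype.h>, which it ignores. The function name is made up for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Sketch of C99 7.26.11: function names beginning with "str", "mem" or
   "wcs" followed by a lowercase letter may be added to <string.h> later,
   so 7.1.3 reserves them as identifiers with external linkage. */
int reserved_for_future_string_h(const char *name) {
    static const char *const prefixes[] = { "str", "mem", "wcs" };
    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        /* strncmp stops at the terminator, so name[3] is only read
           when the first three characters matched. */
        if (strncmp(name, prefixes[i], 3) == 0 &&
            islower((unsigned char)name[3]))
            return 1;
    }
    return 0;
}
```

By this check strip is reserved, while a name like str_trim (underscore, not a lowercase letter, after the prefix) is not.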

Finally, a counter-example to your point 4:

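#include <string.h>
/* user-supplied function; its definition lives elsewhere */
void *my_crazy_function(size_t n, const void *s);
#define memcpy(d,s,n)  (my_crazy_function((n), (s)))
void foo(char *a, char *b) {
  memcpy(a, b, 5);  // intends to invoke my_crazy_function
  memmove(a, b, 5); // standard behaviour expected
}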

This complies with your 4.1, 4.2, and 4.3 (if I understand your intention on that last one). However, if memmove is additionally implemented as a macro (permitted by 7.1.4/1) that is written in terms of memcpy, your macro also rewrites the memcpy inside memmove's expansion, and you're going to be in trouble.
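To see why the macro case bites, here is a self-contained simulation. It uses made-up names (lib_copy, lib_move) in place of the real memcpy and memmove, so it does not depend on how any particular <string.h> is written. Macro replacement happens at the point of use, so by the time lib_move is expanded in user code, the user's lib_copy macro is already in force.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL library pieces: a function plus a macro "written in
   terms of" it, the shape 7.1.4/1 allows for real library functions. */
void *lib_copy(void *d, const void *s, size_t n) { return memmove(d, s, n); }
#define lib_move(d, s, n) lib_copy((d), (s), (n))

/* User code, mirroring the answer: redefine lib_copy as a macro. */
static int hijacked_calls = 0;
void *my_crazy_function(size_t n, const void *s) {
    (void)n;
    (void)s;
    hijacked_calls++;
    return NULL;
}
#define lib_copy(d, s, n) (my_crazy_function((n), (s)))

/* The user only calls lib_move and expects a real copy. */
int copy_happened(void) {
    char a[8] = "abcdefg";
    const char b[8] = "1234567";
    lib_move(a, b, 5);   /* -> lib_copy(...) -> my_crazy_function(...) */
    return a[0] == '1';  /* 1 only if bytes were actually copied */
}
```

copy_happened() returns 0 and hijacked_calls becomes 1: the "standard" operation silently went through the user's function.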

内心激荡 2024-10-02 10:49:48

The story is more complicated than that, I think, at least for the "if and only if". What I recall from C99:

For example, 3. is false: the token "defined" is reserved even in the preprocessing phase, and predefined names like __LINE__, __func__, etc. may not be redefined either.

Then, the reservation of identifiers depends on the scope.

  • Some identifiers are explicitly reserved for external symbols, e.g. setjmp.
  • Identifiers starting with an underscore followed by another underscore or a capital letter are reserved everywhere in C. You should never touch them, even with the preprocessor.
  • Identifiers starting with an underscore followed by a lowercase letter are forbidden at file scope, since they may refer to external symbols. They can be used freely in function scope.
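The two underscore rules above are mechanical enough to check with a few lines of C. This is a rough sketch of C99 7.1.3 only (the function names are hypothetical), and it knows nothing about library names such as setjmp. The C++ variant is included because the question covers both languages: C++ additionally reserves any identifier that contains a double underscore anywhere.

```c
#include <ctype.h>
#include <string.h>

/* C99 7.1.3: "_" followed by an uppercase letter or another "_" is
   reserved for any use, in every scope, including macro names. */
int reserved_everywhere(const char *id) {
    return id[0] == '_' && (id[1] == '_' || isupper((unsigned char)id[1]));
}

/* C99 7.1.3: any leading "_" is reserved for identifiers with file
   scope (ordinary and tag name spaces); block scope is fine. */
int reserved_at_file_scope(const char *id) {
    return id[0] == '_';
}

/* C++ adds: any identifier containing "__" anywhere is reserved. */
int reserved_in_cpp(const char *id) {
    return reserved_everywhere(id) || strstr(id, "__") != NULL;
}
```

For example, _i is off-limits at file scope but fine as a local, and double__underscore is only a problem in C++.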

4.2 is not completely correct either. First, it is only undefined behavior (aka very evil) to have a macro with a keyword as its name under the following condition:

A standard header is included while a macro is defined with the same name as a keyword (7.1.2).

Then, a macro that contains its own name in its expansion is "safe", since the expansion is guaranteed not to be recursive. Something like the following would be valid, though not recommended:

#include <stdio.h>  /* must come before the #define (7.1.2) */
#define if(...)                                         \
for(int _i = 0; _i < 1; ++_i)                           \
  for(int _cond = (__VA_ARGS__);                        \
      _i < 1;                                           \
      printf("line %d val %d\n", __LINE__, _cond),      \
        ++_i)                                           \
    if(_cond)

(BTW, don't anyone use that macro, it compiles and does about what it looks like, but has corner cases that let it explode.)
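The non-recursion guarantee is easier to see with a smaller macro. The names twice, macro_uses, and use_twice are illustrative, not from the answer.

```c
static int macro_uses = 0;

int twice(int x) { return 2 * x; }

/* The replacement list mentions "twice" again. While the expansion is
   rescanned, that nested name is not replaced (C99 6.10.3.4p2), so it
   calls the function above and the expansion stops after one level. */
#define twice(x) (macro_uses++, twice(x))

int use_twice(int v) { return twice(v); }
```

use_twice(21) returns 42 and bumps macro_uses once. Writing (twice)(v) bypasses the macro entirely, because a function-like macro is only invoked when the next token after its name is (.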

平生欢 2024-10-02 10:49:48

The C preprocessor is not aware of the reserved status of identifiers.

I'm not sure what you mean by "aware", but I don't think you can necessarily assume this - 7.1.3 says

All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use

The preprocessor (or compiler) implementation can use these reserved identifiers for whatever purposes suit it - it doesn't need to warn you if you're misusing these identifiers.

I'd suggest that "a program may use reserved identifiers if and only if" the standard (for example the set of pre-defined macros) or the implementation says so in its documentation.

Of course, I think you'll get away with using identifiers that are reserved in quite a few cases - implementations don't go out of their way to cause you problems. An awful lot of code uses names that are reserved, and I'd guess that implementations would rather not break that code without good enough reason. However, it would be best if you avoided that namespace altogether if you're not implementing a compiler toolchain.

So尛奶瓶 2024-10-02 10:49:48

Identifiers like _UNDERSCORE_CAP (and, in C++, double__underscore) are reserved for use by the implementation as it sees fit. It's not a problem if the implementation uses them, such as having, say, a _File identifier or macro in <stdio.h>; that's what the reservation is for. It is a potential problem if the user uses one.

Therefore, in order to diagnose this, the compiler would have to keep track of where identifiers came from. It wouldn't be sufficient to just check code not in <angle_bracket_files.h>, since those can define macros that might be used and are likely to expand to something using implementation-reserved words. For example, isupper might be defined in <ctype.h> as

#define isupper(x) (_UPPER_BIT & _CHAR_TRAITS[x])

or some such. (It's been a long time since I saw the definition I based the above on.)

Therefore, to keep track of this, the preprocessor would have to keep records of which macros came from system headers, among other things. Tracking that would complicate the preprocessor considerably, for what compiler writers apparently consider no corresponding gain.
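The point about macros is easy to demonstrate by stringifying what a library-style macro expands to. lib_isupper is a hypothetical stand-in modeled on the isupper() definition above, not any real <ctype.h>. The user's source mentions only lib_isupper, yet the text reaching the compiler proper contains _UPPER_BIT and _CHAR_TRAITS, so a check run after preprocessing would blame the user for names the library introduced.

```c
#include <string.h>

/* HYPOTHETICAL library macro in the style of the isupper() example;
   _UPPER_BIT and _CHAR_TRAITS are only spelled here, never defined. */
#define lib_isupper(x) (_UPPER_BIT & _CHAR_TRAITS[x])

/* Two-level stringification: XSTR macro-expands its argument before
   STR turns it into a string, showing what the compiler proper sees. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* The user wrote only lib_isupper(c)... */
static const char expanded[] = XSTR(lib_isupper(c));

/* ...but the expanded text mentions implementation-reserved names. */
int expansion_mentions(const char *name) {
    return strstr(expanded, name) != NULL;
}
```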

捎一片雪花 2024-10-02 10:49:48

If you're asking if you can #define if while and make your code unreadable, then yes. This was a common practice in the obfuscated C competition. This would actually go against your 4.2, though.
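For the curious, a sketch of what #define if while does to ordinary code (steps_to_zero is a made-up name). Because 7.1.2 forbids including a standard header while a keyword is defined as a macro, the sketch includes nothing and #undefs the macro at the end.

```c
/* Obfuscation sketch: every "if" below this line is really a loop. */
#define if while

int steps_to_zero(int n) {
    int steps = 0;
    if (n > 0) {      /* expands to: while (n > 0) { ... } */
        n--;
        steps++;
    }
    return steps;
}

#undef if  /* restore the keyword before any later code or header */
```

steps_to_zero(3) returns 3: the "if" body ran three times.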

For things like __GNUC__, these are predefined, but you can usually redefine them and #undef them. It is not really a good idea to do this, but you can. More interesting would be redefining or undefining __LINE__, __FILE__, and similar preprocessor symbols (because their values change automatically).
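On __LINE__ and __FILE__: C99 6.10.8 says they shall not be the subject of #define or #undef, but the #line directive (C99 6.10.4) changes what they report in a supported way. A minimal sketch; the variable names and the file name renamed.c are arbitrary.

```c
/* The source line right after the directive is treated as line 1000
   of a file named "renamed.c". */
#line 1000 "renamed.c"
int line_after_directive = __LINE__;          /* 1000 */
const char *file_after_directive = __FILE__;  /* "renamed.c" */
```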
