Isn't the type of an integer literal int by default?

Posted on 2024-12-15 18:21:23

I just answered this question, which asked why iterating until 10 billion in a for loop takes so much longer (the OP actually aborted it after 10 mins) than iterating until 1 billion:

for (i = 0; i < 10000000000; i++)

The obvious answer, from me and many others, was that the iteration variable is 32-bit, so it never reaches 10 billion and the loop becomes infinite.
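
A minimal C sketch of what that answer claims (my own illustration; the function name `bound_exceeds_int` is made up). It assumes only that the literal is given a type wide enough to hold its value. If int is narrower than that type, `i` is converted to it for the comparison, so the condition holds even for `INT_MAX`. Incrementing an int past `INT_MAX` is signed overflow, which is undefined behaviour in both C and C++.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true when even the largest int is still below the bound,
   i.e. when `i < 10000000000` can never become false for an int i. */
bool bound_exceeds_int(void)
{
    int i = INT_MAX;
    /* If int is narrower than the literal's type, i is converted to
       that wider type before the comparison takes place. */
    return i < 10000000000;
}
```

On an implementation with 32-bit int this returns true, which is why the loop condition can never end the loop there.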

Although I recognized this problem, I still wonder what was really going on inside the compiler.

Since the literal has no L suffix, it should IMHO be of type int as well, and therefore 32-bit. Due to overflow it would then end up as an ordinary int value inside the reachable range. To recognize that the bound cannot be reached by an int, the compiler has to know that it is 10 billion and therefore treat it as a constant wider than 32 bits.

Does such a literal automatically get a type with a fitting (or at least implementation-defined) range, at least 64-bit in this case, even without an L suffix, and is this standard behaviour? Or is something different going on behind the scenes, such as UB due to overflow (is integer overflow actually UB)? Some quotes from the Standard would be nice, if there are any.

Although the original question was about C, I would also appreciate C++ answers, if they differ at all.

余罪 2024-12-22 18:21:23

As far as C++ is concerned:

C++11, [lex.icon] ¶2

The type of an integer literal is the first of the corresponding list in Table 6 in which its value can be represented.

Table 6, for decimal literals without a suffix, gives:

int
long int
long long int

(Interestingly, for hexadecimal or octal constants unsigned types are also allowed, but each one comes after the corresponding signed one in the list.)

So it's clear that in that case the constant has been interpreted as a long int (or as a long long int if long int is also only 32 bits wide).
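
One way to check this on a given compiler is a small C11 sketch using `_Generic`. The macro `TYPE_NAME` and the function `print_literal_types` are names invented for this illustration, not anything from the standard. Which type is reported depends on the implementation: where int is 32-bit and long is 64-bit I would expect `long` for the second literal, and `long long` where long is also 32-bit.

```c
#include <stdio.h>

/* Maps an expression to the name of its standard integer type (C11). */
#define TYPE_NAME(x) _Generic((x),                 \
    int: "int",                                    \
    unsigned int: "unsigned int",                  \
    long: "long",                                  \
    unsigned long: "unsigned long",                \
    long long: "long long",                        \
    unsigned long long: "unsigned long long",      \
    default: "other")

/* Prints the type the compiler picked for two unsuffixed decimal literals. */
void print_literal_types(void)
{
    printf("1           -> %s\n", TYPE_NAME(1));
    printf("10000000000 -> %s\n", TYPE_NAME(10000000000));
}
```

Calling `print_literal_types()` from `main` shows the first type from the list that can represent each value.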

Note that literals that are too big should result in a compilation error:

A program is ill-formed if one of its translation units contains an integer literal that cannot be represented by any of the allowed types.

(ibidem, ¶3)

This is promptly seen in this sample, which reminds us that ideone.com uses 32-bit compilers.


I now see that the question was about C. Well, it's more or less the same:

C99, §6.4.4.1

The type of an integer constant is the first of the corresponding list in which its value can be represented.

The list is the same as in the C++ standard.


Addendum: both C99 and C++11 also allow literals to be of "extended integer types" (i.e. other implementation-specific integer types) if everything else fails. (C++11, [lex.icon] ¶3; C99, §6.4.4.1 ¶5 after the table)

梦里°也失望 2024-12-22 18:21:23

From my draft of the C standard labeled ISO/IEC 9899:TC2 Committee Draft — May 6, 2005, the rules are remarkably similar to the C++ rules Matteo found:

5 The type of an integer constant is the first of the corresponding list in which its value can be represented.

Suffix      Decimal Constant          Octal or Hexadecimal Constant
-------------------------------------------------------------------
none        int                       int
            long int                  unsigned int
            long long int             long int
                                      unsigned long int
                                      long long int
                                      unsigned long long int

u or U      unsigned int              unsigned int
            unsigned long int         unsigned long int
            unsigned long long int    unsigned long long int

l or L      long int                  long int
            long long int             unsigned long int
                                      long long int
                                      unsigned long long int

Both u or U unsigned long int         unsigned long int
and l or L  unsigned long long int    unsigned long long int

ll or LL    long long int             long long int
                                      unsigned long long int

Both u or U unsigned long long int    unsigned long long int
and ll or LL

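
The difference between the two columns can be probed with a short C11 sketch. The macro `IS_UNSIGNED` and the three function names are hypothetical helpers made up for this example. It assumes a C99 or later compiler; the hexadecimal result also depends on int being 32-bit, since with a wider int `0x80000000` simply fits in int.

```c
#include <stdbool.h>

/* True when the expression has one of the unsigned types from the table. */
#define IS_UNSIGNED(x) _Generic((x),               \
    unsigned int: true,                            \
    unsigned long: true,                           \
    unsigned long long: true,                      \
    default: false)

/* Hexadecimal, no suffix: unsigned int is tried right after int. */
bool hex_is_unsigned(void)      { return IS_UNSIGNED(0x80000000); }

/* Decimal, no suffix: only signed types appear in the list. */
bool decimal_is_unsigned(void)  { return IS_UNSIGNED(2147483648); }

/* Decimal with a u suffix: only unsigned types appear in the list. */
bool suffixed_is_unsigned(void) { return IS_UNSIGNED(2147483648u); }
```

With 32-bit int, the same value 2147483648 is therefore unsigned when written in hexadecimal but signed (and wider than int) when written in decimal.
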
幽蝶幻影 2024-12-22 18:21:23

I still wonder what was really going on inside the compiler

If you are interested in how the compiler interprets the code, you can look at the generated assembly.

With 10000000000:

40054f:
mov    -0x4(%rbp),%eax
mov    %eax,-0x8(%rbp)
addl   $0x1,-0x4(%rbp)
jmp    40054f <main+0xb>

So it just compiled it into an infinite loop: there is no comparison left, only an unconditional jump back. If you replace 10000000000 with 10000:

....
test   %al,%al
jne    400551
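
For contrast with the infinite loop above, here is a minimal sketch of a loop that can actually reach such a bound. The function `count_up_to` is a made-up name for illustration and is not the original poster's code. The only assumption is the standard guarantee that long long is at least 64 bits wide.

```c
/* Counts from 0 up to limit with a counter wide enough for the bound,
   so the condition i < limit eventually becomes false. */
long long count_up_to(long long limit)
{
    long long iterations = 0;
    for (long long i = 0; i < limit; i++) {
        iterations++;
    }
    return iterations;
}
```

A call such as `count_up_to(10000000000)` terminates, although without optimization it may still take a noticeable amount of time.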