Would making plain int 64-bit break a lot of reasonable code?

Posted on 2024-10-09 15:50:16

Until recently, I'd considered the decision by most systems implementors/vendors to keep plain int 32-bit even on 64-bit machines a sort of expedient wart. With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears, and it seems like int could just as well be made 64-bit.

However, the biggest real consequence of the size of plain int in C comes from the fact that C essentially does not have arithmetic on smaller-than-int types. In particular, if int is larger than 32-bit, the result of any arithmetic on uint32_t values has type signed int, which is rather unsettling.

Is this a good reason to keep int permanently fixed at 32-bit on real-world implementations? I'm leaning towards saying yes. It seems to me like there could be a huge class of uses of uint32_t which break when int is larger than 32 bits. Even applying the unary minus or bitwise complement operator becomes dangerous unless you cast back to uint32_t.

Of course the same issues apply to uint16_t and uint8_t on current implementations, but everyone seems to be aware of and used to treating them as "smaller-than-int" types.
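
A minimal sketch of the promotion behaviour described above, using uint16_t as the stand-in, since that is a type narrower than int on today's common platforms. It assumes a 32-bit int, and the function names are made up for illustration. The last function only simulates a 64-bit int by widening to int64_t by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: int is 32 bits wide. x is promoted to (signed) int before ~
   is applied, so the complement is a negative int, not a 16-bit value. */
bool complement_is_negative(uint16_t x)
{
    return ~x < 0;
}

/* Casting back to uint16_t restores the modular result most people expect. */
uint16_t complement_u16(uint16_t x)
{
    return (uint16_t)~x;
}

/* Simulation only: roughly what -x on a uint32_t would yield if int were
   64 bits wide, because the operand would first be promoted to a signed
   64-bit type. */
int64_t negate_as_if_int64(uint32_t x)
{
    return -(int64_t)x;
}
```

With a 32-bit int, -x on a uint32_t holding 1 is 0xFFFFFFFF, whereas negate_as_if_int64(1) is -1; the two only agree again after a cast back to uint32_t.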

Comments (8)

巷子口的你 2024-10-16 15:50:16

As you say, I think that the promotion rules really are the killer. uint32_t would then promote to int and all of a sudden you'd have signed arithmetic where almost everybody expects unsigned.

This would be mostly hidden in places where you do just arithmetic and assign back to a uint32_t. But it could be deadly in places where you compare against constants. Whether code that relies on such comparisons without an explicit cast is reasonable, I don't know. Casting constants like (uint32_t)1 can become quite tedious. I personally at least always use the suffix U for constants that I want to be unsigned, but this is already not as readable as I would like.
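
To make the comparison problem concrete, here is a small sketch. It assumes a 32-bit int, so uint16_t plays the role that uint32_t would play if int were 64 bits; the function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* With a 32-bit int, uint32_t is not promoted: the subtraction is unsigned
   and wraps, so for a == 0 the result is UINT32_MAX and the test is true. */
bool u32_minus_one_is_positive(uint32_t a)
{
    return a - 1 > 0;
}

/* uint16_t is promoted to int: for a == 0 the result is -1 and the test is
   false. This is what uint32_t would do if int were wider than 32 bits. */
bool u16_minus_one_is_positive(uint16_t a)
{
    return a - 1 > 0;
}

/* The U suffix pulls the arithmetic back to unsigned int, so the
   subtraction wraps again. */
bool u16_minus_one_u_is_positive(uint16_t a)
{
    return a - 1U > 0;
}
```

For an argument of 0 the three functions return true, false and true respectively, which is the kind of silent change in meaning I would worry about.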

Also keep in mind that uint32_t etc. are not guaranteed to exist. Not even uint8_t. Requiring them is an extension from POSIX. So in that sense C as a language is far from being able to make that move.

眼睛会笑 2024-10-16 15:50:16

"Reasonable Code"...

Well... the thing about development is, you write and fix it and then it works... and then you stop!

And maybe you've been burned a lot so you stay well within the safe ranges of certain features, and maybe you haven't been burned in that particular way so you don't realize that you're relying on something that could kind-of change.

Or even that you're relying on a bug.

On olden Mac 68000 compilers, int was 16 bit and long was 32. But even then most extant C code assumed an int was 32, so typical code you found on a newsgroup wouldn't work. (Oh, and Mac didn't have printf, but I digress.)

So, what I'm getting at is, yes, if you change anything, then some things will break.

二智少女猫性小仙女 2024-10-16 15:50:16

"With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears,"

C99 has fixed-size typedefs, not fixed-size types. The native C integer types are still char, short, int, long, and long long. They are still relevant.

The problem with ILP64 is that it has a great mismatch between C types and C99 typedefs.

  • int8_t = char
  • int16_t = short
  • int32_t = nonstandard type
  • int64_t = int, long, or long long
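
One way to see that these are typedefs for native types rather than new types is to ask the compiler which native type it picked. The sketch below assumes a C11 compiler (for _Generic); the function names are mine, not from any library.

```c
#include <stdint.h>

/* Reports which native type the implementation chose for int32_t. */
const char *int32_native_type(void)
{
    return _Generic((int32_t)0,
                    int:     "int",
                    long:    "long",
                    default: "something else");
}

/* Same question for int64_t; the answer differs between platforms. */
const char *int64_native_type(void)
{
    return _Generic((int64_t)0,
                    int:       "int",
                    long:      "long",
                    long long: "long long",
                    default:   "something else");
}
```

On the common LP64 and LLP64 platforms the first function reports "int"; under ILP64 it would have to fall through to a nonstandard type.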

From 64-Bit Programming Models: Why LP64?:

"Unfortunately, the ILP64 model does not provide a natural way to describe 32-bit data types, and must resort to non-portable constructs such as __int32 to describe such types. This is likely to cause practical problems in producing code which can run on both 32 and 64 bit platforms without #ifdef constructions. It has been possible to port large quantities of code to LP64 models without the need to make such changes, while maintaining the investment made in data sets, even in cases where the typing information was not made externally visible by the application."

拧巴小姐 2024-10-16 15:50:16

OSF/1 Unix on the DEC Alpha was one of the first 64-bit versions of Unix, and it used 64-bit integers - an ILP64 architecture (meaning int, long and pointers were all 64-bit quantities). It caused lots of problems.

One issue I've not seen mentioned - which is why I'm answering at all after so long - is that if you have a 64-bit int, what size do you use for short? Both 16 bits (the classical, change nothing approach) and 32 bits (the radical 'well, a short should be half as long as an int' approach) will present some problems.

With the C99 <stdint.h> and <inttypes.h> headers, you can code to fixed size integers - if you choose to ignore machines with 36-bit or 60-bit integers (which is at least quasi-legitimate). However, most code is not written using those types, and there are typically deep-seated and largely hidden (but fundamentally flawed) assumptions in the code that will be upset if the model departs from the existing variations.

Notice Microsoft's ultra-conservative LLP64 model for 64-bit Windows. That was chosen because too much old code would break if the 32-bit model was changed. However, code that had been ported to ILP64 or LP64 architectures was not immediately portable to LLP64 because of the differences. Conspiracy theorists would probably say it was deliberately chosen to make it more difficult for code written for 64-bit Unix to be ported to 64-bit Windows. In practice, I doubt whether that was more than a happy (for Microsoft) side-effect; the 32-bit Windows code had to be revised a lot to make use of the LP64 model too.

独守阴晴ぅ圆缺 2024-10-16 15:50:16

There's one code idiom that would break if ints were 64 bits, and I see it often enough that I think it could be called reasonable:

  • checking if a value is negative by testing if ((val & 0x80000000) != 0)

This is commonly found in checking error codes. Many error code standards (like Windows' HRESULT) use bit 31 to represent an error. Code will sometimes check for that error by testing bit 31 and sometimes by checking whether the error is a negative number.

Microsoft's macros for testing HRESULT use both methods - and I'm sure there's a ton of code out there that does similar without using the SDK macros. If MS had moved to ILP64, this would be one area that caused porting headaches that are completely avoided with the LLP64 model (or the LP64 model).

Note: if you're not familiar with terms like "ILP64", please see the mini-glossary at the end of the answer.

I'm pretty sure there's a lot of code (not necessarily Windows-oriented) out there that uses plain-old-int to hold error codes, assuming that those ints are 32 bits in size. And I bet there's a lot of code with that error status scheme that also uses both kinds of checks (< 0 and bit 31 being set) and which would break if moved to an ILP64 platform. These checks could be made to continue to work correctly either way if the error codes were carefully constructed so that sign-extension took place, but again, many such systems I've seen construct the error values by or-ing together a bunch of bit fields.
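
Here is a rough sketch of how the two checks drift apart. It is not the HRESULT macros; ERR_FLAG and the function names are hypothetical, and a 64-bit int is simulated with int64_t.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERR_FLAG 0x80000000u   /* hypothetical "failure" bit, bit 31 */

/* The two checks from the idiom, applied to a 64-bit status value. */
bool failed_by_bit31(int64_t v) { return (v & 0x80000000) != 0; }
bool failed_by_sign(int64_t v)  { return v < 0; }

/* Status built in a 32-bit signed type: bit 31 is the sign bit. The
   conversion is implementation-defined, but yields a negative value on
   the usual two's complement implementations. */
int32_t make_error32(uint32_t code)
{
    return (int32_t)(ERR_FLAG | code);
}

/* Status built by or-ing bit fields and stored in a 64-bit signed type:
   bit 31 is set, but the value is positive because nothing sign-extends. */
int64_t make_error64(uint32_t code)
{
    return (int64_t)(ERR_FLAG | code);
}
```

A value from make_error32 passes both checks even after widening, because the widening sign-extends. A value from make_error64 passes the bit 31 check but fails the < 0 check.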

Anyway, I don't think this is an unsolvable problem by any means, but I do think it's a fairly common coding practice that would cause a lot of code to require fixing up if moved to an ILP64 platform.

Note that I also don't think this was one of the foremost reasons for Microsoft to choose the LLP64 model (I think that decision was largely driven by binary data compatibility between 32-bit and 64-bit processes, as mentioned in MSDN and on Raymond Chen's blog).


Mini-Glossary for the 64-bit Platform Programming Model terminology:

  • ILP64: int, long, and pointers are 64 bits
  • LP64: long and pointers are 64 bits, int is 32 bits (used by many (most?) Unix platforms)
  • LLP64: long long and pointers are 64 bits, int and long remain 32 bits (used on Win64)
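
As a small illustration of the glossary, the following sketch names the model from the three widths. The function names are invented, the ILP32 case is my addition for ordinary 32-bit platforms, and sizeof times CHAR_BIT counts any padding bits, so treat it as an approximation.

```c
#include <limits.h>
#include <stddef.h>

/* Maps the widths (in bits) of int, long and pointers to a model name. */
const char *data_model(size_t int_bits, size_t long_bits, size_t ptr_bits)
{
    if (int_bits == 64 && long_bits == 64 && ptr_bits == 64) return "ILP64";
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64) return "LP64";
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64) return "LLP64";
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32) return "ILP32";
    return "other";
}

/* Classifies the platform the code is compiled for. */
const char *this_platform_model(void)
{
    return data_model(sizeof(int) * CHAR_BIT,
                      sizeof(long) * CHAR_BIT,
                      sizeof(void *) * CHAR_BIT);
}
```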

For more information on 64-bit programming models, see "64-bit Programming Models: Why LP64?"

兮子 2024-10-16 15:50:16

While I don't personally write code like this, I'll bet that it's out there in more than one place... and of course it'll break if you change the size of int.

int i, x = getInput();
for (i = 0; i < 32; i++)
{
    if (x & (1 << i))
    {
        //Do something
    }
}
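
For comparison, a sketch of how the same kind of loop could be written so that it does not care how wide the type is. This is only one possible rewrite; count_set_bits is a made-up name, and counting bits stands in for the "Do something" step.

```c
#include <limits.h>

/* Walks every bit of x, however wide unsigned int happens to be.
   Using 1u keeps the shift unsigned, and the loop bound comes from the
   type itself instead of a hard-coded 32. */
unsigned count_set_bits(unsigned x)
{
    unsigned n = 0;
    for (unsigned i = 0; i < sizeof x * CHAR_BIT; i++)
    {
        if (x & (1u << i))
        {
            n++;   /* stand-in for "Do something" */
        }
    }
    return n;
}
```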

撩起发的微风 2024-10-16 15:50:16

Well, it's not like this story is all new. With "most computers" I assume you mean desktop computers. There already has been a transition from 16-bit to 32-bit int. Is there anything at all that says the same progression won't happen this time?

皓月长歌 2024-10-16 15:50:16

Not particularly. int is 64-bit on some 64-bit architectures (not x64).

The standard does not actually guarantee you get 32-bit integers, just that (u)int32_t can hold one.

Now if you are depending on int being the same size as ptrdiff_t, you may be broken.

Remember, C does not guarantee that the machine even is a binary machine.
