Fastest integer type for common architectures
The stdint.h header lacks an int_fastest_t and uint_fastest_t to correspond with the {,u}int_fastX_t types. For cases where the width of the integer type does not matter, how does one pick the integer type that processes the greatest number of bits with the least penalty to performance? For example, if one were searching for the first set bit in a buffer using a naive approach, a loop such as this might be considered:
    // return the bit offset of the first 1 bit
    size_t find_first_bit_set(void const *const buf)
    {
        uint_fastest_t const *p = buf; // use the fastest type for comparison to zero
        for (; *p == 0; ++p); // inc p while no bits are set
        // return offset of first bit set
        return (p - (uint_fastest_t const *)buf) * sizeof(*p) * CHAR_BIT + ffsX(*p) - 1;
    }
Naturally, using char would result in more operations than int. But on a 32-bit system, long long might make each operation expensive enough to outweigh the overhead of using int, and so on.
My current assumption is that for the mainstream architectures, long is the safest bet: it's 32 bits on 32-bit systems and 64 bits on 64-bit systems.
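For reference, the loop in the question can be turned into something that compiles by passing the buffer length and picking a concrete word type. The sketch below is one possible reading of it, not the asker's code: the len parameter, the choice of unsigned long as the word type, and the convention that bit 0 is the least significant bit of the first byte are all assumptions. Note also that unsigned long is only 32 bits wide on 64-bit Windows, so a pointer-sized type such as uintptr_t may be a closer match there.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns the bit offset of the first set bit in buf (bit 0 is the least
 * significant bit of byte 0), or len * CHAR_BIT if no bit is set. */
size_t find_first_bit_set(void const *buf, size_t len)
{
    unsigned char const *bytes = buf;
    size_t i = 0;

    assert(buf != NULL || len == 0);

    /* Skip whole zero words; memcpy sidesteps alignment and aliasing issues. */
    while (len - i >= sizeof(unsigned long)) {
        unsigned long word;
        memcpy(&word, bytes + i, sizeof word);
        if (word != 0)
            break;
        i += sizeof word;
    }

    /* Finish byte by byte, so the result does not depend on endianness. */
    for (; i < len; ++i) {
        if (bytes[i] != 0) {
            size_t bit = 0;
            while (((bytes[i] >> bit) & 1u) == 0)
                ++bit;
            return i * CHAR_BIT + bit;
        }
    }
    return len * CHAR_BIT;
}
```

Swapping unsigned long for another unsigned type in the first loop is enough to try a different word width without changing the result.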
Comments (10)
int_fast8_t is always the fastest integer type in a correct implementation. There can never be integer types smaller than 8 bits (because CHAR_BIT >= 8 is required), and since int_fast8_t is the fastest integer type with at least 8 bits, it is thus the fastest integer type, period.
I'm not sure I really understand the question, but why aren't you just using int? Quoting from my (free draft copy of the wrong, i.e. C++) standard, "Plain ints have the natural size suggested by the architecture of the execution environment."
But I think that if you want to have the optimal integer type for a certain operation, it will be different depending on which operation it is. Trying to find the first bit in a large data buffer, or finding a number in a sequence of integers, or moving them around, could very well have completely different optimal types.
EDIT:
For whatever it's worth, I did a small benchmark. On my particular system (Intel i7 920 with Linux, gcc -O3) it turns out that long ints (64 bits) are quite a bit faster than plain ints (32 bits), on this particular example. I would have guessed the opposite.
Theoretically, int is the best bet. It should map to the CPU's native register size, and thus be "optimal" in the sense you're asking about.
However, you may still find that a 64-bit or 128-bit integer is faster on some CPUs than a 32-bit one, because although these are larger than the register size, they reduce the number of iterations of your loop, and thus may work out more efficient by minimising the loop overhead and/or taking advantage of DMA to load/store the data faster.
(For example, on ARM-2 processors it took 4 memory cycles to load one 32-bit register, but only 5 cycles to load two sequentially, and 7 cycles to load 4 sequentially. The routine you suggest above would be optimised to use as many registers as you could free up (8 to 10 usually), and could therefore run up to 3 or 4 times faster by using multiple registers per loop iteration.)
The only way to be sure is to write several routines and then profile them on the specific target machine to find out which produces the best performance.
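The idea of handling several registers' worth of data per loop iteration can be sketched in portable C. This is only an illustration: the function name first_nonzero_word and the unroll factor of four are assumptions made here, and whether it beats the simple loop depends on the compiler and the target.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first nonzero word in words[0..n), or n if all are zero. */
size_t first_nonzero_word(unsigned long const *words, size_t n)
{
    size_t i = 0;

    assert(words != NULL || n == 0);

    /* Test four words per iteration by OR-ing them together, so the loop
     * branches once per block instead of once per word. */
    for (; n - i >= 4; i += 4) {
        if ((words[i] | words[i + 1] | words[i + 2] | words[i + 3]) != 0)
            break;
    }

    /* Locate the exact word, either inside the block that stopped the
     * loop above or in the tail of fewer than four words. */
    for (; i < n; ++i) {
        if (words[i] != 0)
            return i;
    }
    return n;
}
```

An optimising compiler may already unroll or vectorise the simple loop on its own, which is one more reason to measure rather than assume.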
If you want to be certain you've got the fastest implementation, why not benchmark each one on the systems you're expecting to run on instead of trying to guess?
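A minimal sketch of such a benchmark might look like the following. Everything named here (DEFINE_SCAN, count_zero_u8/u32/u64, time_scan) is hypothetical, the fixed-width types are assumed to exist on the target, and clock() is a coarse timer, so treat the numbers as a rough comparison only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Generates count_zero_<name>(): the number of leading all-zero elements
 * of type T in buf, looking only at whole elements. */
#define DEFINE_SCAN(name, T)                                        \
    size_t count_zero_##name(void const *buf, size_t len)           \
    {                                                               \
        unsigned char const *bytes = buf;                           \
        size_t n = len / sizeof(T), i;                              \
        for (i = 0; i < n; ++i) {                                   \
            T v;                                                    \
            memcpy(&v, bytes + i * sizeof(T), sizeof v);            \
            if (v != 0)                                             \
                break;                                              \
        }                                                           \
        return i;                                                   \
    }

DEFINE_SCAN(u8, uint8_t)
DEFINE_SCAN(u32, uint32_t)
DEFINE_SCAN(u64, uint64_t)

typedef size_t (*scan_fn)(void const *, size_t);

/* CPU seconds spent on reps calls of scan over buf. The results are
 * accumulated into *sink so the calls cannot be optimised away. */
double time_scan(scan_fn scan, void const *buf, size_t len, int reps, size_t *sink)
{
    assert(scan != NULL && sink != NULL && reps > 0);

    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        *sink += scan(buf, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_scan once per generated function on a large, mostly zero buffer, with the same reps value, gives timings that can be compared directly on the machine in question.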
The answer is int itself. At least in C++, where 3.9.1/2 of the standard says: "Plain ints have the natural size suggested by the architecture of the execution environment." I expect the same is true for C, though I don't have any of the standards documents.
I would guess that the types size_t (for an unsigned type) and ptrdiff_t (for a signed type) will usually correspond to quite efficient integer types on any given platform.
But nothing proves that better than inspecting the generated assembly and running benchmarks.
Edit, including the different comments, here and in other replies:
size_t and ptrdiff_t are the only typedefs that are normative in C99 and for which one may make a reasonable assumption that they are related to the architecture.
There are 5 different possible ranks for standard integer types (char, short, int, long, long long). All the forces go towards having types of width 8, 16, 32, 64 and, in the near future, 128. As a consequence int will be stuck on 32 bit. Its definition will have nothing to do with efficiency on the platform, but just be constrained by that width requirement.
For all existing mainstream architectures long is the fastest type at present for loop throughput.
If you're compiling with gcc, I'd recommend using __builtin_ffs() for finding the first set bit.
This will be compiled into (often a single) native assembly instruction.
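As a hedged illustration of that suggestion: GCC documents __builtin_ffs for int, plus __builtin_ffsl and __builtin_ffsll for long and long long. The wrapper names first_set_bit and lowest_set_bit_index below are made up for this sketch, and the loop in the #else branch is a plain fallback for compilers without the builtin.

```c
#include <assert.h>

/* 1-based position of the least significant set bit of x, or 0 when x is 0
 * (the same contract as ffs()). */
int first_set_bit(unsigned long x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ffsl((long)x);
#else
    int pos = 1;
    if (x == 0)
        return 0;
    while ((x & 1ul) == 0) {
        x >>= 1;
        ++pos;
    }
    return pos;
#endif
}

/* 0-based index of the lowest set bit, as in the "ffsX(*p) - 1" term of the
 * question; the word must be nonzero. */
unsigned lowest_set_bit_index(unsigned long word)
{
    assert(word != 0);
    return (unsigned)first_set_bit(word) - 1u;
}
```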
It is not possible to answer this question, since the question is incomplete. As an analogy, consider the question of which vehicle is the fastest.
A Bugatti Veyron? Certainly fast, but no good for going from London to New York.
What is missing from the question is the context the integer will be used in. In the original example above, I doubt you'd see much difference between 8, 32 or 64 bit values if the array is large and sparse, since you'll be hitting memory bandwidth limits before CPU limits.
The main point is that the architecture does not define what size the various integer types are; the compiler designer does. The designer will carefully weigh up the pros and cons of various sizes for each type on a given architecture and pick the most appropriate.
I guess the 32 bit int on 64 bit systems was chosen because, for most operations ints are used for, 32 bits are enough. Since memory bandwidth is a limiting factor, saving on memory use was probably the overriding factor.
Let's read what the C17 ISO standard has to say about the "fast" types of stdint.h: each one "designates an integer type that is usually fastest to operate with among all integer types that have at least the specified width" (7.20.1.3).
On the most widely used x86-64 architecture almost all integer instructions run comparably fast. Therefore implementations often pick intX_t for int_fastX_t. Indeed, if you don't need more than X bits, why allocate more? In other words, the "speed" meant for the integer type here is its internal CPU performance.
Your case, though, mostly reads memory and does very little CPU processing on it. For this purpose, using an integer type with the memory bus width makes more sense. Nowadays that is 64 bits, so int64_t is the type. The C standard doesn't have a "memory bus width" int type, so if you are trying to write an architecture-independent solution, intmax_t is your best bet. Just make sure not to cause a buffer overflow.
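What an implementation actually picks for the "fast" types is easy to check on a given target. The sketch below prints the storage widths of the candidates discussed in this thread; WIDTH_BITS and print_candidate_widths are hypothetical names, and the output differs between C libraries (glibc on x86-64, for instance, is known to use 64-bit types for int_fast16_t and int_fast32_t).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Storage width of a type in bits (hypothetical helper; counts any padding bits). */
#define WIDTH_BITS(T) (sizeof(T) * CHAR_BIT)

/* The standard only guarantees a minimum width for each "fast" type. */
static_assert(WIDTH_BITS(int_fast8_t) >= 8, "int_fast8_t has at least 8 bits");
static_assert(WIDTH_BITS(int_fast16_t) >= 16, "int_fast16_t has at least 16 bits");
static_assert(WIDTH_BITS(int_fast32_t) >= 32, "int_fast32_t has at least 32 bits");
static_assert(WIDTH_BITS(int_fast64_t) >= 64, "int_fast64_t has at least 64 bits");

/* Prints the widths chosen by the current implementation. */
void print_candidate_widths(void)
{
    printf("int_fast8_t  : %zu bits\n", WIDTH_BITS(int_fast8_t));
    printf("int_fast16_t : %zu bits\n", WIDTH_BITS(int_fast16_t));
    printf("int_fast32_t : %zu bits\n", WIDTH_BITS(int_fast32_t));
    printf("int_fast64_t : %zu bits\n", WIDTH_BITS(int_fast64_t));
    printf("int          : %zu bits\n", WIDTH_BITS(int));
    printf("long         : %zu bits\n", WIDTH_BITS(long));
    printf("size_t       : %zu bits\n", WIDTH_BITS(size_t));
    printf("intmax_t     : %zu bits\n", WIDTH_BITS(intmax_t));
}
```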