Fastest integer type for common architectures

Posted on 2024-09-18 23:19:02

The stdint.h header lacks an int_fastest_t and a uint_fastest_t to go with the {,u}int_fastX_t types. When the width of the integer type does not matter, how does one pick the integer type that processes the greatest number of bits with the least penalty to performance? For example, if one were searching for the first set bit in a buffer using a naive approach, a loop such as this might be considered:

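// Return the bit offset of the first 1 bit.
// uint_fastest_t and ffsX() are placeholders: neither exists in standard C.
// The loop assumes buf contains at least one set bit.
size_t find_first_bit_set(void const *const buf)
{
    uint_fastest_t const *const start = buf;
    uint_fastest_t const *p = start; // use the fastest type for comparison to zero
    for (; *p == 0; ++p); // advance p while no bits are set
    // return offset of first bit set
    return (size_t)(p - start) * sizeof(*p) * CHAR_BIT + ffsX(*p) - 1;
}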

Naturally, using char would result in more operations than int. But long long could make each operation more expensive than the overhead of simply using int, on a 32-bit system for instance.

My current assumption is that, for mainstream architectures, long is the safest bet: it is 32 bits on 32-bit systems and 64 bits on 64-bit systems.

Comments (10)

醉生梦死 2024-09-25 23:19:02

int_fast8_t is always the fastest integer type in a correct implementation. There can never be integer types smaller than 8 bits (because CHAR_BIT>=8 is required), and since int_fast8_t is the fastest integer type with at least 8 bits, it's thus the fastest integer type, period.
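
To see what a given implementation actually chose for the int_fastN_t types, a small sketch like the following can help. The macro BITS_OF and the function print_fast_type_widths are names made up for this illustration; the output depends entirely on the compiler and C library, so none is shown here.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Storage width of a type in bits (padding bits, if any, included).
   BITS_OF is a hypothetical helper, not part of any standard header. */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

/* Prints the widths this implementation picked for the "fast" types,
   next to int and long for comparison. */
void print_fast_type_widths(void)
{
    printf("int_fast8_t : %zu bits\n", BITS_OF(int_fast8_t));
    printf("int_fast16_t: %zu bits\n", BITS_OF(int_fast16_t));
    printf("int_fast32_t: %zu bits\n", BITS_OF(int_fast32_t));
    printf("int_fast64_t: %zu bits\n", BITS_OF(int_fast64_t));
    printf("int         : %zu bits\n", BITS_OF(int));
    printf("long        : %zu bits\n", BITS_OF(long));
}
```

Calling print_fast_type_widths() on each target shows whether int_fast8_t is wide enough to be useful for scanning many bits per operation there.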

看春风乍起 2024-09-25 23:19:02

I'm not sure I really understand the question, but why aren't you just using int? Quoting from my (free draft copy of the wrong, i.e. C++) standard, "Plain ints have the natural size suggested by the architecture of the execution environment."

But I think that if you want to have the optimal integer type for a certain operation, it will be different depending on which operation it is. Trying to find the first bit in a large data buffer, or finding a number in a sequence of integers, or moving them around, could very well have completely different optimal types.

EDIT:

For whatever it's worth, I did a small benchmark. On my particular system (Intel i7 920 with Linux, gcc -O3) it turns out that long ints (64 bits) are quite a bit faster than plain ints (32 bits), on this particular example. I would have guessed the opposite.
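
The benchmark itself was not posted, so the following is only a sketch of how such a comparison might be set up, not the code used above. The names DEFINE_SCAN, scan_u32, scan_u64 and time_scan are invented for this example, and clock() gives coarse CPU time, so large buffers and many repetitions are needed for meaningful numbers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Generates scan_<name>(): returns the byte offset of the first T-sized
   word that is non-zero, or the offset at which fewer than sizeof(T)
   bytes remain. memcpy sidesteps alignment and aliasing issues. */
#define DEFINE_SCAN(name, T)                                     \
    size_t scan_##name(const unsigned char *buf, size_t len)    \
    {                                                            \
        size_t i = 0;                                            \
        for (; i + sizeof(T) <= len; i += sizeof(T)) {           \
            T w;                                                 \
            memcpy(&w, buf + i, sizeof w);                       \
            if (w != 0)                                          \
                break;                                           \
        }                                                        \
        return i;                                                \
    }

DEFINE_SCAN(u32, uint32_t)
DEFINE_SCAN(u64, uint64_t)

/* Runs one scan function `reps` times and returns the CPU seconds used. */
double time_scan(size_t (*scan)(const unsigned char *, size_t),
                 const unsigned char *buf, size_t len, int reps)
{
    volatile size_t sink = 0; /* keeps the calls from being optimised away */
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        sink += scan(buf, len);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_scan(scan_u32, ...) with time_scan(scan_u64, ...) on the same mostly-zero buffer gives a rough idea of which width wins on a particular machine and compiler setting.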

叹沉浮 2024-09-25 23:19:02

Theoretically, int is the best bet. It should map to the CPU's native register size, and thus be "optimal" in the sense you're asking about.

However, you may still find that an int-64 or int-128 is faster on some CPUs than an int-32, because although these are larger than the register size, they will reduce the number of iterations of your loop, and thus may work out more efficient by minimising the loop overheads and/or taking advantage of DMA to load/store the data faster.

(For example, on ARM-2 processors it took 4 memory cycles to load one 32-bit register, but only 5 cycles to load two sequentially, and 7 cycles to load 4 sequentially. The routine you suggest above would be optimised to use as many registers as you could free up (8 to 10 usually), and could therefore run up to 3 or 4 times faster by using multiple registers per loop iteration)

The only way to be sure is to write several routines and then profile them on the specific target machine to find out which produces the best performance.
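
The idea of handling several registers per loop iteration can be sketched in portable C as below. This is an illustration under assumptions, not the ARM routine described above: the function name skip_zero_blocks and the group size of four unsigned long words are arbitrary choices, and whether the compiler really keeps the four words in registers depends on the target.

```c
#include <stddef.h>
#include <string.h>

/* Skips groups of four unsigned long words while all of them are zero.
   Returns the byte offset of the first group containing a set bit, or
   the offset at which less than one full group remains. */
size_t skip_zero_blocks(const unsigned char *buf, size_t len)
{
    enum { WORDS = 4 };
    size_t i = 0;

    while (len - i >= WORDS * sizeof(unsigned long)) {
        unsigned long w[WORDS];
        memcpy(w, buf + i, sizeof w);   /* load the whole group at once */
        if ((w[0] | w[1] | w[2] | w[3]) != 0)
            break;                      /* one test covers four words */
        i += sizeof w;
    }
    return i;
}
```

A finer-grained search would then start at the returned offset to locate the exact bit.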

请止步禁区 2024-09-25 23:19:02

If you want to be certain you've got the fastest implementation, why not benchmark each one on the systems you're expecting to run on instead of trying to guess?

梦在深巷 2024-09-25 23:19:02

The answer is int itself. At least in C++, where 3.9.1/2 of the standard says:

"Plain ints have the natural size suggested by the architecture of the execution environment."

I expect the same is true for C, though I don't have any of the standards documents.

老旧海报 2024-09-25 23:19:02

I would guess that the types size_t (for an unsigned type) and ptrdiff_t (for a signed type) will usually correspond to quite efficient integer types on any given platform.

But nothing can prove that better than inspecting the generated assembly and running benchmarks.

Edit, including the different comments, here and in other replies:

size_t and ptrdiff_t are the only typedefs that are normative in C99 and for which one may make a reasonable assumption that they are related to the architecture.

There are 5 different possible ranks for standard integer types (char, short, int, long, long long). All the forces push towards types of width 8, 16, 32, 64 and, in the near future, 128. As a consequence, int will be stuck at 32 bits. Its definition will have nothing to do with efficiency on the platform, but will just be constrained by that width requirement.

等你爱我 2024-09-25 23:19:02

For all existing mainstream architectures, long is at present the fastest type for loop throughput.

紫轩蝶泪 2024-09-25 23:19:02

If you're compiling with gcc, I'd recommend using __builtin_ffs() for finding the first bit set:

Built-in Function: int __builtin_ffs (int x)
Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.

This will be compiled into (often a single) native assembly instruction.
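
As a minimal sketch of how the builtin fits the original problem, the wrappers below turn its 1-based result into a 0-based bit offset. The wrapper names lowest_set_bit_u and lowest_set_bit_ul are made up for this example; __builtin_ffs and its long variant __builtin_ffsl are GCC builtins (Clang provides them too), so this assumes one of those compilers.

```c
/* 0-based offset of the lowest set bit of w, or -1 if w is zero.
   Assumes GCC or a compatible compiler. */
int lowest_set_bit_u(unsigned int w)
{
    /* __builtin_ffs takes an int and returns a 1-based index (0 for zero). */
    return __builtin_ffs((int)w) - 1;
}

/* Same for unsigned long, using the long variant of the builtin. */
int lowest_set_bit_ul(unsigned long w)
{
    return __builtin_ffsl((long)w) - 1;
}
```

In the question's loop, a wrapper like this would take the place of the ffsX(*p) - 1 part once the word type is fixed.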

悟红尘 2024-09-25 23:19:02

It is not possible to answer this question since the question is incomplete. As an analogy, consider the question:

What is the fastest vehicle?

A Bugatti Veyron? Certainly fast, but no good for going from London to New York.

What is missing from the question is the context the integer will be used in. In the original example above, I doubt you'd see much difference between 8, 32 or 64 bit values if the array is large and sparse, since you'll be hitting memory bandwidth limits before CPU limits.

The main point is that the architecture does not define what size the various integer types are; the compiler designer does. The designer will carefully weigh up the pros and cons of the various sizes for each type on a given architecture and pick the most appropriate.

I guess the 32 bit int on 64 bit systems was chosen because 32 bits are enough for most of the operations ints are used for. Since memory bandwidth is a limiting factor, saving on memory use was probably the overriding factor.

つ可否回来 2024-09-25 23:19:02

Let's read what the C17 ISO standard has to say about the "fast" types of stdint.h:

7.20.1.3 Fastest minimum-width integer types

1 Each of the following types designates an integer type that is usually fastest (footnote 266) to operate with among all integer types that have at least the specified width.

266) The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.

On the most widely used architecture, x86-64, almost all integer instructions run comparably fast. Compilers therefore often pick intX_t for int_fastX_t; indeed, if you don't need more than X bits, why allocate more? (This is not universal: glibc on x86-64, for example, makes int_fast16_t and int_fast32_t 64 bits wide.) In other words, the "speed" of an integer type meant here is its performance inside the CPU.

Your case, though, mostly reads memory and does very little CPU processing on it. For this purpose, using an integer type as wide as the memory bus makes more sense. Nowadays that is 64 bits, so int64_t is the type. The C standard doesn't have a "memory bus width" integer type, so if you were trying to write an architecture-independent solution, intmax_t is your best bet. Just make sure not to hit a buffer overflow.
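
A bounded version of the search, reading 64 bits at a time, might look like the sketch below. It makes several assumptions that are not in the question: an explicit len parameter is added to avoid running past the buffer, the unsigned uint64_t is used rather than int64_t, and the final step falls back to bytes so that the result does not depend on endianness (bit 0 is taken to be the least significant bit of byte 0).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the bit offset of the first set bit in buf[0..len), or
   len * CHAR_BIT if no bit is set. */
size_t find_first_bit_set(void const *buf, size_t len)
{
    unsigned char const *bytes = buf;
    size_t i = 0;

    assert(buf != NULL || len == 0);

    /* Skip whole 64-bit words that are entirely zero. memcpy avoids
       alignment and aliasing problems; compilers typically turn it
       into a single load. */
    while (len - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, bytes + i, sizeof w);
        if (w != 0)
            break;
        i += sizeof w;
    }

    /* Finish byte by byte: this covers both the non-zero word found
       above and any tail shorter than one word. */
    for (; i < len; ++i) {
        if (bytes[i] != 0) {
            size_t bit = 0;
            while (((bytes[i] >> bit) & 1u) == 0)
                ++bit;
            return i * CHAR_BIT + bit;
        }
    }
    return len * CHAR_BIT;
}
```

The word loop never reads beyond len bytes, and a buffer with no set bit yields len * CHAR_BIT instead of an endless scan.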
