Aligning to cache line and knowing the cache line size
To prevent false sharing, I want to align each element of an array to a cache line. First, I need to know the cache line size so that I can give each element that many bytes. Second, I want the start of the array to be aligned to a cache line.
I am using Linux on an 8-core x86 platform with the gcc compiler. How do I find the cache line size, and how do I align to a cache line in C?
For example, assuming a cache line size of 64 bytes, the layout would be:
element[0] occupies bytes 0-63
element[1] occupies bytes 64-127
element[2] occupies bytes 128-191
and so on, assuming of course that bytes 0-63 are aligned to a cache line.
Comments (7)
Pass the value as a macro definition to the compiler (for example, with gcc's -D option).
At run-time, sysconf(_SC_LEVEL1_DCACHE_LINESIZE) can be used to get the L1 data cache line size.
C++17 provides std::hardware_destructive_interference_size, which is the minimum offset between two objects to avoid false sharing. From this definition, it should be exactly what getconf LEVEL1_DCACHE_LINESIZE outputs.
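The run-time query with sysconf(_SC_LEVEL1_DCACHE_LINESIZE) could be wrapped roughly as follows. This is a minimal sketch assuming glibc on Linux, where that parameter exists as an extension; the names cache_line_size and FALLBACK_CACHE_LINE_SIZE are made up for illustration, and the fallback of 64 is an assumption rather than something the system guarantees. Some systems may report 0 or -1 for this parameter, which is why the fallback is there.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed fallback: 64 bytes is common on current x86 parts, not guaranteed. */
#define FALLBACK_CACHE_LINE_SIZE 64

static_assert((FALLBACK_CACHE_LINE_SIZE & (FALLBACK_CACHE_LINE_SIZE - 1)) == 0,
              "fallback must be a power of two");

/* Returns the L1 data cache line size in bytes (hypothetical helper). */
static size_t cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; the result is <= 0 when the value is unknown. */
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return (size_t)n;
#endif
    return FALLBACK_CACHE_LINE_SIZE;
}
```

Because this value is only known at run time, it can size and align heap allocations, but it cannot be used inside __attribute__((aligned(...))), which needs a compile-time constant.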
To know the sizes, you need to look them up in the documentation for the processor; afaik there is no programmatic way to do it. On the plus side, however, most cache lines are of a standard size, based on Intel's standards. On x86, cache lines are 64 bytes. However, to prevent false sharing, you need to follow the guidelines of the processor you are targeting (Intel has some special notes on its NetBurst-based processors); generally you need to align to 64 bytes for this (Intel states that you should also avoid crossing 16-byte boundaries).
To do this in C or C++ requires that you use the standard aligned_alloc function or one of the compiler-specific specifiers such as __attribute__((aligned(64))) or __declspec(align(64)). To pad between members in a struct to split them onto different cache lines, you need to insert a member big enough to align the next member to the next 64-byte boundary.
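To illustrate the aligned_alloc and __attribute__((aligned(64))) options together with an explicit padding member, here is a small sketch for gcc. It assumes a 64-byte cache line; CACHE_LINE_SIZE, struct padded_counter and alloc_counters are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64  /* assumption: 64-byte cache lines */

/* One counter per cache line: the pad member fills the rest of the line and
   the GCC attribute makes every object of this type start on a 64-byte
   boundary. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE_SIZE - sizeof(long)];  /* explicit padding member */
} __attribute__((aligned(CACHE_LINE_SIZE)));

static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "each element must occupy exactly one cache line");

/* Allocates n elements starting on a cache-line boundary.
   Returns NULL on failure; release the result with free(). */
static struct padded_counter *alloc_counters(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(struct padded_counter))
        return NULL;
    /* C11 aligned_alloc: the size is a multiple of the alignment here
       because sizeof(struct padded_counter) is 64. */
    return aligned_alloc(CACHE_LINE_SIZE, n * sizeof(struct padded_counter));
}
```

With this layout, element[i] and element[i + 1] are exactly 64 bytes apart, so threads that each update their own element do not write to the same line. A static array of struct padded_counter gets the same alignment without any allocation call.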
Another simple way is to just cat /proc/cpuinfo:
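The output that followed is not preserved on this page. On x86 Linux kernels, /proc/cpuinfo contains a cache_alignment field (next to clflush size), so one way to read it from C might look like the sketch below. The function name cpuinfo_cache_alignment is hypothetical, and the path is a parameter only so the parser can be tried on any file; on kernels that do not print this field the function returns 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the first cache_alignment value found in the given cpuinfo-style
   file, or 0 if the file cannot be opened or has no such field. */
static long cpuinfo_cache_alignment(const char *path)
{
    static const char key[] = "cache_alignment";
    char line[1024];
    long value = 0;

    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Expected x86 line format: "cache_alignment\t: 64" */
        if (strncmp(line, key, sizeof key - 1) == 0) {
            const char *colon = strchr(line, ':');
            long parsed = 0;
            if (colon != NULL && sscanf(colon + 1, "%ld", &parsed) == 1 && parsed > 0) {
                value = parsed;
                break;
            }
        }
    }
    fclose(f);
    return value;
}
```

Calling cpuinfo_cache_alignment("/proc/cpuinfo") on an x86 Linux machine would typically be the intended use; a return value of 0 means the caller should fall back to another method.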
There's no completely portable way to get the cache line size. But if you're on x86/x86-64, you can call the cpuid instruction to get everything you need to know about the cache, including size, cache line size, how many levels, etc.
http://softpixel.com/~cwright/programming/simd/cpuid.php
(Scroll down a little bit; the page is about SIMD, but it has a section on getting the cache line size.)
As for aligning your data structures, there's also no completely portable way to do it. GCC and VS10 have different ways to specify the alignment of a struct.
One way to "hack" it is to pad your struct with unused variables until it matches the alignment you want.
To align your malloc() allocations, all the mainstream compilers also have aligned malloc functions for that purpose.
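As a rough sketch of the cpuid approach with gcc, the helper below reads CPUID leaf 1, where bits 15:8 of EBX give the CLFLUSH line size in units of 8 bytes. On mainstream x86 processors this matches the cache line size, but treating the two as equal is an assumption of this example. The function name cpuid_cache_line_size is hypothetical; __get_cpuid comes from the <cpuid.h> header shipped with GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header that provides __get_cpuid */
#endif

/* Returns the CLFLUSH line size in bytes as reported by CPUID leaf 1, or 0
   when not compiled for x86 or when the leaf is not supported. */
static size_t cpuid_cache_line_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        /* EBX bits 15:8 hold the CLFLUSH line size in 8-byte units. */
        return (size_t)((ebx >> 8) & 0xffu) * 8;
    }
#endif
    return 0;
}
```

A return value of 0 signals that the caller should use another source for the line size. Per-level cache details (sizes, number of levels) live in other CPUID leaves and are not covered by this sketch.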
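posix_memalign or valloc can be used to align allocated memory to a cache line. Note that valloc aligns to a page boundary (which is also cache-line aligned) and is considered obsolete, so posix_memalign is the better choice.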
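A minimal posix_memalign sketch, assuming a 64-byte line, could look like this. CACHE_LINE_SIZE and alloc_cache_line_slots are hypothetical names; the helper hands out an array of 64-byte slots whose first slot starts on a cache-line boundary, which gives the layout described in the question.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE_SIZE 64  /* assumption: 64-byte cache lines */

/* posix_memalign needs a power-of-two alignment that is also a multiple of
   sizeof(void *). */
static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0 &&
              CACHE_LINE_SIZE % sizeof(void *) == 0,
              "alignment not acceptable to posix_memalign");

/* Allocates count zeroed slots of CACHE_LINE_SIZE bytes each, with slot 0
   starting on a cache-line boundary. Returns NULL on failure; release the
   result with free(). */
static void *alloc_cache_line_slots(size_t count)
{
    void *p = NULL;

    if (count == 0 || count > SIZE_MAX / CACHE_LINE_SIZE)
        return NULL;
    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&p, CACHE_LINE_SIZE, count * CACHE_LINE_SIZE) != 0)
        return NULL;
    memset(p, 0, count * CACHE_LINE_SIZE);
    return p;
}
```

Slot i then starts at byte offset i * 64 from the returned pointer, so element[0] covers bytes 0-63, element[1] covers bytes 64-127, and so on.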
Here's a table I made that has most Arm/Intel processors on it. You can use it for reference when defining constants, so that you don't have to generalize the cache line size for all architectures.
For C++, hopefully, we will soon see std::hardware_destructive_interference_size, which should be an accurate way to get this information (assuming you tell the compiler your target architecture).
If anyone is curious about how to do this easily in C++, I've built a library with a CacheAligned<T> class, which handles determining the cache line size as well as the alignment for your T object, referenced by calling .Ref() on your CacheAligned<T> object. You can also use Aligned<typename T, size_t Alignment> if you know the cache line size beforehand, or just want to stick with the very common value of 64 (bytes).
https://github.com/NickStrupat/Aligned