Detecting aligned memory requirements on the target CPU

Posted on 2025-01-06 22:58:46

I'm currently trying to write code that is supposed to work on a wide range of machines, from handheld devices and sensors to big servers in data centers.

One of the (many) differences between these architectures is the requirement for aligned memory access.

Aligned memory access is not required on "standard" x86 CPUs, but many other CPUs need it and raise an exception if the rule is not respected.

Up to now, I've been dealing with it by forcing the compiler to be cautious on specific data accesses which are known to be risky, using the packed attribute (or pragma). And it works fine.
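
For readers unfamiliar with the idiom, here is a minimal sketch of what such a packed-attribute access can look like. It assumes GCC or Clang attribute syntax; the names `packed_u32` and `read_u32` are hypothetical and not taken from the actual code base, and the `memcpy` fallback for other compilers is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) || defined(__clang__)
/* GCC/Clang spelling. "packed" removes the alignment assumption for the
   member; "may_alias" lets the struct overlay bytes of any other type. */
struct packed_u32 { uint32_t v; } __attribute__((packed, may_alias));

/* Hypothetical helper: read a native-endian 32-bit value from an address
   that may not be 4-byte aligned. */
static uint32_t read_u32(const void *p)
{
    assert(p != NULL);
    return ((const struct packed_u32 *)p)->v;
}
#else
/* Fallback for other compilers (an assumption, not from the post). */
static uint32_t read_u32(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
#endif
```

On a strict-alignment target the compiler typically has to assemble the value from several narrower loads here, which is where the cost described next comes from.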

The problem is, the compiler is so cautious that a lot of performance is lost in the process.

Since performance is important, we would be better off rewriting some portions of the code to work specifically on strict-alignment CPUs. Such code would, on the other hand, be slower on CPUs which support unaligned memory access (such as x86), so we want to use it only on CPUs which require strictly aligned memory access.

And now the question:
How can I detect, at compile time, that the target architecture requires strictly aligned memory access? (Or the other way round.)

Comments (2)

最偏执的依靠 2025-01-13 22:58:46

No C implementation that I know of provides any preprocessor macro to help you figure this out. Since your code supposedly runs on a wide range of machines, I assume that you have access to a wide variety of machines for testing, so you can figure out the answer with a test program. Then you can write your own macro, something like below:

#if defined(__sparc__)
/* Unaligned access will crash your app on a SPARC */
#define ALIGN_ACCESS 1
#elif defined(__ppc__) || defined(__POWERPC__) || defined(_M_PPC)
/* Unaligned access is too slow on a PowerPC (maybe?) */
#define ALIGN_ACCESS 1
#elif defined(__i386__) || defined(__x86_64__) || \
      defined(_M_IX86) || defined(_M_X64)
/* x86 / x64 are fairly forgiving */
#define ALIGN_ACCESS 0
#else
#warning "Unsupported architecture"
#define ALIGN_ACCESS 1
#endif
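
As a rough sketch of how such a macro might then be used, the routine below picks a strategy at compile time. The function `xor_bytes` is a hypothetical example, not part of the original answer, and `ALIGN_ACCESS` falls back to the cautious setting if nothing has defined it. When `ALIGN_ACCESS` is 1, it first consumes bytes until the pointer is 4-byte aligned, so that every word load that follows is aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef ALIGN_ACCESS
#define ALIGN_ACCESS 1   /* assume strict alignment unless told otherwise */
#endif

/* Hypothetical example routine: XOR of every byte in buf[0..len). */
static uint8_t xor_bytes(const unsigned char *buf, size_t len)
{
    uint32_t acc = 0;   /* four byte lanes XORed in parallel */
    uint8_t tail = 0;   /* bytes handled one at a time */

    assert(buf != NULL || len == 0);

#if ALIGN_ACCESS
    /* Strict path: eat leading bytes until buf is 4-byte aligned. */
    while (len > 0 && ((uintptr_t)buf & 3u) != 0) {
        tail ^= *buf++;
        len--;
    }
#endif

    /* Word loop. memcpy keeps the load well-defined in C whatever the
       alignment; how it is lowered is up to the compiler. */
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, buf, sizeof w);
        acc ^= w;
        buf += 4;
        len -= 4;
    }

    /* Trailing bytes that do not fill a whole word. */
    while (len > 0) {
        tail ^= *buf++;
        len--;
    }

    /* Fold the four lanes of acc into one byte. */
    acc ^= acc >> 16;
    acc ^= acc >> 8;
    return (uint8_t)(acc ^ tail);
}
```

Both settings return the same result; only the access pattern differs. Whether a given compiler actually exploits the alignment established by the first loop is compiler-dependent, so measuring on the real targets is still needed.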

Note that the speed of an unaligned access will depend on the boundaries which it crosses. For example, if the access crosses a 4k page boundary it will be much slower, and there may be other boundaries which cause it to be slower still. Even on x86, some unaligned accesses are not handled by the processor and are instead handled by the OS kernel. That is incredibly slow.

There is also no guarantee that a future (or current) implementation will not suddenly change the performance characteristics of unaligned accesses. This has happened in the past and may happen in the future; the PowerPC 601 was very forgiving of unaligned access but the PowerPC 603e was not.

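Complicating things even further is the fact that the code you'd write to make an unaligned access would differ in implementation across platforms. For example, on PowerPC it's simplified by the fact that the hardware shifts x << 32 and x >> 32 always produce 0 if x is 32 bits, but on x86 you have no such luck: the shift count is masked to 5 bits, so a shift by 32 leaves x unchanged. (In C itself, shifting a 32-bit value by 32 is undefined behavior, so portable code has to special-case it.)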

坐在坟头思考人生 2025-01-13 22:58:46

Writing your code for strict memory alignment is a good idea anyway. Even on x86 systems, which allow unaligned access, an unaligned read or write can turn into two memory accesses, and some performance will be lost. It's not difficult to write efficient code which works on all CPU architectures. The simple rule to remember is that the pointer must be aligned to the size of the object you're reading or writing. For example, when writing a DWORD (4 bytes), make sure that ((uintptr_t)dest_pointer & 3) == 0; the parentheses matter, because in C == binds more tightly than &. Using a crutch such as "UNALIGNED_PTR" types will cause the compiler to generate inefficient code. If you've got a large amount of legacy code that must work immediately, then it makes sense to use the compiler to "fix" the situation, but if it's your code, then write it from the start to work on all systems.
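
The alignment rule can be sketched as a small check. The helper names `is_aligned` and `store_u32_aligned` are hypothetical, and the test is written with explicit parentheses because `==` binds more tightly than `&` in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: non-zero when p is aligned to `align` bytes.
   `align` must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical example: store a 32-bit value through a pointer that the
   caller promises is aligned for uint32_t. */
static void store_u32_aligned(uint32_t *dest, uint32_t value)
{
    assert(is_aligned(dest, sizeof *dest));
    *dest = value;
}
```

The assert documents the contract and catches violations in debug builds; it compiles away when NDEBUG is defined.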
