Detecting aligned memory requirement on target CPU

Posted 2025-01-06 22:58:46

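I'm currently trying to build code that is supposed to run on a wide range of machines, from handheld pocket devices and sensors up to big servers in data centers.

One of the (many) differences between these architectures is the requirement for aligned memory access.

A "standard" x86 CPU does not require aligned memory access, but many other CPUs do, and they raise an exception if the rule is not respected.

Up to now, I've been dealing with it by using the packed attribute (or pragma) to force the compiler to be cautious on specific data accesses that are known to be risky. And it works fine.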
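
For reference, the packed approach described above might look like the following sketch. It assumes GCC or Clang attribute syntax; the names `unaligned_u32`, `read_u32_packed` and `read_u32_memcpy` are hypothetical and not taken from the original code. A `memcpy`-based read is shown next to it as a commonly used alternative that returns the same native-endian value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang attribute syntax. "packed" lowers the struct's
   alignment to 1, so the compiler must emit loads that are safe at any
   address; "may_alias" exempts it from type-based alias analysis. */
struct unaligned_u32 {
    uint32_t value;
} __attribute__((packed, may_alias));

/* Read a native-endian 32-bit value through the packed wrapper type. */
static inline uint32_t read_u32_packed(const void *p)
{
    assert(p != NULL);
    return ((const struct unaligned_u32 *)p)->value;
}

/* Same read expressed with memcpy, which is valid for any alignment. */
static inline uint32_t read_u32_memcpy(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Both functions read four bytes in the machine's native byte order, so they should agree on any given target.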

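The problem is that the compiler is so cautious that a lot of performance is lost in the process.

Since performance matters, we would be better off rewriting some portions of the code to work specifically on strict-alignment CPUs. Such code would, on the other hand, be slower on CPUs that support unaligned memory access (such as x86), so we want to use it only on CPUs that require strictly aligned memory access.

So here is the question: how can I detect, at compile time, that the target architecture requires strictly aligned memory access? (Or the other way around.)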

最偏执的依靠 2025-01-13 22:58:46

No C implementation that I know of provides any preprocessor macro to help you figure this out. Since your code supposedly runs on a wide range of machines, I assume that you have access to a wide variety of machines for testing, so you can figure out the answer with a test program. Then you can write your own macro, something like below:

#if defined(__sparc__)
/* Unaligned access will crash your app on a SPARC */
#define ALIGN_ACCESS 1
#elif defined(__ppc__) || defined(__POWERPC__) || defined(_M_PPC)
/* Unaligned access is too slow on a PowerPC (maybe?) */
#define ALIGN_ACCESS 1
#elif defined(__i386__) || defined(__x86_64__) || \
      defined(_M_IX86) || defined(_M_X64)
/* x86 / x64 are fairly forgiving */
#define ALIGN_ACCESS 0
#else
#warning "Unsupported architecture"
#define ALIGN_ACCESS 1
#endif
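
As a rough illustration of how such a macro could be consumed, the sketch below selects between two implementations of a 32-bit load. The function name `load_u32_le` is hypothetical, and the fallback definition of `ALIGN_ACCESS` only exists to keep the example self-contained. It also assumes that every target with `ALIGN_ACCESS` set to 0 is little-endian, which holds for x86 but would need checking for anything else added to that branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: ALIGN_ACCESS normally comes from a detection header like the
   one above. Fall back to the safe value when nothing has defined it. */
#ifndef ALIGN_ACCESS
#define ALIGN_ACCESS 1
#endif

/* Read a 32-bit little-endian value from a possibly unaligned address. */
static inline uint32_t load_u32_le(const unsigned char *p)
{
    assert(p != NULL);
#if ALIGN_ACCESS
    /* Strict-alignment path: byte loads only, valid at any address. */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
#else
    /* Relaxed path: copy four bytes in native order. This equals the
       little-endian value only on little-endian targets (assumed here). */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#endif
}
```

Keeping the selection inside one small function means the rest of the code never has to test `ALIGN_ACCESS` directly.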

Note that the speed of an unaligned access will depend on the boundaries which it crosses. For example, if the access crosses a 4k page boundary it will be much slower, and there may be other boundaries which cause it to be slower still. Even on x86, some unaligned accesses are not handled by the processor and are instead handled by the OS kernel. That is incredibly slow.
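
To make the boundary point concrete, here is a small sketch of a helper that reports whether an access spans a given power-of-two boundary. The name `crosses_boundary` is hypothetical, and the boundary sizes mentioned in the usage note are assumptions about typical hardware rather than guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the byte range [addr, addr + size) spans a multiple of
   `boundary`, which must be a power of two (for example 4096 for a page). */
static inline bool crosses_boundary(uintptr_t addr, size_t size,
                                    uintptr_t boundary)
{
    assert(size > 0);
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    /* The access crosses a boundary when its first and last byte fall
       into different boundary-sized blocks. */
    uintptr_t first_block = addr & ~(boundary - 1);
    uintptr_t last_block = (addr + size - 1) & ~(boundary - 1);
    return first_block != last_block;
}
```

With a 4096-byte page, a 4-byte access at address 4094 crosses the boundary, while the same access at 4092 does not. The same check could be applied with a cache-line size such as 64 bytes, if that matches the target.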

There is also no guarantee that a future (or current) implementation will not suddenly change the performance characteristics of unaligned accesses. This has happened in the past and may happen in the future; the PowerPC 601 was very forgiving of unaligned access but the PowerPC 603e was not.

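Complicating things even further, the code you'd write to perform an unaligned access differs in implementation across platforms. For example, on PowerPC it's simplified by the fact that the shift instructions produce 0 when a 32-bit x is shifted left or right by 32, but on x86 you have no such luck: the shift count is masked to five bits, so a shift by 32 leaves x unchanged. In C itself, x << 32 and x >> 32 are undefined behavior for a 32-bit x, so portable code has to handle that case explicitly.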
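
The shift problem shows up when an unaligned 32-bit value is assembled from two aligned words. The sketch below is one way to write that combination portably; the name `combine_words_le` is hypothetical, and it assumes a little-endian layout in which `lo` is the word at the lower address and `k` is the byte offset of the wanted value inside `lo`.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two adjacent aligned 32-bit words into the 32-bit value that
   starts k bytes into `lo` (little-endian layout assumed, 0 <= k < 4). */
static inline uint32_t combine_words_le(uint32_t lo, uint32_t hi, unsigned k)
{
    assert(k < 4);
    if (k == 0) {
        /* Handled separately: the general formula would compute hi << 32,
           which is undefined in C and behaves differently across CPUs. */
        return lo;
    }
    return (lo >> (8 * k)) | (hi << (32 - 8 * k));
}
```

The explicit `k == 0` branch is the portable replacement for relying on a particular CPU's shift behavior.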

坐在坟头思考人生 2025-01-13 22:58:46

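Writing your code for strict memory alignment is a good idea anyway. Even on x86 systems, which allow unaligned access, an unaligned read or write can turn into two memory accesses, and some performance is lost. It's not difficult to write efficient code that works on all CPU architectures. The simple rule to remember is that a pointer must be aligned to the size of the object you're reading or writing. For example, when writing a DWORD, ((uintptr_t)dest_pointer & 3) == 0 must hold. Using a crutch such as an "UNALIGNED_PTR" type will cause the compiler to generate inefficient code. If you have a large amount of legacy code that must work right away, then it makes sense to use the compiler to "fix" the situation, but if it's your own code, write it from the start to work on all systems.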
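
The alignment rule above can be checked with a small helper. This is a minimal sketch; the name `is_aligned` is hypothetical, and it assumes the usual case where converting a pointer to `uintptr_t` yields its numeric address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if p is a multiple of `alignment`, which must be a power of
   two (for example 4 when the object being accessed is a 32-bit DWORD). */
static inline bool is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* The parentheses matter: == binds tighter than & in C, so the mask
       must be applied before the comparison. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A typical use would be an assertion at the top of a routine that stores DWORDs, such as `assert(is_aligned(dest_pointer, 4));`.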
