Is there a standard macro to detect architectures that require aligned memory access?
Assuming something like:
    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        for(i=0; i<len; i++)
        {
            dest[i] = src[i] & mask[i];
        }
    }
I can go faster on a machine that tolerates unaligned access (e.g. x86) by writing something like:
    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        unsigned int wordlen = len >> 2;
        for(i=0; i<wordlen; i++)
        {
            // this raises SIGBUS on SPARC and other archs that require aligned access.
            ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
        }
        for(i=wordlen<<2; i<len; i++){
            dest[i] = src[i] & mask[i];
        }
    }
However, it needs to build on several architectures, so I would like to do something like:
    void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
    {
        unsigned int i;
        unsigned int wordlen = len >> 2;
    #if defined(__ALIGNED2__) || defined(__ALIGNED4__) || defined(__ALIGNED8__)
        // go slow
        for(i=0; i<len; i++)
        {
            dest[i] = src[i] & mask[i];
        }
    #else
        // go fast
        for(i=0; i<wordlen; i++)
        {
            // the following line will raise SIGBUS on SPARC and other archs that require aligned access.
            ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
        }
        for(i=wordlen<<2; i<len; i++){
            dest[i] = src[i] & mask[i];
        }
    #endif
    }
But I cannot find any good information on compiler-defined macros (like my hypothetical __ALIGNED4__ above) that specify alignment, or any clever way of using the preprocessor to determine the target architecture's alignment requirements. I could just test defined(__SVR4) && defined(__sun), but I would prefer something that will Just Work(TM) on other architectures requiring aligned memory accesses.
Comments (3)
While x86 silently fixes up unaligned accesses, this is hardly optimal for performance. It is usually best to assume a certain alignment and perform fixups yourself:
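The code that originally followed this sentence is not preserved on this page. As a rough sketch of the idea only, and not the answerer's code: the hypothetical helper `mask_bytes_aligned` below handles leading bytes one at a time until the pointers reach a 4-byte boundary, then works in aligned 32-bit words, and falls back to the byte loop when the three pointers do not share the same offset. It assumes that converting a pointer to `uintptr_t` exposes its alignment in the low bits, which holds on common platforms but is implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dest[i] = src[i] & mask[i], using 32-bit accesses
   only at addresses that are multiples of sizeof(uint32_t). */
void mask_bytes_aligned(unsigned char *dest, const unsigned char *src,
                        const unsigned char *mask, size_t len)
{
    const size_t word = sizeof(uint32_t);
    size_t i = 0;

    /* Word accesses are only possible when all three pointers have the
       same offset from a word boundary. */
    if ((uintptr_t)dest % word == (uintptr_t)src % word &&
        (uintptr_t)dest % word == (uintptr_t)mask % word) {
        /* Head: single bytes until dest (and so src and mask) is aligned. */
        while (i < len && (uintptr_t)(dest + i) % word != 0) {
            dest[i] = src[i] & mask[i];
            i++;
        }
        /* Body: one aligned word per iteration. Like the cast in the
           question, this assumes the buffers may be accessed as uint32_t. */
        for (; len - i >= word; i += word) {
            *(uint32_t *)(dest + i) =
                *(const uint32_t *)(src + i) & *(const uint32_t *)(mask + i);
        }
    }

    /* Tail, or the whole buffer when the offsets differ. */
    for (; i < len; i++) {
        dest[i] = src[i] & mask[i];
    }
}
```

With this shape no preprocessor test is needed: the same code should be safe on SPARC-like targets and still use word operations whenever the buffers allow it.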
Also, take a look at SIMD instructions.
The standard approach would be to have a configure script that runs a program to test for alignment issues. If the test program doesn't crash, the configure script defines a macro in a generated config header that enables the faster implementation. The safer implementation is the default.
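A minimal sketch of how the configure approach could look. Both `probe_unaligned_access` and the macro `HAVE_UNALIGNED_ACCESS` are hypothetical names chosen for illustration, not something a particular build system defines. The first function is the body a configure script would wrap in `main`, compile, and run; the second shows library code that defaults to the safe byte loop unless the generated header defines the macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Part 1: probe body for the configure script. The misaligned access is
   deliberate and is undefined behaviour in ISO C; on a strict-alignment
   target it is expected to raise SIGBUS, and the script would treat any
   abnormal exit as "unaligned access not supported". */
int probe_unaligned_access(void)
{
    static uint32_t storage[2];
    volatile uint32_t *p =
        (volatile uint32_t *)((unsigned char *)storage + 1);
    *p = 0x01020304u;                  /* deliberately misaligned store */
    return *p == 0x01020304u ? 0 : 1;  /* 0 means the access worked */
}

/* Part 2: library code. HAVE_UNALIGNED_ACCESS is assumed to come from the
   generated config header when the probe exited normally. */
void mask_bytes(unsigned char *dest, const unsigned char *src,
                const unsigned char *mask, size_t len)
{
    size_t i = 0;
#ifdef HAVE_UNALIGNED_ACCESS
    /* Fast path: 32-bit accesses at whatever alignment the caller passed. */
    for (; len - i >= sizeof(uint32_t); i += sizeof(uint32_t)) {
        *(uint32_t *)(dest + i) =
            *(const uint32_t *)(src + i) & *(const uint32_t *)(mask + i);
    }
#endif
    /* Safe default, and the tail of the fast path. */
    for (; i < len; i++) {
        dest[i] = src[i] & mask[i];
    }
}
```

One caveat with this design: a probe that has to be run cannot be used when cross-compiling, so the safe default matters in that case.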
(I find it weird that you have src and mask when really these commute. I renamed mask_bytes to memand. But anyway...) Another option is to use different functions that take advantage of types in C. For instance:
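The answer's own code is not preserved on this page. As a hedged illustration of what such a pair of functions might look like, the names `memand_bytes` and `memand_words` below are assumptions built on the answer's `memand` renaming. The pointer types carry the alignment contract: a caller who holds properly aligned `uint32_t` buffers can pick the word version, and everyone else uses the byte version.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise variant: no alignment requirement on any pointer.
   n counts bytes. */
void memand_bytes(unsigned char *dest, const unsigned char *src1,
                  const unsigned char *src2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dest[i] = src1[i] & src2[i];
    }
}

/* Word-wise variant: the uint32_t pointer types tell the caller that the
   buffers must be suitably aligned. n counts 32-bit words, not bytes. */
void memand_words(uint32_t *dest, const uint32_t *src1,
                  const uint32_t *src2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dest[i] = src1[i] & src2[i];
    }
}
```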
This way you let the programmer decide.