SSE access violation
I have the code:
float *mu_x_ptr;
__m128 *tmp;
__m128 *mm_mu_x;
mu_x_ptr = _aligned_malloc(4*sizeof(float), 16);
mm_mu_x = (__m128*) mu_x_ptr;
for (row = 0; row < ker_size; row++) {
    tmp = (__m128*) &original[row*width + col];
    *mm_mu_x = _mm_add_ps(*tmp, *mm_mu_x);
}
From this I get:
First-chance exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
Unhandled exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
The program '[4452] SSIM.exe: Native' has exited with code -1073741819 (0xc0000005)
When running the program, the error occurs at the _mm_add_ps line.
original is also allocated using _aligned_malloc(..., 16) and then passed to the function, so as far as my understanding of SSE goes, it should not be misaligned.
I'm wondering if anyone can see why this crashes, since I can't see why.
EDIT: width and col are always multiples of 4. col is 0 or 4, and width is always a multiple of 4.
EDIT2: Looks like my original array is not aligned. Wouldn't:
function(float *original);
.
.
.
original = _aligned_malloc(width*height*sizeof(float), 16);
function(original);
_aligned_free(original);
}
make sure that original is aligned inside the function?
Edit3: This is actually really weird. When I do:
float *orig;
orig = _aligned_malloc(width*height*sizeof(float), 16);
assert(isAligned(orig));
The assert fails, with isAligned defined as:
#define isAligned(p) (((unsigned long)(p)) & 15 == 0)
Comments (2)
I think you need to use
instead of
EDIT: If you get access violation errors after some offset, then possibly your stride is not aligned. Either way, allocate __m128 elements (each of which represents 4 floats). This takes care of the alignment.
Also you can get some extra performance by eliminating the arithmetic [row * width + col].
Determine your stride and increment your pointer accordingly.
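The stride idea could be sketched roughly like this. It is only an illustration under assumptions: sum_rows_sse is a hypothetical name, the accumulator is started at zero instead of being read from freshly allocated memory, and image, width and col are assumed to satisfy the alignment conditions stated in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: sums a strip of 4 adjacent columns over `rows` rows,
 * starting at column `col`. Assumes `image` is 16-byte aligned and that
 * `width` and `col` are multiples of 4, so every row start is aligned too. */
static __m128 sum_rows_sse(const float *image, size_t width,
                           size_t col, size_t rows)
{
    const float *p = image + col;    /* first row of the strip */
    __m128 acc = _mm_setzero_ps();   /* accumulator starts at zero */

    assert(((uintptr_t)p & 15) == 0);  /* start address must be aligned */
    assert(width % 4 == 0);            /* stride must keep it aligned */

    for (size_t row = 0; row < rows; row++) {
        acc = _mm_add_ps(acc, _mm_load_ps(p));  /* aligned 16-byte load */
        p += width;                             /* advance by one row (the stride) */
    }
    return acc;
}
```

The pointer is advanced by width floats per iteration, so the index row*width + col is never recomputed inside the loop.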
tmp will be misaligned unless width and col have suitable values. Ideally both width and col should be multiples of 4. You might want to add some asserts to check the alignment, e.g.
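A minimal sketch of such a check, assuming hypothetical helpers is_aligned_16 and checked_row_ptr (neither name comes from the thread). Note that the isAligned macro in the question parses as p & (15 == 0), because == binds tighter than & in C, so it always evaluates to 0 and the assert fails even for an aligned pointer. The sketch applies the mask first and uses uintptr_t, since unsigned long is only 32 bits on 64-bit Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero when p is 16-byte aligned.
 * The mask is applied before the comparison. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical helper: computes the address the loop would load from and
 * asserts that both the base pointer and that address are 16-byte aligned. */
static const float *checked_row_ptr(const float *original, size_t width,
                                    size_t row, size_t col)
{
    const float *tmp = &original[row * width + col];
    assert(is_aligned_16(original));
    assert(is_aligned_16(tmp));
    return tmp;
}
```

With a check like this in place, a misaligned base pointer or an unsuitable width or col trips the assert before the SSE load is reached.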