SSE access violation

Posted on 2024-09-12 10:06:37

I have the code:

float *mu_x_ptr;
__m128 *tmp;
__m128 *mm_mu_x;

mu_x_ptr = _aligned_malloc(4*sizeof(float), 16);
mm_mu_x = (__m128*) mu_x_ptr;
for(row = 0; row < ker_size; row++) {
    tmp = (__m128*) &original[row*width + col];
    *mm_mu_x = _mm_add_ps(*tmp, *mm_mu_x);
}

From this I get:

First-chance exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
Unhandled exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
The program '[4452] SSIM.exe: Native' has exited with code -1073741819 (0xc0000005)

When running the program, the error occurs at the _mm_add_ps line.

original is also allocated using _aligned_malloc(..., 16) and passed to the function, so as far as I understand SSE, it should not be misaligned.

Can anyone see why this crashes? I can't.

EDIT: width and col are always multiples of 4. col is 0 or 4.

EDIT2: Looks like my original array is not aligned. Wouldn't:

function(float *original);
.
.
.
    original = _aligned_malloc(width*height*sizeof(float), 16);
    function(original);
    _aligned_free(original);
}

make sure that original is aligned inside function?

Edit3: This is actually really weird. When I do:

float *orig;
orig = _aligned_malloc(width*height*sizeof(float), 16);
assert(isAligned(orig));

the assert fails, with isAligned defined as:

#define isAligned(p) (((unsigned long)(p)) & 15 == 0)

Comments (2)

风柔一江水 2024-09-19 10:06:37

I think you need to use

__m128 tmp = _mm_load_ps( &original[row * width + col] );

instead of

tmp = (__m128 *)&original[row * width + col];

EDIT: If the access violations occur only after some offset, then possibly your stride is not aligned. Either way, allocate __m128 elements (each representing 4 floats); this takes care of the alignment.

You can also get some extra performance by eliminating the index arithmetic [row * width + col]: determine your stride and increment your pointer accordingly.
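
A minimal sketch of the stride idea, not the asker's actual function: the helper name sum_rows_sse and its out parameter are hypothetical, and it assumes original is 16-byte aligned with width and col both multiples of 4. It also starts the accumulator at zero, which the snippet in the question does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: sums ker_size rows of 4 adjacent floats starting at
   column col. Assumes original is 16-byte aligned and that width and col
   are multiples of 4, so every row pointer stays 16-byte aligned. */
static void sum_rows_sse(const float *original, int width, int col,
                         int ker_size, float out[4])
{
    const float *p = original + col;  /* first row, chosen column */
    __m128 acc = _mm_setzero_ps();    /* start the accumulator at zero */
    int row;

    for (row = 0; row < ker_size; row++) {
        assert(((uintptr_t)p & 15) == 0);  /* _mm_load_ps needs 16-byte alignment */
        acc = _mm_add_ps(acc, _mm_load_ps(p));
        p += width;                        /* advance one row (the stride) */
    }
    _mm_storeu_ps(out, acc);  /* unaligned store, so out may live anywhere */
}
```

Note that _mm_load_ps still requires an aligned address. If alignment cannot be guaranteed, _mm_loadu_ps accepts any address, possibly at some cost in speed.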

悟红尘 2024-09-19 10:06:37

tmp will be misaligned unless width and col have suitable values. Ideally both width and col should be multiples of 4.

You might want to add some asserts to check the alignment, e.g.

#define IsAligned(p) ((((unsigned long)(p)) & 15) == 0)

float *mu_x_ptr;
__m128 *tmp;
__m128 *mm_mu_x;

assert(original != NULL && IsAligned(original));
mu_x_ptr = _aligned_malloc(4 * sizeof(float), 16);
assert(mu_x_ptr != NULL && IsAligned(mu_x_ptr));
mm_mu_x = (__m128 *)mu_x_ptr;
assert(IsAligned(mm_mu_x));
for (row = 0; row < ker_size; row++)
{
    tmp = (__m128 *)&original[row * width + col];
    assert(IsAligned(tmp));
    *mm_mu_x = _mm_add_ps(*tmp, *mm_mu_x);
}
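
The extra parentheses in IsAligned above matter. In C, == binds more tightly than &, so the macro from the question, `p & 15 == 0`, parses as `p & (15 == 0)`, which is `p & 0` and therefore always false. That would explain why the assert in Edit3 fails even for a correctly aligned pointer. A small sketch contrasting the two forms follows; the macro names are made up for illustration, and uintptr_t is used here instead of unsigned long as an assumption about what is portable across pointer sizes.

```c
#include <assert.h>
#include <stdint.h>

/* As written in the question: parses as p & (15 == 0), i.e. p & 0.
   Compilers may warn about the missing parentheses here. */
#define IS_ALIGNED_BROKEN(p) (((uintptr_t)(p)) & 15 == 0)

/* Parenthesised as in the answer above: mask first, then compare. */
#define IS_ALIGNED_16(p) ((((uintptr_t)(p)) & 15) == 0)
```

With these definitions, IS_ALIGNED_BROKEN evaluates to 0 for every pointer, aligned or not, while IS_ALIGNED_16 is nonzero only when the low four address bits are clear.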