Aligned-size and non-aligned-size arrays run at different speeds

Posted on 2024-12-22 12:47:03

I tried running the same operation on an aligned-size array and a non-aligned-size array, but the result puzzles me: the non-aligned-size array is faster than the aligned-size one. This is my code:

TimeMeter timeMeter;

const int N = 100000;

_Tp A[64];
_Tp B[65];

int szA = sizeof(A);
int szB = sizeof(B);

//  Method 1
timeMeter.start();
for ( int n = 0; n < N; n++ )
{
    memset(A, 0, szA);
}
timeMeter.stop();
printf("Method 1 Time usage = %f ms\n", timeMeter.span());

//  Method 2
timeMeter.start();
for ( int n = 0; n < N; n++ )
{
    memset(B, 0, szB);
}
timeMeter.stop();
printf("Method 2 Time usage = %f ms\n", timeMeter.span());
  • When _Tp is char (8 bits): Method 1 takes 2.195 ms and Method 2 takes 2.175 ms
  • When _Tp is int (32 bits): Method 1 takes 13.313 ms and Method 2 takes 5.987 ms
  • When _Tp is double (64 bits): Method 1 takes 14.266 ms and Method 2 takes 11.304 ms

眼角的笑意。 2024-12-29 12:47:03

Your benchmark is invalid for several reasons:

  1. Nothing here tests alignment at all. You simply have two arrays of different sizes. Besides, memset does not care much about the alignment of the element type, since it works at the byte level.
  2. As ildjarn pointed out, memset on such a small amount of memory makes a poor benchmark. It simply finishes too quickly, but that in itself is not a huge problem...
  3. ...you aren't using any of the memory that you are setting. An optimizer could validly eliminate all but one call to your memset.
  4. As you don't use any of the memory, the CPU may actually be doing a lot of reordering/caching, in particular between the loops.
  5. Your benchmarks have a running time close to the timeslice length on many operating systems (you don't say which one you use, so I'll guess a 1 ms timeslice, as on many Linux systems). That means OS context-switch overhead could be greatly altering the results of your test.
  6. Your arrays are allocated one after another. CPUs tend to predict access order, so this could actually affect the results. Try swapping the order of your loops and see whether it makes a difference.
  7. You don't say which timer you are using. Many timers simply don't have the resolution needed for millisecond-accuracy tests, so your results could be biased.
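
Several of the points above (unused memory, timer resolution, loop order) can be addressed in a small timing helper. The following is only a sketch under stated assumptions: `time_memset_ms` is a hypothetical name, `std::chrono::steady_clock` stands in for the unspecified `TimeMeter`, and the volatile read is one common way to discourage the optimizer from dropping the stores, not a guarantee.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not from the original post: runs memset on `buf`
// `iterations` times and returns the elapsed wall-clock time in milliseconds.
double time_memset_ms(void* buf, std::size_t size, long iterations) {
    assert(buf != nullptr && size > 0 && iterations > 0);
    volatile unsigned char* observed = static_cast<unsigned char*>(buf);

    const auto start = std::chrono::steady_clock::now();
    for (long n = 0; n < iterations; ++n) {
        // A different fill value on each pass, so no two calls are redundant.
        std::memset(buf, static_cast<int>(n & 0xFF), size);
        // Volatile read of one byte: the stores now have an observable use.
        const unsigned char seen = observed[static_cast<std::size_t>(n) % size];
        (void)seen;
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare the two arrays from the question, one would call this for A and B several times in alternating order, with enough iterations that each measurement runs well beyond both the timer resolution and a scheduler timeslice.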

沙与沫 2024-12-29 12:47:03

A type only has to be aligned to its own alignment requirement. On typical platforms, char must be aligned on a 1-byte boundary, int on a 4-byte boundary, and double on an 8-byte boundary. The number of elements in an array does not change that.
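
A minimal sketch of how this could be checked, assuming a hypothetical helper `is_aligned_for` and a hypothetical function `question_arrays_are_aligned` that recreates the two arrays from the question (neither name appears in the original post):

```cpp
#include <cstdint>

// Hypothetical helper: true if address `p` meets the alignment requirement of T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// The alignment of an array type is the alignment of its element type,
// regardless of how many elements it holds.
static_assert(alignof(char) == 1, "char can live at any address");
static_assert(alignof(int[64]) == alignof(int), "array length is irrelevant");
static_assert(alignof(double[65]) == alignof(double), "array length is irrelevant");

// Recreates the 64- and 65-element arrays from the question and checks both.
template <typename T>
bool question_arrays_are_aligned() {
    T A[64] = {};
    T B[65] = {};
    return is_aligned_for<T>(A) && is_aligned_for<T>(B);
}
```

Both arrays pass the check for every element type, which is why the original benchmark compares sizes rather than alignment.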

To really test unaligned accesses, try doing

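// One extra byte, so that num elements still fit after the 1-byte offset.
_Tp* A = (_Tp*)((char*)(new char[num * sizeof(_Tp) + 1]) + 1);

...

// The block was allocated as char[], so release it as char[].
delete[] ((char*)A - 1);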

Furthermore, memset treats its argument as a pointer to a series of chars, which can never be misaligned, so no matter what you do with the array, you can't get memset to perform an unaligned write.
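
Note that dereferencing a misaligned _Tp* directly is undefined behavior in C++, so the sketch below swaps the pointer cast for std::memcpy, which is well defined at any byte address. The helper names (`store_unaligned`, `load_unaligned`, `sum_at_offset_one`) are hypothetical, and whether the compiler turns the copies into genuinely unaligned loads and stores depends on the target, so treat it as a starting point rather than a finished benchmark.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: writes a T at an arbitrary byte address.
template <typename T>
void store_unaligned(unsigned char* dst, const T& value) {
    std::memcpy(dst, &value, sizeof(T));
}

// Hypothetical helper: reads a T from an arbitrary byte address.
template <typename T>
T load_unaligned(const unsigned char* src) {
    T value;
    std::memcpy(&value, src, sizeof(T));
    return value;
}

// Stores 0, 1, ..., count-1 as T values starting one byte into a buffer,
// then reads them back and returns their sum.
template <typename T>
T sum_at_offset_one(std::size_t count) {
    std::vector<unsigned char> storage(count * sizeof(T) + 1);
    // The allocator normally returns suitably aligned memory, so skipping
    // one byte leaves `base` misaligned for any T with alignof(T) > 1.
    unsigned char* base = storage.data() + 1;
    // memset is fine here: it only ever needs byte alignment.
    std::memset(base, 0, count * sizeof(T));

    for (std::size_t i = 0; i < count; ++i) {
        store_unaligned<T>(base + i * sizeof(T), static_cast<T>(i));
    }
    T total{};
    for (std::size_t i = 0; i < count; ++i) {
        total += load_unaligned<T>(base + i * sizeof(T));
    }
    return total;
}
```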
