STREAM 和 GUPS（单 CPU）基准测试可以在 NUMA 机器中使用非本地内存吗

发布于 2024-08-26 05:03:42 字数 1526 浏览 8 评论 0 原文

我想从 HPCC、STREAM 和 GUPS 运行一些测试。

他们将测试内存带宽、延迟和吞吐量（随机访问方面）。

我可以在启用内存交错的 NUMA 节点上启动单 CPU 测试 STREAM 或单 CPU GUPS 吗？（HPCC - 高性能计算挑战赛的规则允许吗？）

使用非本地内存可以增加 GUPS 结果，因为它将增加 2 或 4 倍的内存库数量，可用于随机访问。（GUPS 通常受到非理想内存子系统和缓慢的内存库打开/关闭的限制。随着内存库的增加，它可以更新一个库，而其他库正在打开/关闭。）

谢谢。

更新：

（您也不可以重新排序程序进行的内存访问）。

但是编译器可以重新排序循环嵌套吗？例如 hpcc/RandomAccess.c

  /* Perform updates to main table.  The scalar equivalent is:
   *
   *     u64Int ran;
   *     ran = 1;
   *     for (i=0; i<NUPDATE; i++) {
   *       ran = (ran << 1) ^ (((s64Int) ran < 0) ? POLY : 0);
   *       table[ran & (TableSize-1)] ^= stable[ran >> (64-LSTSIZE)];
   *     }
   */
  for (j=0; j<128; j++)
    ran[j] = starts ((NUPDATE/128) * j);
  for (i=0; i<NUPDATE/128; i++) {
/* #pragma ivdep */
    for (j=0; j<128; j++) {
      ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0);
      Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)];
    }
  }

这里的主循环是 for (i=0; i ，嵌套循环是 for (j=0; j<128 ; j++) {.使用“循环交换”优化，编译器可以将此代码转换为

for (j=0; j<128; j++) {
  for (i=0; i<NUPDATE/128; i++) {
      ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0);
      Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)];
  }
}

它可以完成，因为此循环嵌套是完美的循环嵌套。 HPCC规则禁止这样的优化吗？

原文

I want to run some tests from HPCC, STREAM and GUPS.

They will test memory bandwidth, latency, and throughput (in term of random accesses).

Can I start Single CPU test STREAM or Single CPU GUPS on NUMA node with memory interleaving enabled? (Is it allowed by the rules of HPCC - High Performance Computing Challenge?)

Usage of non-local memory can increase GUPS results, because it will increase 2- or 4- fold the number of memory banks, available for random accesses. (GUPS typically limited by nonideal memory-subsystem and by slow memory bank opening/closing. With more banks it can do update to one bank, while the other banks are opening/closing.)

Thanks.

UPDATE:

(you may nor reorder the memory accesses that the program makes).

But can compiler reorder loops nesting? E.g. hpcc/RandomAccess.c

  /* Perform updates to main table.  The scalar equivalent is:
   *
   *     u64Int ran;
   *     ran = 1;
   *     for (i=0; i<NUPDATE; i++) {
   *       ran = (ran << 1) ^ (((s64Int) ran < 0) ? POLY : 0);
   *       table[ran & (TableSize-1)] ^= stable[ran >> (64-LSTSIZE)];
   *     }
   */
  for (j=0; j<128; j++)
    ran[j] = starts ((NUPDATE/128) * j);
  for (i=0; i<NUPDATE/128; i++) {
/* #pragma ivdep */
    for (j=0; j<128; j++) {
      ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0);
      Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)];
    }
  }

The main loop here is for (i=0; i<NUPDATE/128; i++) { and the nested loop is for (j=0; j<128; j++) {. Using 'loop interchange' optimization, compiler can convert this code to

for (j=0; j<128; j++) {
  for (i=0; i<NUPDATE/128; i++) {
      ran[j] = (ran[j] << 1) ^ ((s64Int) ran[j] < 0 ? POLY : 0);
      Table[ran[j] & (TableSize-1)] ^= stable[ran[j] >> (64-LSTSIZE)];
  }
}

It can be done because this loop nest is perfect loop nest. Is such optimization prohibited by rules of HPCC?

分享到QQ

分享到微博