二维阵列两个方向之间的性能测试

发布于 2024-12-08 19:45:02 字数 488 浏览 0 评论 0原文

此代码 (A) 的执行速度比第二个代码快得多（10 倍）：

for(int w=0; w<width; w++) {
        for(int h=1; h<height; h++) {
            image[h][w] = (1-a)*image[h][w] + a*image[h-1][w];
        }
    }

第二个代码：

for(int h=0; h<height; h++) {
        for(int w=1; w<width; w++) {
            image[h][w] = (1-a)*image[h][w] + a*image[h][w-1];
        }
    }

为什么会这样？无论是水平方向还是垂直方向遍历图像中的所有像素都是一样的。

有没有办法加快第二个速度？

提前致谢。

原文

This code (A) executes much faster (10 times) then the second one:

for(int w=0; w<width; w++) {
        for(int h=1; h<height; h++) {
            image[h][w] = (1-a)*image[h][w] + a*image[h-1][w];
        }
    }

Second one:

for(int h=0; h<height; h++) {
        for(int w=1; w<width; w++) {
            image[h][w] = (1-a)*image[h][w] + a*image[h][w-1];
        }
    }

Why is that? it is the same to go through all of the pixels in the image in either horizontal or vertical direction.

Is there a way to speed up the second one?

Thanks in advance.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

初见你 2024-12-15 19:45:02

这与引用位置有关。如果按照元素存储在内存中的顺序访问元素，这将比以跨步模式访问它们要快得多，因为内存缓存和内存带宽将得到更有效的利用。

上面解释了第二个版本比第一个版本更快，这正是我的盒子上发生的情况：

aix@aix:~$ time ./ver1
real    0m29.421s

aix@aix:~$ time ./ver2
real    0m2.198s

这是我用来分配数组的代码：

  double a = 0.5;
  int width = 2048;
  int height = 2048;
  double* data = new double[height * width];
  double** image = new double*[height];
  for (int i = 0; i < height; i++) {
    image[i] = data + i * width;
  }

版本 1 乘以以下循环：

  for (int iter = 0; iter < 100; iter++) {
    for(int w=0; w<width; w++) {
      for(int h=1; h<height; h++) {
        image[h][w] = (1-a)*image[h][w] + a*image[h-1][w];
      }
    }
  }

版本 2 循环：

  for (int iter = 0; iter < 100; iter++) {
    for(int h=0; h<height; h++) {
      for(int w=1; w<width; w++) {
        image[h][w] = (1-a)*image[h][w] + a*image[h][w-1];
      }
    }
  }

用 g++ 4.4.3 与 -O3 并在某种描述的 Xeon 机器上运行（64 位 Ubuntu）。

如果您仍然 100% 确定您看到了相反效果，那么与我正在做的事情相比，您所做的事情一定有根本上的不同。如果您告诉我们图像的尺寸以及它的准确分配方式（以帮助建立内存布局），这可能会有所帮助。

This has to do with locality of reference. If you access the elements in the same order as they are stored in memory, this will be much faster than accessing them in a strided pattern, as the memory caches and memory bandwidth will be utilized far more effectively.

The above would explain the second version being faster than the first one, and this is exactly what happens on my box:

aix@aix:~$ time ./ver1
real    0m29.421s

aix@aix:~$ time ./ver2
real    0m2.198s

Here is the code I use to allocate the array:

  double a = 0.5;
  int width = 2048;
  int height = 2048;
  double* data = new double[height * width];
  double** image = new double*[height];
  for (int i = 0; i < height; i++) {
    image[i] = data + i * width;
  }

Version 1 times the following loop:

  for (int iter = 0; iter < 100; iter++) {
    for(int w=0; w<width; w++) {
      for(int h=1; h<height; h++) {
        image[h][w] = (1-a)*image[h][w] + a*image[h-1][w];
      }
    }
  }

Version 2 loop:

  for (int iter = 0; iter < 100; iter++) {
    for(int h=0; h<height; h++) {
      for(int w=1; w<width; w++) {
        image[h][w] = (1-a)*image[h][w] + a*image[h][w-1];
      }
    }
  }

Compiled with g++ 4.4.3 with -O3 and run on a Xeon box of some description (64-bit Ubuntu).

If you are still 100% sure that you're seeing the opposite effect, there must be something fundamentally different about what you're doing compared to what I'm doing. It might help if you told us the dimensions of your image and how exactly it gets allocated (in order to help establish the memory layout).

回复收藏 0 原文