优化数组求和(子集问题)

发布于 2024-11-07 09:24:08 字数 2002 浏览 0 评论 0原文

在我的程序中最热门的部分(根据 gprof,90% 的时间),我需要将一个数组 A 求和到另一个 B 中。两个数组的大小均为 2^n(n 为 18..24)并保存一个整数(为简单起见) ,实际上存储的元素是mpz_t或小int数组)。求和规则:对于0..2^n-1中的每个i,设置B[i] = sum(A[j]),其中j是位向量,以及j & ~ i == 0 (换句话说,如果 i 的第 k 位是,则任何 j 的第 k 位不能设置为 1不是 1)。

我当前的代码(这是最内层循环的主体)在 2^(1.5 * n) 总和的时间内完成此操作,因为我将在 A 的(平均)2^(n/2) 个元素上迭代每个 i。

  int A[1<<n]; // have some data
  int B[1<<n]; // empty
  for (int i = 0; i < (1<<n); i++ ) {
    /* Iterate over subsets */
    for (int j = i; ; j=(j-1) & i ) {
      B[i] += A[j];  /* it is an `sum`, actually it can be a mpz_add here */
      if(j==0) break;
    }
  }

我提到过,几乎所有总和都是根据之前求和的值重新计算的。我建议,可以有代码,在 n* 2^n 总和的时间内完成相同的任务。

我的第一个想法是 B[i] = B[i_without_the_most_significant_bit] + A[j_new];其中 j_new 仅是 j 具有 i 中处于“1”状态的most_significant 位。这使我的时间减少了一半,但这还不够(实际问题大小仍然需要数小时和数天):

  int A[1<<n];
  int B[1<<n];
  B[0] = A[0]; // the i==0 will not work with my idea and clz()
  for (int i = 1; i < (1<<n); i++ ) {
    int msb_of_i = 1<< ((sizeof(int)*8)-__builtin_clz(i)-1);
    int i_wo_msb = i & ~ msb;
    B[i] = B[i_wo_msb];
    /* Iterate over subsets */
    for (int j_new = i; ; j_new=(j_new-1) & i ) {
      B[i] += A[j_new];  
      if(j_new==msb) break; // stop, when we will try to unset msb
    }
  }

您能建议更好的算法吗?

附加图像,对于 n=4 的每个 i 求和的 i 和 j 列表:

i  j`s summed
0  0
1  0 1
2  0 2
3  0 1 2 3
4  0 4
5  0 1 4 5
6  0 2 4 6
7  0 1 2 3 4 5 6 7
8  0                8
9  0 1              8 9
a  0 2              8 a
b  0 1 2 3          8 9 a b
c  0 4              8 c
d  0 1 4 5          8 9 c d
e  0 2 4 6          8 a c e
f  0 1 2 3 4 5 6 7  8 9 a b c d e f

注意数字的相似性

PS msb 魔法来自这里:取消字中的最高有效位 (int32) [C]

In the hottest part of my program (90% of time according to gprof), I need to sum one array A into another B. Both arrays are 2^n (n is 18..24) sized and holds an integer (for simplicity, actually the element stored is mpz_t or small int array). The rule of summing: for each i in 0..2^n-1, set B[i] = sum (A[j]), where j is bit vector, and j & ~ i == 0 (in other words, k-th bit of any j can't be set to 1 if k-th bit of i is not 1).

My current code (this is a body of innermost loop) does this in the time of 2^(1.5 * n) sums, because I will iterate for each i on (in average) 2^(n/2) elements of A.

  int A[1<<n]; // have some data
  int B[1<<n]; // empty
  for (int i = 0; i < (1<<n); i++ ) {
    /* Iterate over subsets */
    for (int j = i; ; j=(j-1) & i ) {
      B[i] += A[j];  /* it is an `sum`, actually it can be a mpz_add here */
      if(j==0) break;
    }
  }

My I mentioned, that almost any sum is recomputed from the values, summed earlier. I suggest, there can be code, doing the same task in the time of n* 2^n sums.

My first idea is that B[i] = B[i_without_the_most_significant_bit] + A[j_new]; where j_new is only j's having the most_significant bit from i in '1' state. This halves my time, but this is not enough (still hours and days on real problem size):

  int A[1<<n];
  int B[1<<n];
  B[0] = A[0]; // the i==0 will not work with my idea and clz()
  for (int i = 1; i < (1<<n); i++ ) {
    int msb_of_i = 1<< ((sizeof(int)*8)-__builtin_clz(i)-1);
    int i_wo_msb = i & ~ msb;
    B[i] = B[i_wo_msb];
    /* Iterate over subsets */
    for (int j_new = i; ; j_new=(j_new-1) & i ) {
      B[i] += A[j_new];  
      if(j_new==msb) break; // stop, when we will try to unset msb
    }
  }

Can you suggest better algorithm?

Additional image, list of i and j summed for each i for n=4:

i  j`s summed
0  0
1  0 1
2  0 2
3  0 1 2 3
4  0 4
5  0 1 4 5
6  0 2 4 6
7  0 1 2 3 4 5 6 7
8  0                8
9  0 1              8 9
a  0 2              8 a
b  0 1 2 3          8 9 a b
c  0 4              8 c
d  0 1 4 5          8 9 c d
e  0 2 4 6          8 a c e
f  0 1 2 3 4 5 6 7  8 9 a b c d e f

Note the similarity of figures

PS the msb magic is from here: Unset the most significant bit in a word (int32) [C]

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

奢华的一滴泪 2024-11-14 09:24:08

分而治之?现在还没到位。

void sums(int *a, int n, int *b) {
  if (n <= 0) {
    *b = *a;
    return;
  }
  int m = 1 << (n - 1);
  sums(a, n - 1, b);
  sums(a + m, n - 1, b + m);
  for (int i = 0; i < m; i++) {
    b[m + i] += b[i];
  }
}

Divide and conquer anyone? Now not in-place.

void sums(int *a, int n, int *b) {
  if (n <= 0) {
    *b = *a;
    return;
  }
  int m = 1 << (n - 1);
  sums(a, n - 1, b);
  sums(a + m, n - 1, b + m);
  for (int i = 0; i < m; i++) {
    b[m + i] += b[i];
  }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文