可以使用哪些其他数学运算符来转换算法
差运算符(类似于导数运算符)和和运算符(类似于积分运算符)可用于更改算法,因为它们是逆的。
Sum of (difference of y) = y
Difference of (sum of y) = y
下面是在 ac 程序中以这种方式使用它们的示例。
此 C 程序演示了制作正方形数组的三种方法。
- 第一种方法是简单明显的方法,
y = x*x
。 - 第二种方法使用方程
(y 的差值) = (x0 + x1)*(x 的差值)
。 - 第三种方法是相反的,并使用方程
(y 之和) = x(x+1)(2x+1)/6
。
第二种方法始终比第一种方法稍快,尽管我没有费心去优化它。我想,如果我更加努力,我可以做得更好。
第三种方法始终慢两倍,但这并不意味着基本想法是愚蠢的。我可以想象,对于 y = x*x 之外的某些函数,这种方法可能会更快。此外还存在整数溢出问题。
尝试所有这些转换非常有趣,所以现在我想知道我可以使用哪些其他数学运算符对来转换算法?
这是代码:
#include <stdio.h>
#include <time.h>
#define tries 201
#define loops 100000
void printAllIn(unsigned int array[tries]){
unsigned int index;
for (index = 0; index < tries; ++index)
printf("%u\n", array[index]);
}
int main (int argc, const char * argv[]) {
/*
Goal, Calculate an array of squares from 0 20 as fast as possible
*/
long unsigned int obvious[tries];
long unsigned int sum_of_differences[tries];
long unsigned int difference_of_sums[tries];
clock_t time_of_obvious1;
clock_t time_of_obvious0;
clock_t time_of_sum_of_differences1;
clock_t time_of_sum_of_differences0;
clock_t time_of_difference_of_sums1;
clock_t time_of_difference_of_sums0;
long unsigned int j;
long unsigned int index;
long unsigned int sum1;
long unsigned int sum0;
long signed int signed_index;
time_of_obvious0 = clock();
for (j = 0; j < loops; ++j)
for (index = 0; index < tries; ++index)
obvious[index] = index*index;
time_of_obvious1 = clock();
time_of_sum_of_differences0 = clock();
for (j = 0; j < loops; ++j)
for (index = 1, sum_of_differences[0] = 0; index < tries; ++index)
sum_of_differences[index] = sum_of_differences[index-1] + 2 * index - 1;
time_of_sum_of_differences1 = clock();
time_of_difference_of_sums0 = clock();
for (j = 0; j < loops; ++j)
for (signed_index = 0, sum0 = 0; signed_index < tries; ++signed_index) {
sum1 = signed_index*(signed_index+1)*(2*signed_index+1);
difference_of_sums[signed_index] = (sum1 - sum0)/6;
sum0 = sum1;
}
time_of_difference_of_sums1 = clock();
// printAllIn(obvious);
printf(
"The obvious approach y = x*x took, %f seconds\n",
((double)(time_of_obvious1 - time_of_obvious0))/CLOCKS_PER_SEC
);
// printAllIn(sum_of_differences);
printf(
"The sum of differences approach y1 = y0 + 2x - 1 took, %f seconds\n",
((double)(time_of_sum_of_differences1 - time_of_sum_of_differences0))/CLOCKS_PER_SEC
);
// printAllIn(difference_of_sums);
printf(
"The difference of sums approach y = sum1 - sum0, sum = (x - 1)x(2(x - 1) + 1)/6 took, %f seconds\n",
(double)(time_of_difference_of_sums1 - time_of_difference_of_sums0)/CLOCKS_PER_SEC
);
return 0;
}
The difference operator, (similar to the derivative operator), and the sum operator, (similar to the integration operator), can be used to change an algorithm because they are inverses.
Sum of (difference of y) = y
Difference of (sum of y) = y
An example of using them that way in a c program is below.
This c program demonstrates three approaches to making an array of squares.
- The first approach is the simple obvious approach,
y = x*x
. - The second approach uses the equation
(difference in y) = (x0 + x1)*(difference in x)
. - The third approach is the reverse and uses the equation
(sum of y) = x(x+1)(2x+1)/6
.
The second approach is consistently slightly faster then the first one, even though I haven't bothered optimizing it. I imagine that if I tried harder I could make it even better.
The third approach is consistently twice as slow, but this doesn't mean the basic idea is dumb. I could imagine that for some function other than y = x*x
this approach might be faster. Also there is an integer overflow issue.
Trying out all these transformations was very interesting, so now I want to know what are some other pairs of mathematical operators I could use to transform the algorithm?
Here is the code:
#include <stdio.h>
#include <time.h>
#define tries 201
#define loops 100000
void printAllIn(unsigned int array[tries]){
unsigned int index;
for (index = 0; index < tries; ++index)
printf("%u\n", array[index]);
}
int main (int argc, const char * argv[]) {
/*
Goal, Calculate an array of squares from 0 20 as fast as possible
*/
long unsigned int obvious[tries];
long unsigned int sum_of_differences[tries];
long unsigned int difference_of_sums[tries];
clock_t time_of_obvious1;
clock_t time_of_obvious0;
clock_t time_of_sum_of_differences1;
clock_t time_of_sum_of_differences0;
clock_t time_of_difference_of_sums1;
clock_t time_of_difference_of_sums0;
long unsigned int j;
long unsigned int index;
long unsigned int sum1;
long unsigned int sum0;
long signed int signed_index;
time_of_obvious0 = clock();
for (j = 0; j < loops; ++j)
for (index = 0; index < tries; ++index)
obvious[index] = index*index;
time_of_obvious1 = clock();
time_of_sum_of_differences0 = clock();
for (j = 0; j < loops; ++j)
for (index = 1, sum_of_differences[0] = 0; index < tries; ++index)
sum_of_differences[index] = sum_of_differences[index-1] + 2 * index - 1;
time_of_sum_of_differences1 = clock();
time_of_difference_of_sums0 = clock();
for (j = 0; j < loops; ++j)
for (signed_index = 0, sum0 = 0; signed_index < tries; ++signed_index) {
sum1 = signed_index*(signed_index+1)*(2*signed_index+1);
difference_of_sums[signed_index] = (sum1 - sum0)/6;
sum0 = sum1;
}
time_of_difference_of_sums1 = clock();
// printAllIn(obvious);
printf(
"The obvious approach y = x*x took, %f seconds\n",
((double)(time_of_obvious1 - time_of_obvious0))/CLOCKS_PER_SEC
);
// printAllIn(sum_of_differences);
printf(
"The sum of differences approach y1 = y0 + 2x - 1 took, %f seconds\n",
((double)(time_of_sum_of_differences1 - time_of_sum_of_differences0))/CLOCKS_PER_SEC
);
// printAllIn(difference_of_sums);
printf(
"The difference of sums approach y = sum1 - sum0, sum = (x - 1)x(2(x - 1) + 1)/6 took, %f seconds\n",
(double)(time_of_difference_of_sums1 - time_of_difference_of_sums0)/CLOCKS_PER_SEC
);
return 0;
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这里有两类优化:强度降低和窥孔优化。
强度降低是用更便宜的函数替换“昂贵”数学函数的常用术语 - 例如,用两个 替换乘法对数表查找,加法,然后进行反对数查找以找到最终结果。
窥视孔优化是一个常用术语,用于用左移替换乘以 2 的幂之类的东西。某些 CPU 具有用于这些操作的简单指令,对于乘以 2 的幂的特定情况,这些指令的运行速度比通用整数乘法更快。
您还可以对单个算法进行优化。您可能会写
a * b
,但有许多不同的方法来执行乘法,并且不同的算法在不同的条件下表现更好或更差。其中许多决定是由芯片设计者做出的,但任意精度整数库根据可用原语的优点做出自己的选择。There are two classes of optimizations here: strength reduction and peephole optimizations.
Strength reduction is the usual term for replacing "expensive" mathematical functions with cheaper functions -- say, replacing a multiplication with two logarithm table lookups, an addition, and then an inverse logarithm lookup to find the final result.
Peephole optimizations is the usual term for replacing something like multiplication by a power of two with left shifts. Some CPUs have simple instructions for these operations that run faster than generic integer multiplication for the specific case of multiplying by powers of two.
You can also perform optimizations of individual algorithms. You might write
a * b
, but there are many different ways to perform multiplication, and different algorithms perform better or worse under different conditions. Many of these decisions are made by the chip designers, but arbitrary-precision integer libraries make their own choices based on the merits of the primitives available to them.当我尝试在 Ubuntu 10.04 上编译您的代码时,在 main() 启动时出现分段错误,因为您在堆栈上声明了许多兆字节的变量。在将大部分变量移到 main 之外以使它们成为全局变量后,我能够编译它。
然后我得到了这些结果:
程序运行得如此之快,很难相信它真的做了什么。我放入“-O0”选项来禁用优化,但 GCC 可能仍然优化了所有计算。因此,我尝试将“易失性”限定符添加到数组中,但仍然得到类似的结果。
这就是我停止研究的地方。总之,我真的不知道你的代码发生了什么,但很可能出了问题。
When I tried to compile your code on Ubuntu 10.04, I got a segmentation fault right when main() started because you are declaring many megabytes worth of variables on the stack. I was able to compile it after I moved most of your variables outside of main to make them be global variables.
Then I got these results:
The program runs so fast it's hard to believe it really did anything. I put the "-O0" option in to disable optimizations but it's possible GCC still might have optimized out all of the computations. So I tried adding the "volatile" qualifier to your arrays but still got similar results.
That's where I stopped working on it. In conclusion, I don't really know what's going on with your code but it's quite possible that something is wrong.