How can I improve the efficiency of the standard matrix addition algorithm in C?
How would I improve the efficiency of the standard matrix addition algorithm?
Each matrix is represented by a 2D array, and the two are added sequentially.
Comments (3)
I am not going to read all of your code. As far as I can see, the relevant part is the addition itself, and I don't think it can be improved complexity-wise: every element has to be read and written once.
As for other kinds of micro-optimizations, such as using `++i` instead of `i++` or changing the order of the loops, I think you shouldn't care about these until you've run a profiler that shows they are your performance bottlenecks. Remember, premature optimization is the root of all evil :)
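The naive double for loop is pretty close to optimal for portable code, so long as you get your two for loops in the right order. C stores 2D arrays in row-major order, so the inner loop should run over the last (column) index; that way you access memory sequentially, which is what gives the best performance.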
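To illustrate the loop-order point, here is a minimal sketch of the double loop with the column index innermost. The names `matrix_add`, `ROWS` and `COLS` are made up for this example and are not taken from the question's code.

```c
#include <stddef.h>

#define ROWS 4   /* hypothetical sizes, chosen only for illustration */
#define COLS 3

/* Adds a and b element-wise and stores the result in out.
 * The inner loop runs over the column index j, so consecutive
 * iterations touch adjacent memory addresses. */
void matrix_add(double a[ROWS][COLS],
                double b[ROWS][COLS],
                double out[ROWS][COLS])
{
    for (size_t i = 0; i < ROWS; ++i) {
        for (size_t j = 0; j < COLS; ++j) {
            out[i][j] = a[i][j] + b[i][j];
        }
    }
}
```

Swapping the two `for` lines would give the same result but would jump `COLS` elements between consecutive accesses, which tends to be slower on large matrices.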
You could unroll the loops but this won't make very much difference to performance.
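For reference, manual unrolling might look roughly like the sketch below. It assumes the matrices are stored in flat contiguous buffers of `n = rows * cols` elements; the function name `add_unrolled4` and the unroll factor of four are arbitrary choices for illustration. Optimizing compilers will often do this kind of transformation themselves, which is part of why it rarely helps much.

```c
#include <stddef.h>

/* Adds n doubles from a and b into out, four elements per iteration.
 * Assumes a, b and out each point to at least n contiguous elements. */
void add_unrolled4(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;

    /* Main loop: handle blocks of four elements. */
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Tail loop: handle the remaining 0 to 3 elements. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```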
If you want best performance then don't write it yourself and instead use a BLAS that has been optimised for your platform.
You can try using the GPU instead of the CPU for compute-intensive operations. You can use C++ AMP for this.