提高标准矩阵乘法算法的效率?
如何提高标准矩阵乘法算法的效率?
这种方式涉及到的主要操作是:C[i][j]+=A[i][p]*B[p][j]
可以采取哪些措施来提高效率算法?
How can I improve the efficiency of standard matrix multiplication algorithm?
The main operation involved in this approach is: C[i][j]+=A[i][p]*B[p][j]
What can be done to improve the efficiency of the algorithm?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
您可能想看看使用 BLAS(基本线性代数子例程)库,特别是 Intel 提供的 MKL 这里,AMD 有他们的 ACML 这里,还有(开源)Goto BLAS 这里。
(密集)矩阵-矩阵乘法内核将是
?GEMM
调用,其中?
指示浮点类型。例如,DGEMM
将调用double
例程。除非您非常有信心知道自己在进行低级优化,否则这些库可能会比您手动编码的库提供更好的性能。
如果您确实想尝试自己编码,那么您可能需要考虑以下事项:
SSE、SSE2..4
指令得到广泛支持,一些较新的CPU
还将支持AVX
指令。此参考资料可能会让您了解当前的状况:
希望这有帮助。
You might want to have a look at using a BLAS (Basic Linear Algebra Subroutine) library, specifically Intel offer their MKL here, AMD have their ACML here and there's also the (opensource) Goto BLAS here.
The (dense) matrix-matrix multiply kernel will be a
?GEMM
call, where the?
indicates the floating point type. For exampleDGEMM
will call thedouble
routine.Unless you're extremely confident you know what you're doing with low-level optimisations, these libraries will probably offer better performance than something you can code by hand.
If you do want to have a go at coding this yourself then you may want to consider the following:
SSE, SSE2..4
instructions are widely supported, some newerCPU
's will also supportAVX
instructions.This reference might give you an idea of the current state of things:
Hope this helps.
请注意,使用这些方法并不能保证更好的性能。需要进行大量调整才能显着加快速度。人们投入了大量资金来研究如何快速进行矩阵乘法,因此关于该主题的期刊文章并不缺乏。
Note that using these methods does not guarantee better performance. There is a lot of tuning required give a significant speedup. There is a lot of money going into figuring out how to multiply matrices quickly so there is no shortage of journal articles on the subject.
我建议阅读Golub 和 Van Loan< 的第一章/a>,它解决了这个确切的问题。
I would suggest reading Chapter 1 of Golub and Van Loan, which addresses this exact question.
如果问题涉及多个矩阵乘法 - M1 x M2 x ... x Mn - 那么还有另一种基于动态规划的优化技术,这是另一种球类游戏。请注意,这不适用于提高两个矩阵相乘的效率;但是,如果您以成对方式将三个或更多矩阵相乘,那么您可以在更高级别上进行优化。只是想我会把这个答案放在堆上以完善信息。
If the question concerns multiple matrix multiplications - M1 x M2 x ... x Mn - then there is another optimization technique based on dynamic programming, which is sort of another ball-game. Note that this doesn't apply to improving the efficiency of multiplying two matrices; however, if you are multiplying three or more matrices in a pairwise fashion, then you can optimize at an even higher level. Just thought I'd throw this answer on the heap to round out the information.
嗯,有Strassen算法,根据矩阵的大小,它比您列出的标准算法。当然还有更快的算法,但它们实现起来并不那么简单。
标准算法是O(N^3),
Strassen 算法为 O(N^2.8),
Coppersmith-Winograd 的复杂度为 O(N^2.3)
Well there's Strassen's Algorithm, which, depending on the size of your matrix, is ever so slightly faster than the standard algorithm that you listed. There are of course even faster algorithms, but they arent so simple to implement.
The standard algorithm is O(N^3),
Strassen's algo is O(N^2.8),
and Coppersmith-Winograd is O(N^2.3)