LAPACK: Are operations on packed-storage matrices faster?
I want to tridiagonalize a real symmetric matrix using Fortran and LAPACK. LAPACK basically provides two routines, one operating on the full matrix, the other on the matrix in packed storage. While the latter surely uses less memory, I was wondering if anything can be said about the speed difference?
1 Answer
It's an empirical question, of course: but in general, nothing comes for free, and less memory/more runtime is a pretty common tradeoff.
In this case, the indexing of the data is more complex in the packed case, so as you traverse the matrix, fetching an element costs a little more. (Complicating the picture is that for symmetric matrices the LAPACK routines also assume a particular packing: only the upper or lower triangle of the matrix is stored and referenced.)
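For concreteness, this is what the upper-triangle packed layout (UPLO = 'U') looks like; a small sketch with an arbitrary symmetric matrix, not taken from the original answer:

    program packed_layout
        implicit none
        integer, parameter :: n = 4
        real :: a(n,n), ap(n*(n+1)/2)
        integer :: i, j

        ! Some symmetric matrix (the values are arbitrary here).
        do j = 1, n
            do i = 1, n
                a(i,j) = real(min(i,j))
            end do
        end do

        ! Upper-triangle packed storage (UPLO = 'U'): column j of the upper
        ! triangle is stored contiguously, so A(i,j) with i <= j lands at
        ! AP(i + (j-1)*j/2).  Reading an element back therefore needs this
        ! index arithmetic instead of a plain two-subscript lookup.
        do j = 1, n
            do i = 1, j
                ap(i + (j-1)*j/2) = a(i,j)
            end do
        end do

        print *, ap
    end program packed_layout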
I was messing around with an eigenproblem earlier today, so I'll use that as a measurement benchmark: trying a simple symmetric test case (the Herndon matrix, from http://people.sc.fsu.edu/~jburkardt/m_src/test_mat/test_mat.html ), and comparing ssyevd with sspevd.
There's about an 18% difference, which I must admit is larger than I expected (also with a slightly larger error for the packed case?). This is with Intel's MKL. Of course, as eriktous points out, the performance difference will in general depend on your matrix and on the problem you're doing; the more random access to the matrix you need, the worse the overhead will be. The code I used is as follows:
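The answer's original listing isn't reproduced above; as a stand-in, here is a minimal sketch of such a comparison. It assumes a generic symmetric placeholder matrix rather than the Herndon test matrix, and uses LAPACK's workspace-query convention (LWORK = -1) to size the work arrays:

    ! Sketch: time a full-storage eigensolve (ssyevd) against the
    ! packed-storage equivalent (sspevd) on the same symmetric matrix.
    program packed_vs_full
        implicit none
        integer, parameter :: n = 2000
        real, allocatable :: a(:,:), ap(:), z(:,:), w(:), work(:)
        integer, allocatable :: iwork(:)
        integer :: i, j, k, info, lwork, liwork
        integer :: qiwork(1)
        real :: qwork(1)
        real :: t0, t1

        allocate(a(n,n), ap(n*(n+1)/2), z(n,n), w(n))

        ! Fill a simple symmetric test matrix (placeholder, not the Herndon matrix).
        do j = 1, n
            do i = 1, n
                a(i,j) = 1.0 / real(i + j - 1)
            end do
        end do

        ! Copy the upper triangle into packed storage before ssyevd overwrites a.
        k = 0
        do j = 1, n
            do i = 1, j
                k = k + 1
                ap(k) = a(i,j)
            end do
        end do

        ! --- Full storage: ssyevd ---
        call ssyevd('V', 'U', n, a, n, w, qwork, -1, qiwork, -1, info)  ! workspace query
        lwork = int(qwork(1)); liwork = qiwork(1)
        allocate(work(lwork), iwork(liwork))
        call cpu_time(t0)
        call ssyevd('V', 'U', n, a, n, w, work, lwork, iwork, liwork, info)
        call cpu_time(t1)
        print *, 'ssyevd: time =', t1 - t0, ' info =', info
        deallocate(work, iwork)

        ! --- Packed storage: sspevd ---
        call sspevd('V', 'U', n, ap, w, z, n, qwork, -1, qiwork, -1, info)  ! workspace query
        lwork = int(qwork(1)); liwork = qiwork(1)
        allocate(work(lwork), iwork(liwork))
        call cpu_time(t0)
        call sspevd('V', 'U', n, ap, w, z, n, work, lwork, iwork, liwork, info)
        call cpu_time(t1)
        print *, 'sspevd: time =', t1 - t0, ' info =', info
    end program packed_vs_full

Linking against MKL (or any LAPACK) and comparing the two timings should show the kind of full-versus-packed gap described above; the exact figure will depend on the matrix size and the library.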