Intel VML is slow at addition
I wrote this small subroutine that compares simple vector mathematical functions, performed either with a loop:
f(i) = a(i) + b(i)
or direct:
f = a + b
or using Intel MKL VML:
vdAdd(n,a,b,f)
The timing results for n=50000000 are:
VML 0.9 sec
direct 0.4 sec
loop 0.4 sec
I don't understand why VML takes twice as long as the other methods!
(The loop is sometimes faster than the direct form.)
The subroutine can be found at http://paste.ideaslabs.com/show/L6dVLdAOIf
and is called via:
program test
use vmltests
implicit none
call vmlTest()
end program
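For readers who cannot reach the paste link, here is a minimal sketch of how such a comparison might be timed. It is not the original vmlTest subroutine: the module name vml_timing_sketch, the routine time_adds, and its arguments are hypothetical, and only the loop and whole-array variants are timed so that it builds without MKL. With MKL linked, a call to vdAdd(n,a,b,f) could be timed in the same way.

```fortran
module vml_timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical helper: times f(i) = a(i) + b(i) in a loop and f = a + b
  ! as a whole-array expression, and reports the largest difference
  ! between the two results (expected to be zero).
  subroutine time_adds(n, t_loop, t_direct, max_diff)
    integer, intent(in) :: n
    real(dp), intent(out) :: t_loop, t_direct, max_diff
    real(dp), allocatable :: a(:), b(:), f(:), g(:)
    integer(int64) :: c0, c1, rate
    integer :: i

    allocate(a(n), b(n), f(n), g(n))
    call random_number(a)
    call random_number(b)
    call system_clock(count_rate=rate)

    ! Variant 1: explicit loop
    call system_clock(c0)
    do i = 1, n
       f(i) = a(i) + b(i)
    end do
    call system_clock(c1)
    t_loop = real(c1 - c0, dp) / real(rate, dp)

    ! Variant 2: whole-array ("direct") expression
    call system_clock(c0)
    g = a + b
    call system_clock(c1)
    t_direct = real(c1 - c0, dp) / real(rate, dp)

    ! Using both results keeps the compiler from discarding the work
    max_diff = maxval(abs(f - g))
  end subroutine time_adds
end module vml_timing_sketch
```

A driver would call time_adds(50000000, t_loop, t_direct, max_diff) and print the two times. Timings from a single run are noisy, so repeating each variant a few times is advisable before drawing conclusions.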
Comments (1)
Your sample code has a potential L2 cache issue, which can be overcome with a blocking optimization. See the Intel® Software Networks Forum answer for details: http://software.intel.com/en-us/forums/showthread.php?t=80041
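The linked forum thread is not reproduced here, so the following is only a sketch of what a blocking optimization could look like, assuming it means processing the vectors in chunks small enough to stay in the L2 cache. The names blocked_add_sketch, blocked_add, add_chunk and the blk argument are hypothetical. add_chunk is a plain stand-in so the sketch builds without MKL; with MKL linked, that call would be replaced by vdAdd on the same chunk.

```fortran
module blocked_add_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Stand-in for MKL's vdAdd so that the sketch builds without MKL.
  subroutine add_chunk(m, x, y, z)
    integer, intent(in) :: m
    real(dp), intent(in) :: x(m), y(m)
    real(dp), intent(out) :: z(m)
    z = x + y
  end subroutine add_chunk

  ! Adds a and b into f in chunks of at most blk elements (blk must be > 0).
  subroutine blocked_add(n, a, b, f, blk)
    integer, intent(in) :: n, blk
    real(dp), intent(in) :: a(n), b(n)
    real(dp), intent(out) :: f(n)
    integer :: i, m

    do i = 1, n, blk
       m = min(blk, n - i + 1)          ! last chunk may be shorter
       ! With MKL: call vdAdd(m, a(i:i+m-1), b(i:i+m-1), f(i:i+m-1))
       call add_chunk(m, a(i:i+m-1), b(i:i+m-1), f(i:i+m-1))
    end do
  end subroutine blocked_add
end module blocked_add_sketch
```

A suitable blk depends on the machine's cache size and has to be found by measurement; for example, blk = 4096 keeps the three double-precision chunks at roughly 96 KB in total. Whether this removes the slowdown reported in the question would need to be checked by timing it.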