Performance of std::pow - cache misses?
I've been trying to optimize a numeric program of mine and have run into something of a mystery. I'm looping over code that performs thousands of floating-point operations, only one of which is a call to pow. Nevertheless, that call takes 5% of the time. That's not necessarily a critical issue, but it is odd, so I'd like to understand what's happening.
When I profiled for cache misses, VS.NET 2010RC's profiler reported that virtually all cache misses occur in std::pow. So what's up with that? Is there a faster alternative? I tried powf, but that's only slightly faster; it's still responsible for an abnormal number of cache misses.
Why would a basic function like pow cause cache-misses?
Edit: this is not managed code. /Oi intrinsics are enabled, but the compiler may at its option ignore that. Replacing pow(x,y) with exp(y*log(x)) gives similar performance; the only difference is that all the cache misses now show up in the log function.
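For reference, the exp/log rewrite mentioned in the edit can be sketched as below. The helper name pow_via_exp_log is hypothetical, and the identity only holds for x > 0; std::pow additionally handles zero and negative bases with integer exponents, so this is not a drop-in replacement.

```cpp
#include <cmath>

// Hypothetical helper (not from the original post): computes x^y through
// the identity x^y = exp(y * log(x)).
// Assumption: x > 0. For x <= 0, std::log returns -inf or NaN, whereas
// std::pow has defined results for many of those inputs.
inline double pow_via_exp_log(double x, double y) {
    return std::exp(y * std::log(x));
}
```

Results typically differ from std::pow in the last few bits, because the rounding error of log is amplified by the multiplication and the exp.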
Comments (4)
Yeah, it's slow. As to why in detail, someone else who feels more confident can try to explain.
Want to speed it up? See here: http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/
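A rough sketch of the kind of bit-trick approximation that link describes is shown below. The function name fast_pow_approx is hypothetical, and the two magic constants are written as I recall them from the linked post, so treat them as an assumption and check them against the article. This version uses std::memcpy rather than a union to avoid undefined behaviour in C++.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name. Approximates a^b by treating the upper 32 bits of the
// IEEE-754 double as a crude fixed-point logarithm of a, scaling it by b,
// and writing it back as the upper 32 bits of the result.
// Assumptions: a > 0, the result stays within normal double range, and the
// constants below match the linked article (recalled, not verified).
// Accuracy is poor: errors of 5-20% are to be expected.
inline double fast_pow_approx(double a, double b) {
    std::uint64_t bits;
    std::memcpy(&bits, &a, sizeof bits);               // read the raw bit pattern
    std::int32_t hi = static_cast<std::int32_t>(bits >> 32);
    hi = static_cast<std::int32_t>(b * (hi - 1072632447) + 1072693248);
    bits = static_cast<std::uint64_t>(static_cast<std::uint32_t>(hi)) << 32;
    double result;                                     // low 32 bits are left at zero
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Whether the loss of precision is acceptable depends entirely on the program; for a numeric code where pow accounts for only 5% of the time, it may well not be worth it.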
Can you give more information on x, as well as the environment where pow is evaluated?
What you are seeing might be the hardware prefetchers at work. Also, depending on the profiler, the 'cost' may be attributed to the wrong assembly instructions; that kind of misattribution should be even more frequent on long-latency instructions like the ones needed to evaluate pow.
In addition, I would use a real profiler like VTune/PTU rather than the one available in any Visual Studio version.
If you replace std::pow(var) with another function, like std::max(var, var), does it still take up 5%? Do you still get all the cache misses? I'm guessing no on time and yes on cache misses. Calculating powers is slower than many other operations (which are you using?). Calling out to code that's not in the cache will cause a cache miss no matter which function it is.
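One way to run that experiment is a small timing harness that applies an arbitrary function to the same inputs, so std::pow and a cheap stand-in such as std::max can be compared under identical conditions. Everything below (LoopResult, make_inputs, time_loop, the input range) is a hypothetical sketch, not the asker's code, and wall-clock timing says nothing about cache misses; those still need a profiler.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: the checksum keeps the compiler from
// optimizing the loop away, seconds is the wall-clock time of the loop.
struct LoopResult {
    double checksum;
    double seconds;
};

// Hypothetical input generator: n values cycling through [1.0, 2.0).
inline std::vector<double> make_inputs(std::size_t n) {
    std::vector<double> xs(n);
    for (std::size_t i = 0; i < n; ++i) {
        xs[i] = 1.0 + static_cast<double>(i % 1000) * 0.001;
    }
    return xs;
}

// Applies f to every input, summing the results and timing the loop.
template <class F>
LoopResult time_loop(const std::vector<double>& xs, F f) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (double x : xs) {
        sum += f(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration<double>(stop - start).count()};
}

// Example use (inside some function):
//   auto xs = make_inputs(10000000);
//   auto with_pow = time_loop(xs, [](double x) { return std::pow(x, 1.5); });
//   auto with_max = time_loop(xs, [](double x) { return std::max(x, 1.5); });
//   // compare with_pow.seconds against with_max.seconds
```

If the time share drops sharply with std::max but the profiler still attributes the cache misses to the call site, that would support the guess above that the misses are not specific to pow.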
If your code involves some heavy number-crunching, I wouldn't be too surprised that std::pow is consuming 5% of the running time. Many numeric operations are very fast, so a slightly slower operation like std::pow will appear to take more time relative to the other already-fast operations. (That would also account for why you didn't see much improvement switching to std::powf.)
The cache misses are somewhat more puzzling, and it's hard to offer an explanation without more data. One possibility is that if your other code is so memory-intensive that it gobbles up all the allocated cache, then it wouldn't be completely surprising that std::pow is taking all the punches on the cache misses.