Cache efficiency and blobs - profiling cache hits/misses
For a program to be cache efficient, the data it uses should be stored linearly, right?
So instead of dynamic allocation, I put my data in a blob using a linear allocator. Is this enough to improve performance? What should I do to improve cache efficiency even more?
I know these questions aren't specific, but I don't know how else to explain it...
Which programs can help me profile cache hits/misses?
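For readers unfamiliar with the term, a linear (bump) allocator of the kind the question describes can be sketched roughly as follows. This is only an illustration under assumptions: the class name LinearAllocator and its members are hypothetical and are not the asker's actual code. It shows how consecutive allocations end up adjacent in one contiguous blob.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical bump allocator: hands out consecutive slices of one blob.
class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // Returns `size` bytes aligned to `align` (must be a power of two),
    // or nullptr when the blob does not have enough room left.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(buffer_.get());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        // Round the current position up to the next multiple of `align`.
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        offset_ = start + size;
        return buffer_.get() + start;
    }

    // Constructs a T inside the blob. Restricted to trivially destructible
    // types because this sketch never runs destructors.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "this sketch never calls destructors");
        void* p = allocate(sizeof(T), alignof(T));
        return p ? new (p) T(std::forward<Args>(args)...) : nullptr;
    }

    // Frees everything at once by rewinding to the start of the blob.
    void reset() { offset_ = 0; }
    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

Objects created back to back sit next to each other in memory, which helps only if the program also accesses them in roughly that order. Contiguous storage alone does not guarantee fewer cache misses.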
Comments (3)
If you're looking for a profiler for Windows, you can try AMD's CodeAnalyst or VerySleepy; both are free. AMD's is the more powerful of the two (it also works on Intel hardware, but IIRC you can't use the hardware-based profiling features there), and it includes monitoring of things like branch mispredictions and cache utilization. Profiling is great because it tells you what to optimize, but it doesn't always tell you how. For that, have a look at Agner Fog's optimization manuals together with Intel's optimization manual (which contains a lot on locality and cacheability optimizations).
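As one illustration of a locality optimization, here is a minimal sketch contrasting an array-of-structures layout with a structure-of-arrays layout. The particle types and the total_mass functions are hypothetical names made up for this example, and whether the second layout is actually faster depends on the access pattern, so measure with a profiler before committing to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: all fields of one particle are interleaved in memory.
struct ParticleAoS {
    float x, y, z;
    float mass;
    int id;
};

// Structure of arrays: each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
    std::vector<int> id;

    void add(float px, float py, float pz, float m, int i) {
        x.push_back(px);
        y.push_back(py);
        z.push_back(pz);
        mass.push_back(m);
        id.push_back(i);
    }
};

// Reads only `mass`, yet every cache line fetched also carries the
// unused x, y, z and id fields of the neighbouring particles.
float total_mass(const std::vector<ParticleAoS>& particles) {
    float sum = 0.0f;
    for (const ParticleAoS& p : particles) sum += p.mass;
    return sum;
}

// Walks one dense float array, so every byte brought into cache is used.
float total_mass(const ParticlesSoA& particles) {
    assert(particles.mass.size() == particles.id.size());
    float sum = 0.0f;
    for (float m : particles.mass) sum += m;
    return sum;
}
```

The structure-of-arrays form tends to help loops that touch one or two fields of many elements. Code that needs every field of a single element at once may be better served by the array-of-structures form.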
If you're on Linux, you could use Valgrind (specifically its cachegrind tool).
If you're on Windows, VS2010 (2008) Professional edition has a built-in profiler, but I don't know any details about its cache profiling facilities. There is also the Intel VTune Analyzer (Amplifier). Both are commercial products, although I think you can get 30-day evaluation copies.
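To have something concrete to try under a cache profiler, a small sketch like the one below may help. It sums the same contiguous matrix twice, once in storage order and once with a stride. The function names are hypothetical and the code is an illustration only, not part of any tool mentioned here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored row-major in one contiguous vector,
// visiting elements in the order they are laid out in memory.
long long sum_row_order(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Same result, but consecutive accesses are `cols` elements apart,
// so for large matrices most accesses land on a different cache line.
long long sum_column_order(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

Call both functions from a small main on a matrix much larger than the cache, then run the binary with valgrind --tool=cachegrind (recent Valgrind releases may also need --cache-sim=yes for cache simulation). The per-function data-cache miss counts can then be inspected with cg_annotate on the generated cachegrind.out file.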
Some other questions on SO that might be of help:
On Linux, you can use perf mem to sample memory accesses, including cache misses, at a very fine granularity (down to the address of each miss), as described here.