How do I measure memory bandwidth utilization on Windows?
I have a highly threaded program but I believe it is not able to scale well across multiple cores because it is already saturating all the memory bandwidth.
Is there any tool that can measure how much of the memory bandwidth is being used?
Edit: Please note that typical profilers show things like memory leaks and memory allocation, which I am not interested in.
I am only interested in whether the memory bandwidth is being saturated or not.
Comments (5)
If you have a recent Intel processor, you might try Intel(R) Performance Counter Monitor (http://software.intel.com/en-us/articles/intel-performance-counter-monitor/). It can directly measure consumed memory bandwidth from the memory controllers.
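A consumed-bandwidth figure only tells you about saturation when you have something to compare it against. As a rough, hedged sketch (not part of the tool above), the following Python snippet estimates how fast a single thread can copy a large buffer. The function name `copy_bandwidth_gbs` and its parameters are hypothetical, and the result is only a lower bound on what the hardware can sustain, since one thread usually cannot saturate every memory channel.

```python
import time


def copy_bandwidth_gbs(size_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate single-threaded copy bandwidth in GB/s (hypothetical helper).

    The buffer should be much larger than the last-level cache so that
    the copy is served from RAM rather than from cache.
    """
    # Fill the source with non-zero data so its pages are really backed by RAM.
    src = bytearray(b"\x01") * size_bytes
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read of src plus one full write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    # Count both the bytes read and the bytes written.
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

If the bandwidth your program consumes (as reported by a hardware-counter tool) is close to or above this figure per thread, that is a hint, not proof, that memory traffic is the bottleneck.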
I'd recommend the Visual Studio Sample Profiler, which can collect sample events on specific hardware counters. For example, you can choose to sample on cache misses. Here's an article explaining how to choose the CPU counter, though there are other counters you can play with as well.
It would be hard to find a tool that measures memory bandwidth utilization for your application.
But since the issue you face is a suspected memory bandwidth problem, you could try to measure whether your application is generating a lot of page faults per second, which would definitely mean that you are nowhere near the theoretical memory bandwidth.
You should also measure how cache-friendly your algorithms are. If they are thrashing the cache, your memory bandwidth utilization will be severely hampered. Search for "measuring cache misses" to find good sources that explain how to do this.
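For reference, the "theoretical memory bandwidth" mentioned above can be estimated from the memory configuration. The sketch below assumes standard DDR channels that are 64 bits wide. The function names and the DDR4-3200 dual-channel example are illustrative assumptions, not values taken from the question.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, channels, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s.

    mega_transfers_per_sec: the DDR rating, e.g. 3200 for DDR4-3200.
    channels: number of populated memory channels.
    bus_width_bits: data width of one channel (64 for standard DDR DIMMs).
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


def utilisation(measured_gbs, peak_gbs):
    """Fraction of the theoretical peak that is actually being used."""
    return measured_gbs / peak_gbs


if __name__ == "__main__":
    peak = peak_bandwidth_gbs(3200, channels=2)  # assumed example configuration
    print(f"peak: {peak:.1f} GB/s")
    print(f"utilisation at 30 GB/s measured: {utilisation(30.0, peak):.0%}")
```

Real systems rarely sustain the full theoretical figure, so a measured value well below the peak can still mean the bus is effectively saturated.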
It isn't possible to properly measure memory bus utilisation with any kind of software-only solution. (It used to be, back in the 1980s or so. But then we got pipelining, caches, out-of-order execution, multiple cores, non-uniform memory architectures with multiple busses, etc.)
You absolutely have to have hardware monitoring the memory bus, to determine how 'busy' it is.
Fortunately, most PC platforms do have some, so you just need the drivers and other software to talk to it:
wenjianhn comments that there is a project specifically for Intel hardware (which they call the Processor Counter Monitor) at https://github.com/opcm/pcm
For other architectures on Windows, I am not sure. But there is a project (for Linux) which has a grab-bag of support for different architectures at https://github.com/RRZE-HPC/likwid
In principle, a computer engineer could attach a suitable oscilloscope to almost any PC and do the monitoring 'directly', although this is likely to require both a suitably trained computer engineer and quite high-performance test instruments (read: both very costly).
If you try this yourself, know that you'll likely need instruments or at least analysis which is aware of the protocol of the bus you're intending to monitor for utilisation.
This can sometimes be really easy with some busses, e.g. old parallel FIFO hardware, which usually has a separate wire for 'fifo full' and another for 'fifo empty'.
Such chips are usually used between a faster bus and a slower one, on a one-way link. The 'fifo full' signal, even if it normally triggers occasionally, can be monitored for excessively 'long' levels: on a USB 2.0 Hi-Speed link, for example, this happens when the OS isn't polling the USB FIFO hardware on time. Measuring the frequency and duration of these 'holdups' then lets you measure bus utilisation, but only for this USB 2.0 bus.
For a PC memory bus, I guess you could also try just monitoring how much power your RAM interface is using, which may scale with use. This might be quite difficult to do, but you may 'get lucky'. You want the current of the supply which feeds VccIO for the bus. This should actually work much better for newer PC hardware than for those ancient 1980s systems (which always just ran at full power when on).
A fairly ordinary oscilloscope is enough for either of those examples - you just need one that can trigger only on 'pulses longer than a given width', and leave it running until it does, which is a good way to do 'soak testing' over long periods.
You monitor utilisation either way by looking for the change in 'idle' time.
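As a minimal sketch of that idea, suppose the instrument has logged each 'busy' pulse (for example, each 'fifo full' holdup) as a start time and a duration. Utilisation over an observation window is then the busy time divided by the window length. The function name and the data layout are assumptions for illustration, and the pulses are assumed not to overlap.

```python
def busy_fraction(pulses, window_start, window_end):
    """Fraction of the window during which the 'busy' signal was asserted.

    pulses: iterable of (start, duration) pairs in seconds, assumed
            non-overlapping (hypothetical capture format).
    """
    total = window_end - window_start
    if total <= 0:
        raise ValueError("window_end must be later than window_start")
    busy = 0.0
    for start, duration in pulses:
        # Clip each pulse to the observation window before adding it up.
        lo = max(start, window_start)
        hi = min(start + duration, window_end)
        if hi > lo:
            busy += hi - lo
    return busy / total


if __name__ == "__main__":
    captured = [(0.0, 0.25), (0.5, 0.25)]  # made-up example capture
    print(busy_fraction(captured, 0.0, 1.0))
```

Comparing this fraction between a lightly loaded run and a heavily loaded run shows the change in idle time described above.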
But modern PC memory busses are quite a bit more complex, and also much faster.
To do it directly by tapping the bus, you'll need at least an oscilloscope (and active probes) designed explicitly for monitoring the generation of DDR bus your PC has, along with the software analysis option (usually sold separately) to decode the protocol well enough to work out what kind of activity is occurring on it, from which you can decide what kind of activity you want to count as 'idle'.
You may even need a motherboard designed to allow you to make those measurements.
This isn't as straightforward as just looking for periods of no activity - all DRAM needs regular refresh cycles at the very least, which may or may not happen along with obvious bus activity (some DRAMs do it automatically, some need a specific command to trigger it, some can continue to address and transfer data from banks not in refresh, some can't, etc).
So the instrument needs to be able to analyse the data deeply enough for you to extract how busy the bus is.
Your best and simplest bet is to find a PC hardware (CPU) vendor who has tools which do what you want, and buy that hardware so you can use those tools.
This might even involve running your application in a VM, so you can benefit from better tools in a different OS hosting it.
To this end, you'll likely want to try Linux KVM (yes, even for Windows - there are Windows guest drivers for it), and also pin your VM to specific CPUs, while configuring Linux to avoid putting other jobs on those same CPUs.
HWiNFO (https://www.hwinfo.com/) supports this for some modern CPUs that report those metrics.