Measuring the FLOPs of an application with the Linux perf tool
I want to use 'perf', the new command-line interface to the Linux performance counter subsystem, to measure the number of floating-point and arithmetic operations executed by an application. (For testing purposes I use a simple dummy app I created; see below.)
Because I could not find any 'perf' events defined for measuring FP and integer operations, I started digging into the raw hardware event codes (used with -e rNNN, where NNN is the hexadecimal value of the event code). My real problem is that the codes I found for retired instructions (INST_RETIRED) do not distinguish between FP and other instructions (X87 and MMX/SSE). When I tried to add the appropriate umask to that event code, I found that 'perf' somehow does not understand or support including a umask. I tried:
% perf stat -e rC0 ./a.out
which gives me the instructions retired, but
% perf stat -e rC002 ./a.out
which should give me the number of X87 instructions executed, perf reports that I supplied wrong parameters. Maybe so, but what is the correct way to use the umasks of raw hardware events with 'perf'? More generally, how can I get the exact number of floating-point and integer operations a program executes using the perf tool?
Many thanks,
Konstantin Boyanov
Here is my test app:
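```c
/* Dummy FP workload. res1/res2 are volatile so that an optimizing
 * compiler cannot delete the otherwise unused divisions. */
int main(void)
{
    float numbers[1000];
    volatile float res1;
    double doubles[1000];
    volatile double res2;
    int i, j = 3, k = 42;

    for (i = 0; i < 1000; i++) {
        numbers[i] = (i + k) * j;      /* int multiply, converted to float */
        doubles[i] = (i + j) * k;      /* int multiply, converted to double */
        res1 = numbers[i] / (float)k;  /* single-precision division */
        res2 = doubles[i] / (double)j; /* double-precision division */
    }
    return 0;
}
```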
The event to use depends on the processor. You can use libpfm4 (http://perfmon2.git.sourceforge.net/git/gitweb-index.cgi) to find out which events are available (with the showevinfo program), and then use check_events from the same distribution to get the raw code for a given event. My Sandy Bridge CPU supports the FP_COMP_OPS_EXE event, which I have found empirically to correspond closely to the FLOP count.
I'm not sure about perf, but oprofile has floating point events for many processors. There may be some overlap, as INST_RETIRED is a valid oprofile event too.