Finding the number of context switches
To measure the number of context switches in a multithreaded application, I tried two methods: 1) perf sched and 2) the information in /proc/pid/status. The difference between the two is quite large, though. The steps I followed are:
1- Using the perf command, the number of switches is 7848 (the run shown below reports 7,601).
$ sudo perf stat -e sched:sched_switch,task-clock,context-switches,cpu-migrations,page-faults,cycles,instructions ./mm_double_omp 4
Using 4 threads
PID = 395944
Performance counter stats for './mm_double_omp 4':
7,601 sched:sched_switch # 0.044 K/sec
173,377.19 msec task-clock # 3.973 CPUs utilized
7,601 context-switches # 0.044 K/sec
2 cpu-migrations # 0.000 K/sec
24,780 page-faults # 0.143 K/sec
164,393,781,352 cycles # 0.948 GHz
69,723,515,498 instructions # 0.42 insn per cycle
43.636463582 seconds time elapsed
173.244505000 seconds user
0.123880000 seconds sys
Please note that the sched:sched_switch and context-switches counts are the same. If I use only sched:sched_switch, the number is still on the order of 7000.
2- I modified the code to copy the /proc/pid/status file twice: at the beginning and at the end of the program.
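#include <stdio.h>   /* printf, snprintf */
#include <stdlib.h>  /* system */
#include <unistd.h>  /* getpid */

int main() {
    char cmdbuf[256];
    int pid_num = getpid();
    printf("PID = %d\n", pid_num);
    /* Snapshot the status file before the workload runs. */
    snprintf(cmdbuf, sizeof(cmdbuf), "sudo cp /proc/%d/status %s", pid_num, "start.txt");
    system(cmdbuf);
    // DO
    /* Snapshot the status file again after the workload. */
    snprintf(cmdbuf, sizeof(cmdbuf), "sudo cp /proc/%d/status %s", pid_num, "finish.txt");
    system(cmdbuf);
    return 0;
}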
After the execution I see:
$ tail -n2 start.txt
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 0
$ tail -n2 finish.txt
voluntary_ctxt_switches: 5
nonvoluntary_ctxt_switches: 573
So, there are fewer than 600 context switches, which is far fewer than the perf result. My questions are:
Does the perf code affect the measurement? If so, its overhead is large.
Is the meaning of a context switch the same in both methods?
Which one is more reliable?
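On the last two questions, one likely contributor to the gap (offered as an assumption, not a confirmed diagnosis of this run) is scope: /proc/<pid>/status reports the counters of the main thread only, while perf stat by default also follows the other threads and any child processes, such as the sudo cp commands started through system(). A minimal sketch in C that sums the per-thread counters from /proc/<pid>/task/<tid>/status is below. The helper name sum_ctxt_switches and its string pid argument (which allows "self") are hypothetical, chosen only for illustration.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: sum the context-switch counters of every thread
 * of a process by reading /proc/<pid>/task/<tid>/status.
 * `pid` is a string such as "395944" or "self".
 * Returns the number of threads read, or -1 if the task directory
 * cannot be opened. */
int sum_ctxt_switches(const char *pid, unsigned long *vol, unsigned long *nonvol)
{
    char task_dir[64];
    snprintf(task_dir, sizeof(task_dir), "/proc/%s/task", pid);

    DIR *dir = opendir(task_dir);
    if (dir == NULL)
        return -1;

    int threads = 0;
    *vol = 0;
    *nonvol = 0;

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;                       /* skip "." and ".." */

        char status_path[384];
        snprintf(status_path, sizeof(status_path), "%s/%s/status",
                 task_dir, entry->d_name);

        FILE *fp = fopen(status_path, "r");
        if (fp == NULL)
            continue;                       /* thread may have exited meanwhile */

        char line[256];
        unsigned long value;
        while (fgets(line, sizeof(line), fp) != NULL) {
            /* Each counter sits on its own line, e.g.
             * "voluntary_ctxt_switches:\t5" */
            if (sscanf(line, "voluntary_ctxt_switches: %lu", &value) == 1)
                *vol += value;
            else if (sscanf(line, "nonvoluntary_ctxt_switches: %lu", &value) == 1)
                *nonvol += value;
        }
        fclose(fp);
        threads++;
    }

    closedir(dir);
    return threads;
}
```

Calling sum_ctxt_switches("self", &v, &nv) just before the program exits, while the worker threads still exist, gives a total whose scope is closer to what perf stat counts. Threads that have already exited are no longer listed under /proc/<pid>/task, so their switches are missed, and switches in child processes are not included either.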