Linux Performance Monitoring in Detail: CPU

Published 2022-09-09 11:01:08

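1. The run queue of each CPU should not exceed 3; for example, on a dual-core CPU the total should not exceed 6.
2. When the CPU is running at full load, utilization should be distributed roughly as follows:
a) User Time: 65% to 70%
b) System Time: 30% to 35%
c) Idle: 0% to 5%
3. Context switches should be judged together with CPU utilization: if utilization matches the distribution above, a large number of context switches is acceptable.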
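
The rules of thumb above can be turned into a simple check. The following is a minimal sketch, not a definitive tool: the function name check_cpu_rules and its parameters are hypothetical, and the thresholds are simply the numbers quoted in the list.

```python
def check_cpu_rules(run_queue, num_cpus, system_pct, idle_pct):
    """Return a list of warnings for the rules of thumb listed above.

    All names here are illustrative; the thresholds (3 tasks per CPU,
    35% system time, 5% idle) are the ones quoted in the article.
    """
    warnings = []

    # Rule 1: no more than 3 runnable tasks per CPU.
    limit = 3 * num_cpus
    if run_queue > limit:
        warnings.append("run queue %d exceeds limit %d" % (run_queue, limit))

    # Rule 2 only applies when the CPU is close to fully loaded (idle <= 5%).
    if idle_pct <= 5 and system_pct > 35:
        warnings.append("system time %d%% exceeds 35%% under full load" % system_pct)

    return warnings


# Dual-core machine at full load with 4 runnable tasks and 30% system time.
print(check_cpu_rules(run_queue=4, num_cpus=2, system_pct=30, idle_pct=2))
```

Rule 3 is deliberately left out: the article gives no numeric threshold for context switches, so they are only worth investigating when one of the checks above fails.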

Commonly used monitoring tools are vmstat, top, dstat, and mpstat.
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 104300 16800 95328 72200 0 0 5 26 7 14 4 1 95 0
0 0 104300 16800 95328 72200 0 0 0 24 1021 64 1 1 98 0
0 0 104300 16800 95328 72200 0 0 0 0 1009 59 1 1 98 0

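r is the size of the run queue (the number of runnable processes),
b is the number of processes blocked waiting for I/O,
in is the number of interrupts per second,
cs is the number of context switches per second,
us is the percentage of CPU time spent in user mode,
sy is the percentage of CPU time spent in system (kernel) mode,
wa is the percentage of time the CPU was idle while waiting for I/O,
id is the percentage of time the CPU was idle.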
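
To work with these fields programmatically, a vmstat data line can be split into a dictionary keyed by column name. This is a minimal sketch that assumes the 16-column layout shown above; VMSTAT_FIELDS and parse_vmstat_line are hypothetical names. Newer vmstat versions append extra columns such as st, which this sketch simply ignores.

```python
# Column names in the order of the vmstat header shown above.
VMSTAT_FIELDS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                 "bi", "bo", "in", "cs", "us", "sy", "id", "wa"]


def parse_vmstat_line(line):
    """Map one vmstat data line to a dict of column name -> integer value."""
    values = [int(v) for v in line.split()]
    if len(values) < len(VMSTAT_FIELDS):
        raise ValueError("expected at least %d columns" % len(VMSTAT_FIELDS))
    # zip() stops at the shorter sequence, so trailing columns are dropped.
    return dict(zip(VMSTAT_FIELDS, values))


sample = parse_vmstat_line("0 0 104300 16800 95328 72200 0 0 0 24 1021 64 1 1 98 0")
print(sample["r"], sample["in"], sample["cs"], sample["id"])
```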

dstat can show the number of interrupts generated by each device:
# dstat -cip 1
----total-cpu-usage---- ----interrupts--- ---procs---
usr sys idl wai hiq siq| 15 169 185 |run blk new
6 1 91 2 0 0| 12 0 13 | 0 0 0
1 0 99 0 0 0| 0 0 6 | 0 0 0
0 0 100 0 0 0| 18 0 2 | 0 0 0
0 0 100 0 0 0| 0 0 3 | 0 0 0
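Here we can see three interrupt (IRQ) numbers: 15, 169, and 185. The mapping between IRQ numbers and device names can be found in /proc/interrupts; in this case 185 is the network card eth1.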
# cat /proc/interrupts
CPU0
0: 1277238713 IO-APIC-edge timer
6: 5 IO-APIC-edge floppy
7: 0 IO-APIC-edge parport0
8: 1 IO-APIC-edge rtc
9: 1 IO-APIC-level acpi
14: 6011913 IO-APIC-edge ide0
15: 15761438 IO-APIC-edge ide1
169: 26 IO-APIC-level Intel 82801BA-ICH2
185: 16785489 IO-APIC-level eth1
193: 0 IO-APIC-level uhci_hcd:usb1
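
The lookup from IRQ number to device name can also be scripted. The sketch below assumes the layout of the listing above: a header row naming the CPUs, then for each IRQ one count per CPU, one controller token, and the device name. parse_interrupts is a hypothetical helper, and on newer kernels, whose rows carry additional columns, the controller and device fields would need adjusting.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: {count, controller, device}}."""
    lines = text.strip().splitlines()
    num_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:num_cpus]
        # Skip rows that lack a controller and device name, or whose
        # leading fields are not plain counters.
        if len(fields) < num_cpus + 2 or not all(c.isdigit() for c in counts):
            continue
        table[irq.strip()] = {
            "count": sum(int(c) for c in counts),
            "controller": fields[num_cpus],
            "device": " ".join(fields[num_cpus + 1:]),
        }
    return table


sample = """CPU0
15: 15761438 IO-APIC-edge ide1
169: 26 IO-APIC-level Intel 82801BA-ICH2
185: 16785489 IO-APIC-level eth1
"""
print(parse_interrupts(sample)["185"]["device"])
```

On a live system the same function could be fed the contents of /proc/interrupts, for example open("/proc/interrupts").read().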

mpstat shows the state of each CPU individually. For example, on a system with four CPUs we see:
# mpstat -P ALL 1
Linux 2.4.21-20.ELsmp (localhost.localdomain) 05/23/2006
05:17:31 PM CPU %user %nice %system %idle intr/s
05:17:32 PM all 0.00 0.00 3.19 96.53 13.27
05:17:32 PM 0 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 1 1.12 0.00 12.73 86.15 13.27
05:17:32 PM 2 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 3 0.00 0.00 0.00 100.00 0.00
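
Per-CPU output like this makes it easy to spot an uneven load, such as one CPU doing all the work while the others sit idle. The following sketch parses text in the format shown above and reports the CPU with the lowest %idle. parse_mpstat and busiest_cpu are hypothetical names, and the column set is taken from the header row rather than hard-coded, since it differs between mpstat versions.

```python
def parse_mpstat(text):
    """Return one dict per data row, keyed by the header names from CPU onward."""
    rows = []
    header = None
    for line in text.splitlines():
        tokens = line.split()
        if "CPU" in tokens and "%idle" in tokens:
            header = tokens  # header row, e.g. "... CPU %user %nice ..."
            continue
        # Ignore the banner line and anything that does not line up with the header.
        if header is None or len(tokens) != len(header):
            continue
        start = header.index("CPU")
        rows.append(dict(zip(header[start:], tokens[start:])))
    return rows


def busiest_cpu(rows):
    """Return the id of the CPU with the lowest %idle, skipping the 'all' row."""
    per_cpu = [r for r in rows if r["CPU"] != "all"]
    return min(per_cpu, key=lambda r: float(r["%idle"]))["CPU"]


sample = """05:17:31 PM CPU %user %nice %system %idle intr/s
05:17:32 PM all 0.00 0.00 3.19 96.53 13.27
05:17:32 PM 0 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 1 1.12 0.00 12.73 86.15 13.27
05:17:32 PM 2 0.00 0.00 0.00 100.00 0.00
05:17:32 PM 3 0.00 0.00 0.00 100.00 0.00
"""
print(busiest_cpu(parse_mpstat(sample)))
```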

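In summary, CPU performance monitoring on Linux covers the following points:
Check the system's run queue and make sure the run queue of each CPU is no greater than 3. Make sure CPU utilization follows the 70/30 rule (70% user, 30% system). If system time is too high, the cause may be frequent scheduling and priority changes. CPU-bound processes are always penalized (their priority is lowered), while I/O-bound processes are always rewarded (their priority is raised).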
