RT Linux jitter after upgrading the kernel from 3.14 to 5.10
We have a legacy product running a Linux 3.14 kernel with preemption enabled.
One application polls the field devices one by one: it sends one UDP packet to a device's IP address, sleeps for 2 ms, and expects the response UDP packet to have arrived by the time the sleep finishes. With the 3.14 kernel, everything works fine.
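For readers who want to reproduce the pattern, here is a minimal sketch of the polling cycle described above. It is not the actual application: the function names, the 2048-byte receive buffer, and the use of a non-blocking socket are assumptions made for illustration, and Python's `time.sleep` is far less precise than what a real-time C application would use.

```python
import socket
import time


def make_socket():
    """Create a non-blocking UDP socket (assumption: one socket for all devices)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    return sock


def poll_once(sock, addr, payload, wait_s=0.002):
    """Send one request, sleep for the poll interval, then read without blocking.

    Returns the response bytes, or None if nothing had arrived when the sleep ended.
    """
    sock.sendto(payload, addr)
    time.sleep(wait_s)  # the 2 ms sleep from the description
    try:
        data, _ = sock.recvfrom(2048)  # the reply must already be queued
        return data
    except BlockingIOError:
        return None


def poll_devices(sock, addrs, payload, wait_s=0.002):
    """Poll each device in turn and count the ones that did not answer in time."""
    no_response = 0
    for addr in addrs:
        if poll_once(sock, addr, payload, wait_s) is None:
            no_response += 1
    return no_response
```

In this sketch, any wakeup that is late, or any reply that is delivered late by the receive path, shows up directly as a missed response, which is the failure mode the counter in the question records.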
But after we upgraded the kernel to 5.10 with the RT patch, we started to observe jitter, and the no_respnse counter in the application keeps increasing.
In a Wireshark capture taken on the same Linux machine, I see the following (the second column is the time since the previous packet):
44186 0.001031 172.23.0.17 172.23.7.17 UDP 57 37000 → 37000 Len=15
44187 0.002450 172.23.0.17 172.23.7.18 UDP 57 37000 → 37000 Len=15
44188 0.000118 172.23.7.17 172.23.0.17 UDP 313 37000 → 37000 Len=271
44189 0.000926 172.23.7.18 172.23.0.17 UDP 313 37000 → 37000 Len=271
What we want looks like this:
44170 0.002116 172.23.0.17 172.23.1.17 UDP 57 37000 → 37000 Len=15
44171 0.001115 172.23.1.17 172.23.0.17 UDP 313 37000 → 37000 Len=271
44172 0.001042 172.23.0.17 172.23.1.18 UDP 57 37000 → 37000 Len=15
44173 0.001104 172.23.1.18 172.23.0.17 UDP 313 37000 → 37000 Len=271
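So the response from 172.23.7.17 is too late: it only shows up after the request to 172.23.7.18 has already been sent. After some testing, however, I think this delay is caused not by the field devices but by the kernel or something else on the host (Wireshark runs on the same Linux machine, so I suspect the timestamps may not always be accurate).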
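The request-to-response delay can be worked out from the two capture excerpts above by summing the per-packet deltas. The following is a small sketch of that arithmetic; the function name `response_delays` and the constant names are made up for illustration, and the only inputs are the capture lines from the question.

```python
HOST = "172.23.0.17"  # the polling machine, as seen in the capture

BAD = [
    "44186 0.001031 172.23.0.17 172.23.7.17 UDP 57 37000 → 37000 Len=15",
    "44187 0.002450 172.23.0.17 172.23.7.18 UDP 57 37000 → 37000 Len=15",
    "44188 0.000118 172.23.7.17 172.23.0.17 UDP 313 37000 → 37000 Len=271",
    "44189 0.000926 172.23.7.18 172.23.0.17 UDP 313 37000 → 37000 Len=271",
]
GOOD = [
    "44170 0.002116 172.23.0.17 172.23.1.17 UDP 57 37000 → 37000 Len=15",
    "44171 0.001115 172.23.1.17 172.23.0.17 UDP 313 37000 → 37000 Len=271",
    "44172 0.001042 172.23.0.17 172.23.1.18 UDP 57 37000 → 37000 Len=15",
    "44173 0.001104 172.23.1.18 172.23.0.17 UDP 313 37000 → 37000 Len=271",
]


def response_delays(capture_lines):
    """Return {device_ip: seconds between its request and its response}.

    Each line is: frame number, delta since previous packet, source, destination, ...
    """
    now = 0.0      # running time, built from the per-packet deltas
    sent_at = {}   # device IP -> time its request was captured
    delays = {}
    for line in capture_lines:
        fields = line.split()
        delta, src, dst = float(fields[1]), fields[2], fields[3]
        now += delta
        if src == HOST:
            sent_at[dst] = now
        elif src in sent_at:
            delays[src] = now - sent_at.pop(src)
    return delays


if __name__ == "__main__":
    for name, lines in (("bad", BAD), ("good", GOOD)):
        for ip, d in response_delays(lines).items():
            print(f"{name}: {ip} answered after {d * 1e3:.3f} ms")
```

By these timestamps, the reply from 172.23.7.17 is captured about 2.57 ms after its request, outside the 2 ms window, while the replies in the good capture are seen after roughly 1.1 ms.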
The si% (softirq time) shown by top is about 3 times that of the old kernel. In particular, when I stress the CPU with hping3 (the CPU has only one core), si% is 17% on the new kernel versus 6% on the old kernel:
top - 21:51:48 up 41 min, 3 users, load average: 1.52, 0.92, 0.66
Tasks: 126 total, 3 running, 123 sleeping, 0 stopped, 0 zombie
%Cpu(s): 12.2 us, 36.7 sy, 0.0 ni, 33.3 id, 1.1 wa, 0.0 hi, 16.7 si, 0.0 st
MiB Mem : 1910.4 total, 1612.3 free, 132.3 used, 165.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1660.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5794 root 20 0 11692 5516 5236 R 39.7 0.3 0:52.08 hping3
541 root 20 0 127828 126912 81048 S 10.1 6.5 3:51.52 our_app
With cyclictest, the new kernel shows very large jitter:
# ./cyclictest -a 0 --policy fifo -p 50 -N -t 1
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
policy: fifo: loadavg: 1.81 1.95 1.56 2/136 8285
T: 0 ( 6260) P:50 I:1000 C: 981767 Min: 9082 Act: 13414 Avg: 14979 Max: 1850722
On the old kernel, Max is only around 200000 (0.2 ms). iptables is not running.
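Since cyclictest was started with -N, the figures in its output are in nanoseconds rather than the default microseconds. A short sketch that pulls the statistics out of the thread line above and converts them to milliseconds (the helper names are hypothetical):

```python
import re

LINE = "T: 0 ( 6260) P:50 I:1000 C: 981767 Min: 9082 Act: 13414 Avg: 14979 Max: 1850722"


def parse_cyclictest(line):
    """Extract Min/Act/Avg/Max from a cyclictest thread line as integers."""
    return {k: int(v) for k, v in re.findall(r"(Min|Act|Avg|Max):\s*(\d+)", line)}


def ns_to_ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / 1e6


if __name__ == "__main__":
    stats = parse_cyclictest(LINE)
    for key, value in stats.items():
        print(f"{key}: {value} ns = {ns_to_ms(value):.6f} ms")
```

Read this way, the worst-case latency on the new kernel is about 1.85 ms, nearly the whole 2 ms sleep the application relies on, while the average stays around 0.015 ms.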
Here is the output of perf record:
2.46% [kernel] [k] restore_all_switch_stack
1.62% [kernel] [k] check_preemption_disabled
1.27% [kernel] [k] __copy_user_ll
1.17% [kernel] [k] entry_INT80_32
0.97% libapt-pkg.so.6.0.0 [.] pkgCache::FindGrp
0.91% libapt-pkg.so.6.0.0 [.] debListParser::ParseDepends
0.85% ld-2.31.so (deleted) [.] 0x00001090
0.83% libapt-pkg.so.6.0.0 [.] pkgTagSection::Scan
0.78% libapt-pkg.so.6.0.0 [.] 0x0017c32c
0.76% [kernel] [k] __sched_text_start
0.66% [kernel] [k] __local_bh_enable_ip
0.64% libapt-pkg.so.6.0.0 [.] pkgCache::sHash
0.62% [kernel] [k] preempt_count_add
0.59% [kernel] [k] preempt_count_sub
0.56% [kernel] [k] __rcu_read_unlock
0.54% [kernel] [k] rt_spin_unlock
0.51% [kernel] [k] avc_has_perm_noaudit
0.50% libc-2.31.so [.] malloc
0.50% [kernel] [k] siphash_2u64
0.46% ld-2.31.so [.] 0x00001090
0.42% [kernel] [k] raw_sendmsg
0.41% [kernel] [k] syscall_exit_to_user_mode
0.40% [kernel] [k] __local_bh_disable_ip
0.39% [kernel] [k] ip_route_output_key_hash_rcu
0.39% [kernel] [k] fib_table_lookup
0.38% libapt-pkg.so.6.0.0 [.] pkgCache::GrpIterator::FindPkg
0.38% [kernel] [k] kmem_cache_alloc
0.38% [kernel] [k] exit_to_user_mode_prepare
0.38% [kernel] [k] __rcu_read_lock
0.38% [kernel] [k] sched_clock
0.38% [kernel] [k] kallsyms_expand_symbol.constprop.0
0.37% perf_5.10 [.] 0x0016ec37
0.35% [kernel] [k] try_to_wake_up
0.35% [kernel] [k] __switch_to_asm
For a higher level overview, try: perf top --sort comm,dso
Could you give me some advice? Thanks in advance.