Determining the exact source line from a kernel crash dump
Hi,
I am running a bidirectional (bi-di) 'iperf' test on an interface using my driver.
Steps to reproduce: run bi-di I/O on one interface (the other interface is not active):
- Run iperf -c -P 8 -t 100000 -I 10 on the DUT.
- Run iperf -c with the same parameters from the peer almost immediately (after the first 10 s of the 'iperf send' above are over).
Both sides run 'iperf -s -w 256K'.
The crash does not happen in the driver itself but in the 'iperf' context. Here is the stack trace:
PID: 8855 TASK: f7036550 CPU: 0 COMMAND: "iperf"
#0 [c074bed0] crash_kexec at c0443233
#1 [c074bf14] die at c04064d3
#2 [c074bf44] do_page_fault at c062134b
#3 [c074bf94] error_code (via page_fault) at c0405abb
EAX: f5888100 EBX: 00000000 ECX: 00100100 EDX: 00200200 EBP: 00000001
DS: 007b ESI: f5888000 ES: 007b EDI: cb614000
CS: 0060 EIP: c05c4e94 ERR: ffffffff EFLAGS: 00010046
#4 [c074bfc8] net_rx_action at c05c4e94
#5 [c074bfe4] __do_softirq at c042aa65
--- <soft IRQ> ---
#0 [f281ac4c] do_softirq at c04073e5
#1 [f281ac58] do_IRQ at c04074d9
#2 [f281ac70] common_interrupt at c0405975
EAX: 39383736 EBX: f281af4c ECX: 00000428 EDX: 31303938 EBP: f378b042
DS: 007b ESI: f378b1c2 ES: 007b EDI: 09fdb448
CS: 0060 EIP: c04f1c07 ERR: ffffffba EFLAGS: 00000202
#3 [f281aca4] __copy_to_user_ll at c04f1c07
#4 [f281acb0] memcpy_toiovec at c05bfecc
#5 [f281acc4] skb_copy_datagram_iovec at c05c059b
#6 [f281acf4] tcp_rcv_established at c05ef40a
#7 [f281ad20] tcp_v4_do_rcv at c05f48c5
#8 [f281ad54] tcp_prequeue_process at c05e6bdd
#9 [f281ad5c] tcp_recvmsg at c05e90e2
#10 [f281ad9c] sock_common_recvmsg at c05bb1c4
#11 [f281adc0] sock_recvmsg at c05b8dc6
#12 [f281aea0] sys_recvfrom at c05ba6ab
#13 [f281af64] sys_recv at c05ba727
#14 [f281af80] sys_socketcall at c05bab52
#15 [f281afb8] system_call at c0404f44
EAX: ffffffda EBX: 0000000a ECX: b6ba2340 EDX: 00014268
DS: 007b ESI: 00000000 ES: 007b EDI: 09fbe630
SS: 007b ESP: b6ba2328 EBP: b6ba2378
CS: 0073 EIP: 004ad410 ERR: 00000066 EFLAGS: 00000293
crash>
The EIP at the time of the crash is net_rx_action:0xdd/19ca. I have compiled the kernel-2.6.18-238 sources (the source version of the OS the DUT is running) and ran 'objdump -S ./net/core/dev.o > dev_o_dmp' on the object built from ./net/core/dev.c, which contains the definition of net_rx_action(). In the 'dev_o_dmp' file, net_rx_action() contains many inlined definitions, so the listing does not exactly mirror the flow of the source file. In that situation, is it safe to add 0xdd to the base address of net_rx_action (say 32FF), giving 33DC? That is, would the instruction at 33DC correspond to the offending source line that gives rise to the 'kernel paging request' error?
Any tips or recommendations on how to go about debugging this problem would be a great help.
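The address arithmetic being asked about can be checked with a short sketch. The base 0x32FF is the post's own hypothetical value ("say 32FF"); the offset 0xdd and the EIP c05c4e94 are taken from the trace. The helper names are invented here for illustration only.

```python
# BASE is the hypothetical start of net_rx_action in the objdump listing
# (the post's "say 32FF"). OFFSET and EIP come from the stack trace.
BASE = 0x32FF
OFFSET = 0xDD
EIP = 0xC05C4E94


def faulting_address(base: int, offset: int) -> int:
    """Address to look up in a listing whose function starts at `base`."""
    return base + offset


def runtime_symbol_start(eip: int, offset: int) -> int:
    """Start of the function in the running kernel, derived from EIP."""
    return eip - offset


if __name__ == "__main__":
    # Address of the faulting instruction inside the dev.o listing.
    print(hex(faulting_address(BASE, OFFSET)))
    # Where net_rx_action would start in the running kernel image.
    print(hex(runtime_symbol_start(EIP, OFFSET)))
```

With these inputs the sum is 0x33dc, and the function start implied by the EIP is 0xc05c4db7. The offset only identifies the same instruction in dev.o if that object was built with the same compiler and configuration as the running kernel.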
Comments (1)
Unfortunately, or fortunately depending on your perspective, at high optimization levels the compiler can generate assembly for which the debug format cannot express a reasonable mapping from C source lines to assembly instructions. How often you run into this depends on the compiler, the optimization level, the debug symbol format, the debug symbol level, and the code itself.
You have to assume that line numbers obtained with this technique could be wrong. That said, I use this technique frequently in my own kernel work and have not had any problems yet (knocks on wood). Just remember that if you are faced with something that makes no sense, you could have a bad line number.
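The lookup described in this thread can be sketched as a small script: find the function's start address in an 'objdump -S' listing, add the offset, and report the nearest source line printed before that instruction. This is only a rough sketch. It assumes the usual listing layout (a header such as "000032ff <name>:" and instruction lines of the form "address:<TAB>bytes<TAB>mnemonic"), and the function name demo_func, the helper source_hint, and the sample listing are all made up for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed objdump -S layout: "000032ff <name>:" starts a function, and
# instruction lines look like "    33dc:<TAB>bytes<TAB>mnemonic".
SYMBOL_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:\s*$")
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\t")


def source_hint(listing: str, symbol: str,
                offset: int) -> Optional[Tuple[int, str, Optional[str]]]:
    """Return (address, instruction line, nearest preceding source line)."""
    target = None       # address we are looking for, once inside `symbol`
    last_source = None  # most recent interleaved source line
    for line in listing.splitlines():
        m = SYMBOL_RE.match(line)
        if m:
            # A new function begins: track it only if it is the one we want.
            target = int(m.group(1), 16) + offset if m.group(2) == symbol else None
            last_source = None
            continue
        if target is None or not line.strip():
            continue
        m = INSN_RE.match(line)
        if m:
            if int(m.group(1), 16) == target:
                return target, line.strip(), last_source
        else:
            # Anything that is not an instruction is treated as source text.
            last_source = line.strip()
    return None


# Made-up listing for illustration only; not real kernel disassembly.
SAMPLE = (
    "000032ff <demo_func>:\n"
    "void demo_func(void)\n"
    "{\n"
    "    32ff:\t55                   \tpush   %ebp\n"
    "\tsecond_statement();\n"
    "    33dc:\t89 4a 04             \tmov    %ecx,0x4(%edx)\n"
    "}\n"
    "    33df:\tc3                   \tret\n"
)

if __name__ == "__main__":
    print(source_hint(SAMPLE, "demo_func", 0xDD))
```

The source line returned is only a hint, for the reason given above: with inlining and instruction reordering, the text printed before an instruction may belong to an inlined helper rather than to the function named in the trace. If a vmlinux with debug information matching the running kernel is available, gdb's `list *(net_rx_action+0xdd)` or the crash utility's `dis -l` may provide the same mapping without manual arithmetic.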