Determining the exact source line from a kernel crash dump

Posted 2024-10-25 18:05:08


Hi,
I am running a bidirectional (bi-di) 'iperf' test on an interface that uses my driver.
To reproduce, run bi-di I/O on one interface (the other interface is not active):

On the DUT, run: iperf -c -P 8 -t 100000 -I 10
On the peer, run iperf -c with the same parameters almost immediately afterwards (once the first 10 s of the DUT's 'iperf send' are over).
Both machines are also running 'iperf -s -w 256K'.

The crash does not occur in the driver itself but in the context of the 'iperf' process. Here is the stack trace:

 PID: 8855   TASK: f7036550  CPU: 0   COMMAND: "iperf"
 #0 [c074bed0] crash_kexec at c0443233
 #1 [c074bf14] die at c04064d3
 #2 [c074bf44] do_page_fault at c062134b
 #3 [c074bf94] error_code (via page_fault) at c0405abb
    EAX: f5888100  EBX: 00000000  ECX: 00100100  EDX: 00200200  EBP: 00000001
    DS:  007b      ESI: f5888000  ES:  007b      EDI: cb614000
    CS:  0060      EIP: c05c4e94  ERR: ffffffff  EFLAGS: 00010046
 #4 [c074bfc8] net_rx_action at c05c4e94
 #5 [c074bfe4] __do_softirq at c042aa65
--- <soft IRQ> ---
 #0 [f281ac4c] do_softirq at c04073e5
 #1 [f281ac58] do_IRQ at c04074d9
 #2 [f281ac70] common_interrupt at c0405975
    EAX: 39383736  EBX: f281af4c  ECX: 00000428  EDX: 31303938  EBP: f378b042
    DS:  007b      ESI: f378b1c2  ES:  007b      EDI: 09fdb448
    CS:  0060      EIP: c04f1c07  ERR: ffffffba  EFLAGS: 00000202
 #3 [f281aca4] __copy_to_user_ll at c04f1c07
 #4 [f281acb0] memcpy_toiovec at c05bfecc
 #5 [f281acc4] skb_copy_datagram_iovec at c05c059b
 #6 [f281acf4] tcp_rcv_established at c05ef40a
 #7 [f281ad20] tcp_v4_do_rcv at c05f48c5
 #8 [f281ad54] tcp_prequeue_process at c05e6bdd
 #9 [f281ad5c] tcp_recvmsg at c05e90e2
#10 [f281ad9c] sock_common_recvmsg at c05bb1c4
#11 [f281adc0] sock_recvmsg at c05b8dc6
#12 [f281aea0] sys_recvfrom at c05ba6ab
#13 [f281af64] sys_recv at c05ba727
#14 [f281af80] sys_socketcall at c05bab52
#15 [f281afb8] system_call at c0404f44
    EAX: ffffffda  EBX: 0000000a  ECX: b6ba2340  EDX: 00014268
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 09fbe630
    SS:  007b      ESP: b6ba2328  EBP: b6ba2378
    CS:  0073      EIP: 004ad410  ERR: 00000066  EFLAGS: 00000293
crash>
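
One observation that may help narrow things down: in the exception frame for net_rx_action, ECX and EDX hold 00100100 and 00200200, which match the kernel's list-poison constants on 32-bit x86 (LIST_POISON1 and LIST_POISON2). This may point at a list entry being touched after list_del(), but that is only a guess from the register contents, not a confirmed diagnosis. A minimal Python sketch of that check, with the register values copied from the trace above (the helper name poisoned_registers is made up for illustration):

```python
# List-poison values written by list_del() on 32-bit x86 kernels.
LIST_POISON = {
    0x00100100: "LIST_POISON1",
    0x00200200: "LIST_POISON2",
}

# Registers from the exception frame of net_rx_action in the trace.
regs = {
    "EAX": 0xF5888100,
    "EBX": 0x00000000,
    "ECX": 0x00100100,
    "EDX": 0x00200200,
    "EBP": 0x00000001,
    "ESI": 0xF5888000,
    "EDI": 0xCB614000,
}


def poisoned_registers(registers):
    """Return the registers whose value equals a list-poison constant."""
    return {
        name: LIST_POISON[value]
        for name, value in registers.items()
        if value in LIST_POISON
    }


print(poisoned_registers(regs))
# {'ECX': 'LIST_POISON1', 'EDX': 'LIST_POISON2'}
```

If the match is not a coincidence, the list operations in and around net_rx_action() (for example on the device poll list) would be the first place to look.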

The EIP at the time of the crash is net_rx_action:0xdd/19ca. I compiled the kernel-2.6.18-238 sources (the source version of the OS running on the DUT) and ran 'objdump -S ./net/core/dev.o > dev_o_dmp' on the object built from ./net/core/dev.c, which defines net_rx_action(). In the 'dev_o_dmp' file, net_rx_action() contains a lot of inlined code, so the listing does not exactly mirror the flow of the source file. In that case, is it safe to add 0xdd to the base address of net_rx_action (say 32FF), giving 33DC? That is, would the instruction at 33DC correspond to the source line that triggers the 'kernel paging request' crash?
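
For reference, the address arithmetic in question can be sketched in a few lines of Python. The base 0x32FF is the hypothetical example value from the paragraph above; the EIP and the offset 0xdd come from the trace. The two printed commands assume a vmlinux built with debug info that matches the kernel running on the DUT, which is an assumption on my part and may not hold for a locally rebuilt tree with a different compiler or config.

```python
OFFSET = 0xDD            # offset into net_rx_action reported with the EIP
EXAMPLE_BASE = 0x32FF    # hypothetical start of net_rx_action in dev.o
FAULT_EIP = 0xC05C4E94   # EIP from the crash trace


def faulting_offset(func_start, offset):
    """Address of the faulting instruction, given the function start."""
    return func_start + offset


def function_start(fault_addr, offset):
    """Start of the function, given the faulting address and its offset."""
    return fault_addr - offset


print(hex(faulting_offset(EXAMPLE_BASE, OFFSET)))  # 0x33dc
print(hex(function_start(FAULT_EIP, OFFSET)))      # 0xc05c4db7

# Commands that map the address back to source, assuming a vmlinux
# with debug info that matches the crashed kernel (an assumption).
print(f"addr2line -f -i -e vmlinux {FAULT_EIP:#x}")
print(f"gdb vmlinux -ex 'list *(net_rx_action+{OFFSET:#x})'")
```

With the example base, 0x32FF + 0xdd comes out as 0x33DC. The -i flag asks addr2line to also report inlined call sites, which is relevant here because of the inlined code in net_rx_action().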

Any tips or recommendations on how to go about debugging this problem would be a great help.
