Determining the exact source line from a kernel crash dump

Posted on 2024-10-25 18:05:08

Hi,
I am running a bidirectional (bi-di) 'iperf' test on an interface that uses my driver.
To reproduce, run bidirectional I/O on one interface (the other interface is not active):

  • Run iperf -c -P 8 -t 100000 -I 10 on the DUT.
  • Almost immediately (once the first 10 s of the 'iperf send' above are over), run iperf -c with the same parameters from the peer.
    'iperf -s -w 256K' is running on both sides.

The crash does not happen in the driver itself but in the 'iperf' context. Here is the stack trace:

 PID: 8855   TASK: f7036550  CPU: 0   COMMAND: "iperf"
 #0 [c074bed0] crash_kexec at c0443233
 #1 [c074bf14] die at c04064d3
 #2 [c074bf44] do_page_fault at c062134b
 #3 [c074bf94] error_code (via page_fault) at c0405abb
    EAX: f5888100  EBX: 00000000  ECX: 00100100  EDX: 00200200  EBP: 00000001
    DS:  007b      ESI: f5888000  ES:  007b      EDI: cb614000
    CS:  0060      EIP: c05c4e94  ERR: ffffffff  EFLAGS: 00010046
 #4 [c074bfc8] net_rx_action at c05c4e94
 #5 [c074bfe4] __do_softirq at c042aa65
--- <soft IRQ> ---
 #0 [f281ac4c] do_softirq at c04073e5
 #1 [f281ac58] do_IRQ at c04074d9
 #2 [f281ac70] common_interrupt at c0405975
    EAX: 39383736  EBX: f281af4c  ECX: 00000428  EDX: 31303938  EBP: f378b042
    DS:  007b      ESI: f378b1c2  ES:  007b      EDI: 09fdb448
    CS:  0060      EIP: c04f1c07  ERR: ffffffba  EFLAGS: 00000202
 #3 [f281aca4] __copy_to_user_ll at c04f1c07
 #4 [f281acb0] memcpy_toiovec at c05bfecc
 #5 [f281acc4] skb_copy_datagram_iovec at c05c059b
 #6 [f281acf4] tcp_rcv_established at c05ef40a
 #7 [f281ad20] tcp_v4_do_rcv at c05f48c5
 #8 [f281ad54] tcp_prequeue_process at c05e6bdd
 #9 [f281ad5c] tcp_recvmsg at c05e90e2
#10 [f281ad9c] sock_common_recvmsg at c05bb1c4
#11 [f281adc0] sock_recvmsg at c05b8dc6
#12 [f281aea0] sys_recvfrom at c05ba6ab
#13 [f281af64] sys_recv at c05ba727
#14 [f281af80] sys_socketcall at c05bab52
#15 [f281afb8] system_call at c0404f44
    EAX: ffffffda  EBX: 0000000a  ECX: b6ba2340  EDX: 00014268
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 09fbe630
    SS:  007b      ESP: b6ba2328  EBP: b6ba2378
    CS:  0073      EIP: 004ad410  ERR: 00000066  EFLAGS: 00000293
crash>

The EIP at the time of the crash is net_rx_action:0xdd/19ca. I have compiled the kernel-2.6.18-238 sources (the source version of the OS the DUT is running) and ran 'objdump -S ./net/core/dev.o > dev_o_dmp' on the object built from ./net/core/dev.c, which contains the definition of net_rx_action(). In the 'dev_o_dmp' file, net_rx_action() contains a lot of inlined code, so the listing does not exactly mirror the flow of the source file. In that situation, is it safe to add 0xdd to the base address of net_rx_action (say 0x32FF), giving 0x33DC? That is, would the instruction at 0x33DC correspond to the offending source line that gives rise to the 'kernel paging request' crash?
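The address arithmetic described above can be sketched in a few lines of Python. This is only an illustration: it assumes the EIP is written in the usual oops notation symbol+0xOFFSET/0xSIZE (here net_rx_action+0xdd/0x19ca), and the helper names parse_symbol_offset and address_in_listing are hypothetical. For the example base 0x32FF the sum is 0x33DC.

```python
import re

# Kernel oops notation: symbol+0xOFFSET/0xSIZE (assumed input format).
_OOPS_RE = re.compile(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)")


def parse_symbol_offset(text):
    """Split 'func+0xOFF/0xSIZE' into (name, offset, size) with integer values."""
    match = _OOPS_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in symbol+0xOFF/0xSIZE form: {text!r}")
    name, offset, size = match.groups()
    return name, int(offset, 16), int(size, 16)


def address_in_listing(listing_base, offset, size):
    """Return the address to look up in the objdump listing.

    listing_base is where the function starts in the listing; the offset
    must fall inside the function, otherwise the input is inconsistent.
    """
    if not 0 <= offset < size:
        raise ValueError("offset lies outside the function")
    return listing_base + offset


if __name__ == "__main__":
    name, offset, size = parse_symbol_offset("net_rx_action+0xdd/0x19ca")
    base = 0x32FF  # example base address taken from the question
    print(f"{name}: look at {address_in_listing(base, offset, size):#x} in dev_o_dmp")
```

The result is only meaningful if the rebuilt dev.o matches the running kernel (same compiler, same configuration). One quick sanity check is whether the size of net_rx_action in the listing also equals 0x19ca.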

Any tips or recommendations on how to go about debugging this problem would be a great help.


Comments (1)

中二柚 2024-11-01 18:05:08

Unfortunately, or fortunately depending on your perspective, at high optimization levels the compiler can generate assembly for which the debug format cannot provide a reasonable mapping from C source lines to assembly instructions. Which cases trigger this problem depends on the compiler, the optimization level, the debug symbol format, the debug symbol level, and the code itself.

You have to assume that line numbers obtained with this technique could be wrong. That said, I use this technique frequently in my own kernel work and have not had any problems yet (knock on wood). Just remember that if you are faced with something that makes no sense, you could have a bad line number.
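As a cross-check on a line number read off an objdump listing, GNU addr2line can be asked for the same address. The sketch below is not part of the original answer. It assumes a vmlinux with debug info that matches the crashed kernel is available (the path shown is hypothetical), and the helper names are made up for illustration. The -i option asks addr2line to also print the enclosing inlined scopes, which is relevant when the faulting instruction sits in inlined code.

```python
import shutil
import subprocess


def addr2line_command(vmlinux_path, address):
    """Build an addr2line invocation.

    -e selects the binary, -f prints function names, and -i also prints
    the chain of inlined callers for the address.
    """
    return ["addr2line", "-e", vmlinux_path, "-f", "-i", hex(address)]


def resolve(vmlinux_path, address):
    """Run addr2line and return its output lines.

    With -f, the output alternates between a function name and a
    file:line entry for each reported scope.
    """
    cmd = addr2line_command(vmlinux_path, address)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("addr2line (GNU binutils) not found in PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # EIP from the crash output in the question; the vmlinux path is hypothetical.
    print(" ".join(addr2line_command("/path/to/vmlinux", 0xC05C4E94)))
```

If addr2line and the objdump listing disagree, or either points at a line that makes no sense, that is the situation described above in which the line number should be treated as suspect.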
