Memcpy segfault with valid pointers

Posted 2024-11-26 23:05:06


I'm using libcurl in my program, and running into a segfault. Before I filed a bug with the curl project, I thought I'd do a little debugging. What I found seemed very odd to me, and I haven't been able to make sense of it yet.

First, the segfault traceback:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe77f6700 (LWP 592)]
0x00007ffff6a2ea5c in memcpy () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff6a2ea5c in memcpy () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff5bc29e5 in x509_name_oneline (a=0x7fffe3d9c3c0,
    buf=0x7fffe77f4ec0 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%", size=255) at ssluse.c:629
#2  0x00007ffff5bc2a6f in cert_verify_callback (ok=1, ctx=0x7fffe77f50b0)
    at ssluse.c:645
#3  0x00007ffff72c9a80 in ?? () from /lib/libcrypto.so.0.9.8
#4  0x00007ffff72ca430 in X509_verify_cert () from /lib/libcrypto.so.0.9.8
#5  0x00007ffff759af58 in ssl_verify_cert_chain () from /lib/libssl.so.0.9.8
#6  0x00007ffff75809f3 in ssl3_get_server_certificate ()
   from /lib/libssl.so.0.9.8
#7  0x00007ffff7583e50 in ssl3_connect () from /lib/libssl.so.0.9.8
#8  0x00007ffff5bc48f0 in ossl_connect_step2 (conn=0x7fffe315e9a8, sockindex=0)
    at ssluse.c:1724
#9  0x00007ffff5bc700f in ossl_connect_common (conn=0x7fffe315e9a8,
    sockindex=0, nonblocking=false, done=0x7fffe77f543f) at ssluse.c:2498
#10 0x00007ffff5bc7172 in Curl_ossl_connect (conn=0x7fffe315e9a8, sockindex=0)
    at ssluse.c:2544
#11 0x00007ffff5ba76b9 in Curl_ssl_connect (conn=0x7fffe315e9a8, sockindex=0)
...

The call to memcpy looks like this:

  memcpy(buf, biomem->data, size);
(gdb) p buf
$46 = 0x7fffe77f4ec0 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%"
(gdb) p biomem->data
$47 = 0x7fffe3e1ef60 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%"
(gdb) p size
$48 = 255

If I go up a frame, I see that the pointer passed in for buf came from a local variable defined in the calling function:

char buf[256];

Here's where it starts to get weird. I can manually inspect all 256 bytes of both buf and biomem->data without gdb complaining that the memory isn't accessible. I can also manually write all 256 bytes of buf using the gdb set command, without any error. So if all the memory involved is readable and writable, why does memcpy fail?

Also interesting is that I can use gdb to manually call memcpy with the pointers involved. As long as I pass a size <= 160, it runs without a problem. As soon as I pass 161 or higher, gdb gets a sigsegv. I know buf is larger than 160, because it was created on the stack as an array of 256. biomem->data is a little harder to figure, but I can read well past byte 160 with gdb.

I should also mention that this function (or rather the curl method I call that leads to this) completes successfully many times before the crash. My program uses curl to repeatedly call a web service API while it runs. It calls the API every five seconds or so, and runs for about 14 hours before it crashes. It's possible that something else in my app is writing out of bounds and stomping on something that creates the error condition. But it seems suspicious that it crashes at exactly the same point every time, although the timing varies. And all the pointers seem ok in gdb, but memcpy still fails. Valgrind doesn't find any bounds errors, but I haven't let my program run with valgrind for 14 hours.

Within memcpy itself, the disassembly looks like this:

(gdb) x/20i $rip-10
   0x7ffff6a2ea52 <memcpy+242>: jbe    0x7ffff6a2ea74 <memcpy+276>
   0x7ffff6a2ea54 <memcpy+244>: lea    0x20(%rdi),%rdi
   0x7ffff6a2ea58 <memcpy+248>: je     0x7ffff6a2ea90 <memcpy+304>
   0x7ffff6a2ea5a <memcpy+250>: dec    %ecx
=> 0x7ffff6a2ea5c <memcpy+252>: mov    (%rsi),%rax
   0x7ffff6a2ea5f <memcpy+255>: mov    0x8(%rsi),%r8
   0x7ffff6a2ea63 <memcpy+259>: mov    0x10(%rsi),%r9
   0x7ffff6a2ea67 <memcpy+263>: mov    0x18(%rsi),%r10
   0x7ffff6a2ea6b <memcpy+267>: mov    %rax,(%rdi)
   0x7ffff6a2ea6e <memcpy+270>: mov    %r8,0x8(%rdi)
   0x7ffff6a2ea72 <memcpy+274>: mov    %r9,0x10(%rdi)
   0x7ffff6a2ea76 <memcpy+278>: mov    %r10,0x18(%rdi)
   0x7ffff6a2ea7a <memcpy+282>: lea    0x20(%rsi),%rsi
   0x7ffff6a2ea7e <memcpy+286>: lea    0x20(%rdi),%rdi
   0x7ffff6a2ea82 <memcpy+290>: jne    0x7ffff6a2ea30 <memcpy+208>
   0x7ffff6a2ea84 <memcpy+292>: data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x7ffff6a2ea90 <memcpy+304>: and    $0x1f,%edx
   0x7ffff6a2ea93 <memcpy+307>: mov    -0x8(%rsp),%rax
   0x7ffff6a2ea98 <memcpy+312>: jne    0x7ffff6a2e969 <memcpy+9>
   0x7ffff6a2ea9e <memcpy+318>: repz retq
(gdb) info registers
rax            0x0      0
rbx            0x7fffe77f50b0   140737077268656
rcx            0x1      1
rdx            0xff     255
rsi            0x7fffe3e1f000   140737016623104
rdi            0x7fffe77f4f60   140737077268320
rbp            0x7fffe77f4e90   0x7fffe77f4e90
rsp            0x7fffe77f4e48   0x7fffe77f4e48
r8             0x11     17
r9             0x10     16
r10            0x1      1
r11            0x7ffff6a28f7a   140737331236730
r12            0x7fffe3dde490   140737016358032
r13            0x7ffff5bc2a0c   140737316137484
r14            0x7fffe3d69b50   140737015880528
r15            0x0      0
rip            0x7ffff6a2ea5c   0x7ffff6a2ea5c <memcpy+252>
eflags         0x10203  [ CF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb) p/x $rsi
$50 = 0x7fffe3e1f000
(gdb) x/20x $rsi
0x7fffe3e1f000: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f010: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f020: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f030: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f040: 0x00000000      0x00000000      0x00000000      0x00000000

I'm using libcurl version 7.21.6, c-ares version 1.7.4, and openssl version 1.0.0d. My program is multithreaded, but I have registered mutex callbacks with openssl. The program is running on Ubuntu 11.04 desktop, 64-bit. libc is 2.13.

Comments (2)

无法言说的痛 2024-12-03 23:05:06


Clearly libcurl is over-reading the source buffer, and stepping into unreadable memory (page at 0x7fffe3e1f000 -- you can confirm that memory is unreadable by looking at /proc/<pid>/maps for the program being debugged).

Here's where it starts to get weird. I can manually inspect all 256 bytes of both
buf and biomem->data without gdb complaining that the memory isn't accessible.

There is a well-known Linux kernel quirk: even for memory mapped PROT_NONE (which causes SIGSEGV on any attempt to read it from the process itself), an access by GDB via ptrace(PTRACE_PEEKDATA, ...) succeeds. That explains why you can examine all 256 bytes of the source buffer in GDB, even though only the first 160 of them are actually accessible to the process.

Try running your program under Valgrind, chances are it will tell you that you are memcpying into heap-allocated buffer that is too small.

再可℃爱ぅ一点好了 2024-12-03 23:05:06


Do you have any possibility of creating a "crumple zone"?

That is, deliberately increasing the size of the two buffers, or in the case of the structure putting an extra unused element after the destination?

You then seed the source crumple zone with something such as 0xDEADBEEF, and the destination zone with something distinctive. If the destination zone ever changes, you've got something to work with.

256 is a bit suggestive: any possibility it could somehow be treated as a signed quantity, becoming -1 and hence very large? I can't see how gdb wouldn't show it, but...
