Why doesn't it give a segmentation violation?
The code below is said to give a segmentation violation:

    #include <stdio.h>
    #include <string.h>

    void function(char *str) {
        char buffer[16];
        strcpy(buffer, str);
    }

    int main() {
        char large_string[256];
        int i;
        for (i = 0; i < 255; i++)
            large_string[i] = 'A';
        function(large_string);
        return 1;
    }

It's compiled and run like this:

    gcc -Wall -Wextra hw.cpp && a.exe

But nothing is output.
NOTE
The above code does overwrite the ret address and so on, if you really understand what's going on underneath. To be specific, the ret address will be 0x41414141.
Important
This requires profound knowledge of the stack.
As everyone says, your program has undefined behaviour. In fact your program has more bugs than you thought it did, but after it's already undefined it doesn't get any further undefined.
Here's my guess about why there was no output. You didn't completely disable optimization. The compiler saw that the code in function() doesn't have any defined effect on the rest of the program. The compiler optimized out the call to function().
Odds are that the long string is, in fact, terminated by the zero byte in i. Assuming that the variables in main are laid out in the order they are declared -- which isn't required by anything in the language spec that I know of but seems likely in practice -- then large_string would be first in memory, followed by i. The loop sets i to 0 and counts up to 255. Whether i is stored big-endian or little-endian, either way it has a zero byte in it. So in traversing large_string, at either byte 256 or 257 you'll hit a null byte.
Beyond that, I'd have to study the generated code to figure out why this didn't blow up. As you seem to indicate, I'd expect that the copy to buffer would overwrite the return address of function(), so when it tried to return you'd be going off into deep space somewhere and would quickly blow up on something.
But as others say, "undefined" means "unpredictable".
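The claim that i contains a zero byte once the loop has finished (when i is 255) is easy to check in isolation. The following is a minimal sketch, assuming int is wider than one byte as on common platforms; count_zero_bytes and print_bytes are hypothetical helper names that are not part of the original program.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count how many bytes of an int's in-memory
   representation are zero. */
size_t count_zero_bytes(int value) {
    const unsigned char *p = (const unsigned char *)&value;
    size_t zeros = 0;
    for (size_t k = 0; k < sizeof value; k++) {
        if (p[k] == 0) {
            zeros++;
        }
    }
    return zeros;
}

/* Hypothetical helper: print the bytes of an int in memory order,
   lowest address first. */
void print_bytes(int value) {
    const unsigned char *p = (const unsigned char *)&value;
    for (size_t k = 0; k < sizeof value; k++) {
        printf("%02x ", p[k]);
    }
    printf("\n");
}
```

Calling print_bytes(255) shows where the zero bytes sit on a given machine. Whether those bytes really follow large_string in memory still depends on the layout the compiler chose, so this only supports the "has a zero byte" half of the argument.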
There may be anything in your 'char buffer[16]', including \0. strcpy copies until it finds the first \0, thus not going above your boundary of 16 characters.
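One point worth checking against this answer: as the C library specifies it, strcpy stops at the first '\0' it reads from the source string, and whatever the destination held beforehand does not decide where the copy ends. The sketch below illustrates that with a destination that starts out full of zero bytes; copy_into_zeroed is a hypothetical helper name, and the destination is assumed to have room for at least 6 bytes.

```c
#include <string.h>

/* Hypothetical helper: fill dst with zero bytes, strcpy a short literal
   into it, and return the resulting string length.
   Assumes dst_size >= 6 so that "hello" and its terminator fit. */
size_t copy_into_zeroed(char *dst, size_t dst_size) {
    memset(dst, 0, dst_size);  /* destination is all '\0' before the copy */
    strcpy(dst, "hello");      /* copies 5 characters plus the terminator */
    return strlen(dst);
}
```

With a 16-byte destination this returns 5, so the zero bytes already present in the destination did not stop the copy early.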
You're just getting lucky. There's no reason that code has to generate a segmentation fault (or any other kind of error). It's still probably a bad idea, though. You can probably get it to fail by increasing the size of large_string.
Probably in your implementation buffer is immediately below large_string on the stack. So when the call to strcpy overflows buffer, it's just writing most of the way into large_string without doing any particular damage. It will write at least 255 bytes, but whether it writes more depends on what's above large_string (and on the uninitialised value of the last byte of large_string). It seems to have stopped before doing any damage or segfaulting.
By fluke, the return address of the call to function() isn't being trashed. Either it's below buffer on the stack or it's in a register, or maybe the function is inlined; I can't remember what no optimisation does. If you can't be bothered to check the disassembly, I can't either ;-). So you're returning and exiting without problems.
Whoever said that code would give a segfault probably isn't reliable. It results in undefined behaviour. On this occasion, the behaviour was to output nothing and exit.
[Edit: I checked on my compiler (GCC on cygwin), and for this code it is using the standard x86 calling convention and entry/exit code. And it does segfault.]
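Short of reading the disassembly, one rough way to see where buffer sits relative to large_string is to print the distance between their addresses. This is only a sketch under assumptions: offset_from_local_buffer and report_layout are hypothetical names, intptr_t is assumed to be available, and the result can change with compiler, flags and inlining, so it describes one build rather than the language.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns (address of caller_obj) minus (address of a
   local 16-byte buffer in this frame), in bytes. A positive value means the
   caller's object sits at a higher address than the local buffer. */
long long offset_from_local_buffer(const char *caller_obj) {
    volatile char buffer[16];
    buffer[0] = 'A';  /* touch the buffer so it is really allocated */
    return (long long)((intptr_t)caller_obj - (intptr_t)&buffer[0]);
}

/* Hypothetical driver mirroring the shape of the question's main(). */
long long report_layout(void) {
    char large_string[256];
    large_string[0] = 'A';
    long long offset = offset_from_local_buffer(large_string);
    printf("large_string is %lld bytes %s buffer\n",
           offset < 0 ? -offset : offset,
           offset < 0 ? "below" : "above");
    return offset;
}
```

If the reported distance is small and positive, an overflow of buffer runs through the saved frame data and into large_string, which is the situation this answer describes.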
You're compiling a .cpp (C++) program by invoking gcc (instead of g++). I'm not sure if this is the cause, but on a Linux system (it appears you're running on Windows, given the default .exe output) it throws the following error when I try to compile as you have stated:
/tmp/ccSZCCBR.o:(.eh_frame+0x12): undefined reference to `__gxx_personality_v0'
collect2: ld returned 1 exit status
It's UB (undefined behavior). strcpy might have copied more bytes into the memory pointed to by buffer, and that might not cause a problem at that moment.
It's undefined behavior, which means that anything can happen. The program can even appear to work correctly.
It seems that you just happen not to overwrite any parts of memory that are still needed by the rest of the (short) program (or that are outside the program's address space, write protected, and so on), so nothing special happens. At least nothing that would lead to any output.
There's a zero byte on the stack somewhere that stops the strcpy(), and there's enough room on the stack not to hit a protected page. Try printing out strlen(buffer) in that function. In any case the result is undefined behavior.
Get into the habit of using the strlcpy(3) family of functions.
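strlcpy comes from BSD and is not provided by every C library, so the sketch below swaps in standard snprintf to get the same bounded, always-terminated copy. bounded_copy is a hypothetical helper name, not a library function, and like strlcpy it assumes that src is a properly terminated string (which large_string in the question is not guaranteed to be).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a standard function: copy src into dst without
   ever writing more than dst_size bytes, always leaving dst terminated.
   Returns 1 if all of src fit, 0 if it was truncated or dst_size is 0. */
int bounded_copy(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;
    }
    /* snprintf writes at most dst_size - 1 characters plus a '\0' and
       returns the length the full string would have needed. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Used in place of the strcpy call with a 16-byte buffer, this keeps at most 15 characters and reports the truncation through its return value instead of writing past the end of the array.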
You can test this in other ways:
On my machine, this causes SIGSEGV only at around i = 37000! (tested by inspecting the core with gdb).
To guard against these problems, test your programs using a malloc debugger, and use lots of mallocs, since there are no memory debugging libraries that I know of that can look into static memory. Example: Electric Fence
And now the SIGSEGV is triggered as soon as i=10, as would be expected.