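Format string vulnerability - printf
Why does this print the value at memory address 0x08480110? I'm not sure why there are 5 %08x arguments - where does that take you up the stack?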
address = 0x08480110
address (encoded as 32 bit le string): "\x10\x01\x48\x08"
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");
This example is taken from page 11 of this paper: http://crypto.stanford.edu/cs155/papers/formatstring-1.2.pdf
I think that the paper presents its printf() examples in a somewhat confusing way, because the examples use string literals for format strings, and those don't generally permit the type of vulnerability being described. The format string vulnerability as described here depends on the format string being provided by user input. So the example might better be presented with the format string first copied, as user input would be, into a local array named outstring, and that array then passed to printf().
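A minimal sketch of what such a version could look like. The 80-byte size, the use of strncpy() with a fixed string as a stand-in for user input, and the helper names fill_outstring() and word_at() are assumptions for illustration; only the name outstring and the string itself come from the discussion. The vulnerable call is left as a comment so the sketch is safe to run, and word_at() shows the 32-bit little-endian words that end up at the start of the buffer:

```c
#include <stdint.h>
#include <string.h>

#define OUTSTRING_SIZE 80  /* assumed size, not taken from the paper */

/* Stand-in for user input: the attack string from the question. */
const char ATTACK_INPUT[] =
    "\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|";

/* Copy the attacker-controlled input into the caller's buffer. */
void fill_outstring(char *outstring, size_t size)
{
    strncpy(outstring, ATTACK_INPUT, size - 1);
    outstring[size - 1] = '\0';
    /* The vulnerable call would be:  printf(outstring);
       It is undefined behaviour and is deliberately not executed here. */
}

/* Read the index-th 32-bit word of buf, interpreting bytes as little endian. */
uint32_t word_at(const char *buf, size_t index)
{
    const unsigned char *p = (const unsigned char *)buf + 4 * index;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With a local char outstring[OUTSTRING_SIZE] filled this way, word_at(outstring, 0) is 0x08480110, which is the value a later "%s" would treat as a pointer.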
Since the outstring array is an automatic, the compiler will likely put it on the stack. After copying the user input to the outstring array, its first 'word' on the stack will be 0x08480110 (assuming little endian), followed by the words that hold the rest of the format string.
The compiler will put other items on the stack as it sees fit (other local variables, saved registers, whatever).
When the printf() call is about to be made, the stack might hold a few such entries (a 'saved EDI', a 'saved ECX' and so on) between the top of the stack and outstring.
Note that I'm completely making those entries up - each compiler will use the stack in different ways, so a format string vulnerability has to be custom crafted for a particular exact scenario. In other words, you won't always use 5 dummy format specifiers like in this example - as the attacker you'd need to figure out how many dummies the particular vulnerability would need.
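Finding the number of dummies is usually a matter of sending probe inputs of increasing length. A minimal sketch of a helper that builds such a probe; the name build_probe(), the "AAAA" marker and the "%08x." layout are assumptions for illustration. The result is only meant to be supplied as input to the program under test and is never used as a format string by this code:

```c
#include <stddef.h>
#include <string.h>

/*
 * Write "AAAA" followed by `count` copies of "%08x." into buf.
 * Returns the length of the string, or -1 if count is negative
 * or buf is too small.
 */
int build_probe(char *buf, size_t size, int count)
{
    static const char marker[] = "AAAA";  /* appears as 41414141 in a hex dump */
    static const char spec[] = "%08x.";
    size_t need;

    if (count < 0)
        return -1;
    need = strlen(marker) + (size_t)count * strlen(spec) + 1;
    if (need > size)
        return -1;

    strcpy(buf, marker);
    for (int i = 0; i < count; i++)
        strcat(buf, spec);
    return (int)(need - 1);
}
```

The attacker would raise count until 41414141 shows up in the output, which marks the point where printf() has walked back into the input buffer.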
Now to call printf(), the argument (the address of outstring) is pushed on to the stack and printf() is called, so the argument sits on top of those entries.
However, printf() doesn't really know anything about how many arguments have been placed on the stack for it - it goes by the format specifiers it finds in the format string (the one argument it's 'sure' to get). So printf() gets the format string argument and starts processing it. The 1st "%08x" will print what corresponds to the 'saved EDI' in my example, the next "%08x" will print the 'saved ECX', and so on. So the "%08x" format specifiers are just eating up data on the stack until printf() gets back to the string the attacker was able to input. Determining how many of those are needed is something an attacker would do by a kind of trial and error (probably by a test run that has a whole slew of "%08x" formats until he can 'see' where the format string starts).
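The point that printf() decides how many arguments to fetch purely from the format string can be illustrated with a small scanner. This is a simplified sketch, not how any real printf() is implemented: the hypothetical count_conversions() only skips flags, width, precision and length modifiers, and treats "%%" as a literal percent sign:

```c
#include <string.h>

/*
 * Count the conversion specifications that would each consume an argument.
 * Simplified: ignores '*' width/precision (which consume extra arguments)
 * and positional "%n$" forms.
 */
int count_conversions(const char *fmt)
{
    int count = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;           /* "%%" prints a percent sign, no argument */
        /* Skip flags, width, precision and length modifiers. */
        while (*p != '\0' && strchr("-+ #0123456789.hlLqjzt", *p) != NULL)
            p++;
        if (*p == '\0')
            break;              /* dangling '%' at the end of the string */
        count++;                /* *p is the conversion character, e.g. x or s */
    }
    return count;
}
```

For the format string in the question this gives 6 (five "%08x" and one "%s"), even though the call supplies no arguments after the format string.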
Anyway, when printf() gets to processing the "%s" format specifier, it has consumed all the stack entries up to where the outstring buffer resides. The "%s" specifier treats its stack entry as a pointer, and the string that the user has put into that buffer has been carefully crafted to start with the binary representation of 0x08480110, so printf() will print out whatever is at that address as an ASCIIZ string.
You have 6 format specifiers (5 lots of %08x and one of %s), but you do not provide values for those format specifiers. You immediately fall into the realm of undefined behaviour - anything could happen and there is no wrong answer.
However, in the normal course of events, the values passed to printf() would have been stored on the stack, so the code in printf() reads values off the stack as if the extra values had been passed. The function return address is on the stack, too. There is no guarantee that I can see that the value 0x08480110 will actually be produced. This sort of attack very much depends on the specific program and faulty function call, and you might well get a very different value. The example code is most likely written assuming a 32-bit Intel (little-endian) CPU rather than a 64-bit or big-endian CPU.
Adapting the code fragment, compiling it into a complete program, ignoring the compilation warnings, using a 32-bit compilation on MacOS X 10.6.7 with GCC 4.2.1 (XCode 3), the following code:
produces the following result:
As you can see, I eventually 'found' the string in the main program from the printf() statement. When I compiled it in 64-bit mode, I got a core dump instead. Both results are perfectly correct; the program invokes undefined behaviour, so anything the program does is valid. If you're curious, search for 'nasal demons' for more information on undefined behaviour. And get used to experimenting with these sorts of issues.
Another variation
This produces:
You might recognize the format string in the hex output - 0x41 is capital A, for example.
The 64-bit output from that code is both similar and different:
You misunderstood the paper.
The text you linked is assuming that the current position on the stack is 0x08480110 (look at the surrounding text). The printf() will dump data from wherever on the stack you happen to be.
The \x10\x01\x48\x08 at the beginning of the format string is merely to print the (assumed) address to stdout in front of the dumped data. In no way do these numbers modify the address from which the data is dumped.
You're correct about "take you up the stack", but only barely; it relies on the assumption that arguments are passed on the stack, rather than in registers. (Which, for a variadic function, is probably a safe assumption, but still an assumption about implementation details.)
Each %08x asks for the 'next unsigned int argument' to be printed in hex; what actually occurs in that 'next argument' location is both architecture and compiler dependent. If you compare the values you get with /proc/self/maps for the process, you might be able to narrow down what some of the numbers mean.
A little theory
If you want to see the actual trick for writing to a custom address, jump to the second part.
Let's try tweaking the format string in the printf() trick. But encoding a hex address into a format string directly was not working. The whole point is to smuggle onto the stack some address that the attack will then use, but my format string "ABABABAB" ended up in the .rodata section and not on the stack as we wanted.
When this address is looked up in the process memory map, it is probably in the .rodata section:
and check with readelf:
So far OK, but the weird part came when I dumped the stack and expected to find the address of the ABABABAB string in the stack frame, as the argument passed to printf().
You can see the return address to main(), 0x555555555165, and would expect to find the format string address on the stack at address 0x7fffffffdde0.
But when we dump the stack, instead of the format string address there are just 8 bytes of zeros where the function argument should be, between the __libc_start_call_main() stack frame return address and the printf() stack frame return address:
So how is the address of the format string passed to printf()?
When we dumped the registers, we saw the format string address in the rsi register.
Because function arguments (the string address in this case) are passed on x86-64 in the rsi and rdi registers for the purpose of speed and not on the stack, we can't use the format string and string arguments for this trick.
So we can just use strings created as local (automatic) variables, which are placed on the stack before the return address in the current stack frame.
Actual example
Anyway, I tried this small example and it worked: it printed out the addresses put in local strings (created on the stack). So we could use this trick to make local strings mimic addresses we want to access:
We have to print 5 random values until we reach what we wanted, our local strings!
Using the hexadecimal format %x showed the hex representation of the strings avro, nana, loli on the stack. Using the %s string format would cause a segmentation fault, because printf() would interpret those values as addresses of strings, but those "addresses" are probably not in a mapped area of the process or are in a protected memory area.
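Why the local strings show up as hex numbers can be checked by hand: %x simply prints the bytes of the string as an integer in the machine's byte order. A minimal sketch, assuming a little-endian machine; the helper pack_le() is hypothetical and packs up to 8 bytes of a string into a 64-bit value:

```c
#include <stdint.h>
#include <string.h>

/*
 * Pack the first (at most 8) bytes of s into an integer, with the byte at
 * the lowest address in the least significant position, as a little-endian
 * CPU would see it.
 */
uint64_t pack_le(const char *s)
{
    uint64_t value = 0;
    size_t len = strlen(s);

    if (len > 8)
        len = 8;
    for (size_t i = 0; i < len; i++)
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return value;
}
```

Here pack_le("avro") is 0x6f727661, and pack_le("aaaaaaa") is 0x61616161616161, the same pattern as the fake address built from the letter a.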
So now we have used local variables on the stack to "masquerade" as addresses for reading data.
But what if we could use this to try to write to that address?
Let's change the last %X format specifier to %n.
Instead of printing the content of the data on the stack with %X, we will use this data as the address of a variable where printf() stores the number of characters already printed.
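For reference, this is what %n does when it is used legitimately, with the address of a real int as its argument. A minimal sketch; the helper name chars_before_n() is hypothetical, and some hardened C libraries restrict or reject %n, so its availability on a given platform is an assumption:

```c
#include <stdio.h>

/*
 * Format "AAAA" plus an 8-digit hex number and return how many characters
 * had been produced at the point where %n appears in the format string.
 */
int chars_before_n(unsigned int value)
{
    char buf[64];
    int written = -1;

    /* %n prints nothing; it stores the running character count in written. */
    snprintf(buf, sizeof buf, "AAAA%08x%n|tail", value, &written);
    return written;
}
```

The count is 12 here (4 characters of "AAAA" plus 8 hex digits). In the attack, the int pointer is not supplied by the programmer but taken from whatever stack data the attacker has lined up.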
So the idea is to gain write access to a custom address.
Our FAKE address 0x61616161616161, represented in ASCII as "aaaaaaa", ends up in the %rax register, and printf() will write to this address the number of characters already printed (stored in r12):
But in our case this will cause a SEGV segmentation fault, since address 0x61616161616161 is not mapped into process memory.
I hope this helps!