Building a control-flow graph from objdump results
I'm attempting to build a control-flow graph from the assembly returned by a call to objdump -d. Currently the best method I've come up with is to put each line of the output into a linked list and separate out the memory address, opcode, and operands for each line. I'm separating them by relying on the regular layout of objdump output (the memory address is from character 2 to character 7 in the string that represents each line).
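A minimal sketch of the parsing step that avoids fixed columns, assuming GNU objdump's default AT&T output with raw bytes shown, where an instruction line looks like `  401126:\t55   \tpush   %rbp`. The address width varies between binaries (a PIE typically shows short addresses such as `1139`), so this reads the hex address up to the `:\t`, skips the raw-bytes column up to the next tab, and splits the rest into mnemonic and operands. `insn_t` and `parse_objdump_line` are hypothetical names; prefixes such as `lock` or `notrack` would land in the mnemonic field, and `--no-show-raw-insn` output is not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one decoded instruction line. */
typedef struct {
    unsigned long long addr;  /* instruction address */
    char mnemonic[32];        /* e.g. "push", "jne", "call" */
    char operands[128];       /* raw operand text, e.g. "%rbp" */
} insn_t;

/* Parse one line of `objdump -d` output. Returns 1 on success, 0 if the
 * line is not an instruction line (blank, header, "<func>:" label, or a
 * continuation line that only carries extra opcode bytes). */
int parse_objdump_line(const char *line, insn_t *out)
{
    const char *p = line;
    char *end;

    while (*p == ' ')
        p++;
    unsigned long long addr = strtoull(p, &end, 16);
    if (end == p || end[0] != ':' || end[1] != '\t')
        return 0;                        /* not "  addr:\t..." */
    const char *text = strchr(end + 2, '\t');  /* skip raw bytes column */
    if (text == NULL)
        return 0;                        /* continuation line: bytes only */
    text++;

    /* Mnemonic is the first whitespace-delimited word. */
    size_t n = strcspn(text, " \t\n");
    if (n == 0 || n >= sizeof out->mnemonic)
        return 0;
    memcpy(out->mnemonic, text, n);
    out->mnemonic[n] = '\0';

    /* Operands run to end of line, minus trailing padding. */
    text += n;
    while (*text == ' ' || *text == '\t')
        text++;
    size_t m = strcspn(text, "\n");
    while (m > 0 && (text[m - 1] == ' ' || text[m - 1] == '\t'))
        m--;
    if (m >= sizeof out->operands)
        m = sizeof out->operands - 1;    /* truncate very long operands */
    memcpy(out->operands, text, m);
    out->operands[m] = '\0';
    out->addr = addr;
    return 1;
}
```

Feeding each line of the objdump output through this and keeping only the lines that return 1 gives the same address/opcode/operand split without depending on character positions.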
Once this is done I start the actual CFG construction. Each node in the CFG holds a starting and ending memory address, a pointer to the previous basic block, and pointers to any child basic blocks. I then walk through the objdump output and compare each opcode against an array of all control-flow opcodes in x86_64. If the opcode is a control-flow one, I record the address as the end of the basic block and, depending on the opcode, add either two child pointers (conditional opcode) or one (call or return).
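One thing this plan may not account for: a basic block ends at a control-flow instruction, but it must also start at every branch target, otherwise a jump into the middle of a block is lost. The classic leader-based partitioning covers both cases. This sketch assumes the mnemonics have already been classified into a hypothetical `kind_t` and direct targets parsed (target 0 meaning indirect or unknown); like the scheme described, a call ends a block, and the sketch assumes the callee returns so the call falls through.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { K_OTHER, K_JMP, K_JCC, K_CALL, K_RET } kind_t;

typedef struct {
    unsigned long addr;
    kind_t kind;
    unsigned long target;  /* direct branch target; 0 if none or indirect */
} insn_t;

/* Index of the instruction at addr, or -1 (linear search for brevity). */
static long find_insn(const insn_t *v, size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].addr == addr)
            return (long)i;
    return -1;
}

/* Set leader[i] for every instruction that starts a basic block and
 * return the number of blocks. Leaders are: the first instruction,
 * every direct jump target, and every instruction after a transfer. */
size_t mark_leaders(const insn_t *v, size_t n, bool *leader)
{
    for (size_t i = 0; i < n; i++)
        leader[i] = false;
    if (n == 0)
        return 0;
    leader[0] = true;
    for (size_t i = 0; i < n; i++) {
        if (v[i].kind == K_OTHER)
            continue;
        if (i + 1 < n)
            leader[i + 1] = true;          /* block ends at the transfer */
        if (v[i].kind == K_JMP || v[i].kind == K_JCC) {
            long t = find_insn(v, n, v[i].target);
            if (t >= 0)
                leader[t] = true;          /* block starts at the target */
        }
    }
    size_t blocks = 0;
    for (size_t i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}

/* Successor block addresses for a block whose last instruction is v[i].
 * Writes up to two addresses into succ and returns how many. ret and
 * indirect jumps yield no static successor; call edges into the callee
 * are not modeled here. */
int successors(const insn_t *v, size_t n, size_t i, unsigned long succ[2])
{
    int k = 0;
    bool falls_through = v[i].kind == K_OTHER || v[i].kind == K_JCC ||
                         v[i].kind == K_CALL;  /* assume callee returns */
    if ((v[i].kind == K_JMP || v[i].kind == K_JCC) && v[i].target != 0)
        succ[k++] = v[i].target;
    if (falls_through && i + 1 < n)
        succ[k++] = v[i + 1].addr;
    return k;
}
```

A block then runs from one leader up to the instruction before the next leader, and its children come from `successors` on its last instruction.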
I'm in the process of implementing this in C, and it seems like it will work but feels very tenuous. Does anyone have any suggestions, or anything that I'm not taking into account?
Thanks for taking the time to read this!
Edit:
The idea is to use it to compare stack traces of system calls generated by DynamoRIO against the expected CFG for a target binary; I'm hoping that building it like this will facilitate that. I haven't reused what's available because A) I hadn't really thought about it and B) I need to get the graph into a usable data structure so I can do path comparisons. I'm going to take a look at some of the utilities on the page you linked to; thanks for pointing me in the right direction. Thanks for your comments, I really appreciate it!
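For the path comparison, a minimal sketch, assuming the DynamoRIO trace has already been mapped to a sequence of basic-block start addresses (that mapping is not shown here): walk consecutive pairs and report the first step that is not an edge in the CFG. `edge_t` and `first_unexpected_step` are hypothetical names, and the edge lookup is a linear scan kept for brevity.

```c
#include <stddef.h>

typedef struct {
    unsigned long from;  /* start address of the source block */
    unsigned long to;    /* start address of the successor block */
} edge_t;

static int has_edge(const edge_t *e, size_t ne,
                    unsigned long from, unsigned long to)
{
    for (size_t i = 0; i < ne; i++)
        if (e[i].from == from && e[i].to == to)
            return 1;
    return 0;
}

/* Return -1 if every consecutive pair in trace[] is an edge of the CFG,
 * otherwise the index i such that trace[i] -> trace[i + 1] is missing. */
long first_unexpected_step(const edge_t *e, size_t ne,
                           const unsigned long *trace, size_t nt)
{
    for (size_t i = 0; i + 1 < nt; i++)
        if (!has_edge(e, ne, trace[i], trace[i + 1]))
            return (long)i;
    return -1;
}
```

For real binaries a sorted adjacency array or a hash table keyed on `from` would replace the linear scan.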
Comments (3)
You should use an IL that was designed for program analysis. There are a few.
The DynInst project (dyninst.org) has a lifter that can translate from ELF binaries into CFGs for functions/programs (or it did the last time I looked). DynInst is written in C++.
BinNavi uses the output from IDA (the Interactive Disassembler) to build an IL out of the control flow graphs that IDA identifies. I would also recommend a copy of IDA; it lets you spot-check CFGs visually. Once you have a program in BinNavi, you can get the IL representation of its functions/CFGs.
Function pointers are just the start of your troubles when statically identifying the control flow graph. Jump tables (the kind generated for switch statements in certain cases, and written by hand in others) throw a wrench in as well. Every code analysis framework I know of handles these with a very heuristics-heavy approach. Then you have exceptions and exception handling, and also self-modifying code.
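Before any of those heuristics can apply, indirect transfers have to be recognized. In objdump's default AT&T syntax, an indirect jump or call has an operand prefixed with `*` (for example `jmp *%rax`, or `jmp *0x402010(,%rax,8)` for a typical jump table), while a direct one prints the hex target. This sketch only classifies the operand; it does not resolve jump tables. `direct_branch_target` is a hypothetical name, and Intel syntax (`-M intel`) prints operands differently and is not handled.

```c
#include <stdlib.h>

/* Classify the operand text of a jmp/jcc/call as printed by objdump in
 * AT&T syntax. Returns 1 and stores the target for a direct branch such
 * as "401140 <main+0x1a>"; returns 0 for an indirect one ("*%rax",
 * "*0x402010(,%rax,8)") or anything unrecognized. */
int direct_branch_target(const char *operands, unsigned long *target)
{
    while (*operands == ' ')
        operands++;
    if (*operands == '*')
        return 0;                  /* register or memory indirect */
    char *end;
    unsigned long t = strtoul(operands, &end, 16);
    if (end == operands)
        return 0;                  /* no hex address: treat as unknown */
    *target = t;
    return 1;
}
```

Branches that return 0 are the ones that need special treatment, such as routing them to an "unknown target" node or applying jump-table heuristics.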
Good luck! You're already getting a lot of information out of the DynamoRIO trace, so I suggest you use as much of it as you can...
I found your question because I was looking for the same thing.
I found nothing, so I wrote a simple Python script for this and put it on GitHub:
https://github.com/zestrada/playground/blob/master/objdump_cfg/objdump_to_cfg.py
Note that I have some heuristics to deal with functions that never return, the gcc stack protector on 32-bit x86, etc. You may or may not want such things.
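A sketch of what a never-returns heuristic can look like in C, not the script's actual logic: assuming direct calls print as `401030 <exit@plt>`, a call whose callee is in a known list gets no fall-through edge. The list is an assumption to extend for your toolchain; `__stack_chk_fail_local` is included on the assumption that 32-bit x86 PIC code calls it instead of `__stack_chk_fail`. `call_never_returns` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Assumed list of functions that never return to their caller. */
static const char *const noreturn_funcs[] = {
    "exit", "_exit", "abort", "__assert_fail",
    "__stack_chk_fail", "__stack_chk_fail_local",
};

/* Given objdump's operand text for a direct call, e.g. "401030 <exit@plt>",
 * return 1 if the callee is in the list above, meaning the block ending
 * in this call should get no fall-through edge. */
int call_never_returns(const char *operands)
{
    const char *lt = strchr(operands, '<');
    const char *gt = lt ? strchr(lt, '>') : NULL;
    if (lt == NULL || gt == NULL)
        return 0;                 /* indirect or unnamed: assume it returns */
    size_t len = (size_t)(gt - lt - 1);
    const char *at = memchr(lt + 1, '@', len);  /* drop an "@plt" suffix */
    if (at != NULL)
        len = (size_t)(at - lt - 1);
    for (size_t i = 0; i < sizeof noreturn_funcs / sizeof *noreturn_funcs; i++)
        if (strlen(noreturn_funcs[i]) == len &&
            strncmp(noreturn_funcs[i], lt + 1, len) == 0)
            return 1;
    return 0;
}
```

Without such a check, the instruction after a call to `exit` (often padding or the next function) is wrongly treated as reachable.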
I treat indirect calls similarly to how you do (basically, there is a node in the graph that serves as the source when returning from an indirect call).
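One possible reading of that design, as a hedged sketch: every indirect call flows into a single shared sentinel node, and that sentinel is the source of the edge back to each call's return site. `UNKNOWN_NODE` and `add_indirect_call` are hypothetical names. Because the node is shared, a path can leave through one indirect call and come back at another call's return site, so the graph over-approximates; for trace comparison that means fewer false alarms but weaker detection.

```c
#include <stddef.h>

typedef struct { unsigned long from, to; } edge_t;

/* Hypothetical sentinel "address" for a target not known statically. */
#define UNKNOWN_NODE (~0UL)

/* Record an indirect call in block `site` whose return lands on block
 * `ret_site`: site -> UNKNOWN_NODE and UNKNOWN_NODE -> ret_site.
 * Appends to e[] (the caller must size it) and returns the new count. */
size_t add_indirect_call(edge_t *e, size_t ne,
                         unsigned long site, unsigned long ret_site)
{
    e[ne++] = (edge_t){ site, UNKNOWN_NODE };
    e[ne++] = (edge_t){ UNKNOWN_NODE, ret_site };
    return ne;
}
```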
Hopefully this is helpful for anyone looking to do similar analysis with similar restrictions.
I was also facing a similar issue in the past and wrote the asm2cfg tool for this purpose: https://github.com/Kazhuu/asm2cfg. The tool supports GDB disassembly and objdump input and outputs the CFG as a DOT file or PDF.
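If you build the graph yourself instead, DOT output is simple to emit. A minimal sketch that renders an edge list as a Graphviz digraph, with block start addresses as node names; `edge_t` and `cfg_to_dot` are hypothetical names. The resulting file can be rendered with `dot -Tpdf cfg.dot -o cfg.pdf`.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { unsigned long from, to; } edge_t;

/* Formatted append at offset *len that never writes past cap. */
static void append(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    char *dst = *len < cap ? buf + *len : NULL;
    size_t room = *len < cap ? cap - *len : 0;
    int w = vsnprintf(dst, room, fmt, ap);
    va_end(ap);
    if (w > 0)
        *len += (size_t)w;
}

/* Render the edges as a DOT digraph into buf. Returns the length the
 * full output needs (like snprintf); it is truncated if that is >= cap. */
size_t cfg_to_dot(char *buf, size_t cap, const edge_t *e, size_t ne)
{
    size_t len = 0;
    append(buf, cap, &len, "digraph cfg {\n");
    for (size_t i = 0; i < ne; i++)
        append(buf, cap, &len, "  \"%#lx\" -> \"%#lx\";\n", e[i].from, e[i].to);
    append(buf, cap, &len, "}\n");
    return len;
}
```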
Hopefully someone finds this helpful!