Can I identify "functions" in an x86 binary?
By "function" I mean a chunk (or a graph of chunks) of the binary that starts at some point (most likely reached through a CALL instruction), possibly sets up a stack frame, and has one or more exit points in the form of RET instructions (depending on the calling convention, it may also tear down that stack frame before returning).
My current idea is to treat the various conditional branch instructions as junctions in a graph and traverse the code breadth-first that way. Is this viable at all? If not, what's a better approach?
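The traversal described above is close to what is usually called recursive-traversal disassembly. Below is a minimal sketch, assuming the bytes have already been decoded into a dictionary mapping each address to an instruction. The `Insn` type, its fields, and `function_extent` are hypothetical illustrations, not any library's API. Conditional jumps are treated as junctions with two successors, RET ends a path, and CALL falls through to the next instruction instead of descending into the callee.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Insn:
    """A simplified, already-decoded instruction (hypothetical format)."""
    mnemonic: str                  # e.g. "jz", "jmp", "call", "ret", "mov"
    size: int                      # encoded length in bytes
    target: Optional[int] = None   # static branch/call destination, if any


def function_extent(insns: Dict[int, Insn], entry: int) -> Set[int]:
    """Breadth-first walk from `entry`; returns the addresses of every
    instruction reachable without following a CALL into its callee."""
    seen: Set[int] = set()
    queue = deque([entry])
    while queue:
        addr = queue.popleft()
        if addr in seen or addr not in insns:
            continue
        seen.add(addr)
        insn = insns[addr]
        fallthrough = addr + insn.size
        if insn.mnemonic == "ret":
            continue                      # an exit point: no successors
        if insn.mnemonic == "jmp":
            if insn.target is not None:   # indirect jumps are not resolved
                queue.append(insn.target)
            continue
        if insn.mnemonic.startswith("j"):
            # Conditional branch: a junction with two successors.
            if insn.target is not None:
                queue.append(insn.target)
            queue.append(fallthrough)
            continue
        # Ordinary instructions and CALLs continue at the next instruction;
        # a CALL's target is a separate function, not part of this one.
        queue.append(fallthrough)
    return seen


# A toy program: a function at 0x00 that conditionally calls one at 0x20.
prog = {
    0x00: Insn("push", 1),               # push ebp
    0x01: Insn("mov", 2),                # mov ebp, esp
    0x03: Insn("test", 2),               # test eax, eax
    0x05: Insn("jz", 2, target=0x0C),    # skip the call if eax == 0
    0x07: Insn("call", 5, target=0x20),  # call the function at 0x20
    0x0C: Insn("pop", 1),                # pop ebp
    0x0D: Insn("ret", 1),
    0x20: Insn("ret", 1),                # the callee
}

print([hex(a) for a in sorted(function_extent(prog, 0x00))])
```

The result covers 0x00 through 0x0D but not 0x20; calling `function_extent(prog, 0x20)` recovers the callee separately. Known gaps in this sketch: an unconditional `jmp` into another function (a tail call) gets absorbed into the current one, and jump tables (`jmp [table + eax*4]`) are not followed at all.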
My objective is simply that: extract the functions, purely for the sake of doing it. Maybe I'll do something fancier later if I have the time and the inclination.
You can use a disassembler library such as BeaEngine to do the hard work for you, and then search the resulting mnemonics for CALL.
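Once a disassembler has produced a listing, collecting CALL destinations is straightforward. This sketch does not use BeaEngine's actual API; it assumes the output has already been converted into hypothetical `(address, mnemonic, operands)` tuples, and that direct call targets are printed as `0x`-prefixed hex (the exact operand format differs between disassemblers). Each collected address is a candidate function entry point from which the body can then be traced.

```python
from typing import Iterable, List, Tuple

# Hypothetical listing format: (address, mnemonic, operand text), i.e. the kind
# of information a disassembler library reports for each decoded instruction.
Listing = Iterable[Tuple[int, str, str]]


def call_targets(listing: Listing) -> List[int]:
    """Return the sorted, de-duplicated destinations of direct CALLs."""
    targets = set()
    for _addr, mnemonic, operands in listing:
        if mnemonic.lower() != "call":
            continue
        op = operands.strip()
        # Assumes direct targets are printed as 0x-prefixed hex ("0x401050").
        # Indirect calls ("eax", "dword ptr [ebx + 8]") have no static target.
        if op.startswith("0x"):
            targets.add(int(op, 16))
    return sorted(targets)


listing = [
    (0x401000, "push", "ebp"),
    (0x401001, "call", "0x401050"),
    (0x401006, "call", "eax"),
    (0x401008, "call", "0x401020"),
    (0x40100D, "call", "0x401050"),
]
print([hex(t) for t in call_targets(listing)])
```

Functions that are only ever reached indirectly (through function pointers, vtables, or callbacks) will not show up this way, which is one source of the false negatives mentioned in the next answer.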
Without a symbol table I would say it's almost impossible, at least without false positives/negatives.
What you need first is a disassembler. Just looking for a byte combination won't cut it; the combination might be part of some "random" data. Then, tracing the CALLs is likely the best solution, as a function doesn't necessarily always start with the same opcode sequence. But even a disassembler might have a hard time and get confused by data embedded in the text segment.
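To illustrate why byte matching alone fails, here is a small sketch that scans for the common 32-bit prologue `55 89 E5` (`push ebp; mov ebp, esp`). The buffer is a hand-assembled example, not taken from a real binary, and the pattern appears a second time inside the immediate operand of a `mov eax, imm32`, which a naive scan reports as a second "function start".

```python
from typing import List

# push ebp; mov ebp, esp: a common 32-bit prologue, but not a universal one
# (e.g. code compiled without a frame pointer omits it).
PROLOGUE = bytes([0x55, 0x89, 0xE5])


def naive_prologue_scan(blob: bytes, pattern: bytes = PROLOGUE) -> List[int]:
    """Return every offset at which `pattern` occurs in `blob`."""
    hits = []
    pos = blob.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = blob.find(pattern, pos + 1)
    return hits


text = bytes([
    0x55, 0x89, 0xE5,              # 0: push ebp; mov ebp, esp (a real prologue)
    0xB8, 0x55, 0x89, 0xE5, 0x00,  # 3: mov eax, 0x00E58955
    0x5D,                          # 8: pop ebp
    0xC3,                          # 9: ret
])

print(naive_prologue_scan(text))  # offset 4 is inside the MOV's immediate
```

Only offset 0 is a real function start; offset 4 is a false positive. A disassembler that decodes instruction by instruction from a known entry point would never land on offset 4, which is why following control flow beats pattern matching.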
Even if you were able to find the functions, you couldn't get their names without debug symbols (a compiled program no longer needs names, only addresses).
Also, you'd have a very hard time finding out what kind of parameters a function accepts. For example, a function might accept 2 arguments but use neither. In that case you would need to find a call to the function and look at how the stack is prepared before the call.
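A rough sketch of that caller-side inspection, assuming 32-bit code where the caller pushes 4-byte arguments onto the stack. It counts the PUSHes immediately before a call and, as a cross-check, reads a following `add esp, N` (caller cleanup, as in cdecl). The `(mnemonic, operands)` tuple format and both helper names are hypothetical. This heuristic misses arguments stored with `mov [esp + N], ...` instead of `push`, stdcall functions that clean up with `ret N` themselves, and x86-64 conventions that pass the first arguments in registers.

```python
from typing import List, Optional, Tuple

Insn = Tuple[str, str]  # hypothetical (mnemonic, operands) pairs


def pushes_before_call(insns: List[Insn], call_index: int) -> int:
    """Count the PUSH instructions immediately preceding insns[call_index]."""
    count = 0
    i = call_index - 1
    while i >= 0 and insns[i][0] == "push":
        count += 1
        i -= 1
    return count


def caller_cleanup_bytes(insns: List[Insn], call_index: int) -> Optional[int]:
    """If the call is followed by `add esp, N` (caller cleanup, as in cdecl),
    return N; otherwise None."""
    if call_index + 1 >= len(insns):
        return None
    mnemonic, operands = insns[call_index + 1]
    parts = [p.strip() for p in operands.split(",")]
    if mnemonic == "add" and len(parts) == 2 and parts[0] == "esp":
        return int(parts[1], 0)  # accepts decimal ("8") or hex ("0x8")
    return None


caller = [
    ("push", "ebp"),
    ("mov", "ebp, esp"),
    ("push", "2"),
    ("push", "1"),
    ("call", "0x401050"),
    ("add", "esp, 8"),
]
print(pushes_before_call(caller, 4), caller_cleanup_bytes(caller, 4))
```

Here both signals agree: two pushes, and 8 bytes removed after the call (8 / 4 = 2 arguments), even if the callee itself never reads them.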
You have to look for things like: