Convert ASM to C (not reverse engineering)
I googled this and found a surprising number of flippant responses, basically laughing at the asker for asking such a question.
Microchip provides some source code for free (I don't want to post it here in case that's a no-no. Basically, google AN937, click the first link, and there's a link for "source code", which is a zipped file). It's in ASM, and when I look at it I start to go cross-eyed. I'd like to convert it to something resembling a C-type language so that I can follow along, because lines such as:
GLOBAL _24_bit_sub
movf BARGB2,w
subwf AARGB2,f
are probably very simple, but they mean nothing to me.
There may be some automated ASM-to-C translator out there, but all I can find are people saying it's impossible. Frankly, it's impossible for it to be impossible. Both languages have structure, and that structure surely can be translated.
It is difficult to convert a function from asm to C, but it is doable by hand. Converting an entire program with a decompiler will give you code that can be impossible to understand, since too much of the structure was lost during compilation. Without meaningful variable and function names, the resultant C code is still very difficult to understand.
The output of a C compiler (especially unoptimised output) for a basic program could be translated back to C because of its repeated patterns and structures.
You can absolutely make a C program from assembler. The problem is that it may not look like what you are expecting, or maybe it will. My PIC is rusty, but using another assembler, say you had a single instruction that adds one register to another.
In C, let's say that becomes a one-line assignment.
Possibly more readable. You perhaps lose any sense of variable names, as values jump from memory to registers and back and the registers are reused. If you are talking about the older PICs that had, what, two registers (an accumulator and another), it might actually be easier, because variables were in memory for the most part. You look at the addresses: something like a load from one location, an add of a second location, and a store back to the first.
Long and drawn out, but it is clear that mem[0x12] = mem[0x12] + mem[0x13];
These memory locations are likely variables that will not jump around like compiled C code for a processor with a bunch of registers. The PIC might make it easier to figure out the variables, and then you can do a search and replace to name them across the file.
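A minimal sketch of that load/add/store pattern, assuming a hypothetical accumulator machine rather than real PIC syntax. The names acc, mem, add_via_accumulator and add_folded are invented for illustration; only the addresses 0x12 and 0x13 come from the text above.

```c
#include <stdint.h>

uint8_t mem[256];   /* data memory; each address is likely one variable */
uint8_t acc;        /* the single accumulator (hypothetical name) */

/* Literal, instruction-by-instruction translation. */
void add_via_accumulator(void) {
    acc = mem[0x12];                   /* load  0x12 */
    acc = (uint8_t)(acc + mem[0x13]);  /* add   0x13 */
    mem[0x12] = acc;                   /* store 0x12 */
}

/* The same effect once the accumulator traffic is folded away. */
void add_folded(void) {
    mem[0x12] = (uint8_t)(mem[0x12] + mem[0x13]);
}
```

Once the folded form is in hand, renaming mem[0x12] and mem[0x13] to meaningful variable names is the search-and-replace step.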
What you are looking for is called static binary translation: not necessarily a translation from one binary to another (one processor to another), but in this case a translation from a PIC binary to C. Ideally you would take the assembler given in the app note, assemble it to a binary using the Microchip tools, and then do the translation. You can do dynamic binary translation as well, but you are even less likely to find a tool for that, and it doesn't normally result in C but in another binary. Ever wonder how those $15 joysticks at Wal-Mart with Pac-Man and Galaga work? The ROM from the arcade was converted using static binary translation, optimized and cleaned up, and the C (or whatever intermediate language) was compiled for the new target processor in the handheld box. I imagine not all of them were done this way, but I am pretty sure some were.
The million-dollar question: can you find a static binary translator for a PIC? Who knows; you probably have to write one yourself. And guess what that means: you write a disassembler, and instead of disassembling to an instruction in the native assembler syntax, like add r0,r1, you have your disassembler print out r0=r0+r1; By the time you finish this disassembler, though, you will know the PIC assembly language so well that you won't need the asm-to-C translator. You have a chicken-and-egg problem.
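As a rough sketch of the "print C instead of a mnemonic" idea, assuming an already-decoded, hypothetical two-operand instruction (this is not a real PIC encoding, and struct insn and emit_c are made-up names):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical two-operand instruction; not a real PIC encoding. */
struct insn {
    const char *mnemonic;  /* "add", "sub" or "mov" */
    int rd;                /* destination register number */
    int rs;                /* source register number */
};

/* Writes one C statement for the instruction into out.
   Returns 0 on success, -1 for an unknown mnemonic or a too-small buffer. */
int emit_c(const struct insn *in, char *out, size_t n) {
    int len;
    if (strcmp(in->mnemonic, "add") == 0)
        len = snprintf(out, n, "r%d = r%d + r%d;", in->rd, in->rd, in->rs);
    else if (strcmp(in->mnemonic, "sub") == 0)
        len = snprintf(out, n, "r%d = r%d - r%d;", in->rd, in->rd, in->rs);
    else if (strcmp(in->mnemonic, "mov") == 0)
        len = snprintf(out, n, "r%d = r%d;", in->rd, in->rs);
    else
        return -1;
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}
```

A real translator would also have to decode the binary and deal with branches, flags and labels; this only covers the final printing step.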
Getting the exact same source code back from a compiled program is basically impossible. But decompilers have been an area of research in computer science (e.g. the dcc decompiler, which was a PhD project).
There are various algorithms that can be used to do pattern matching on assembly code and generate equivalent C code, but it is very hard to do this in a general way that works well for all inputs.
You might want to check out Boomerang for a semi-recent open source effort at a generalized decompiler.
I once worked on a project where a significant part of the intellectual property was some serious algorithms coded up in x86 assembly code. To port the code to an embedded system, the developer of that code (not me) used a tool from an outfit called MicroAPL (if I recall correctly).
I was very, very surprised at how well the tool did.
On the other hand, I think it's one of those "if you have to ask, you can't afford it" type of things (their price ranges for a one-off conversion of a project work out to around 4 lines of assembly processed for a dollar).
But, often the assembly routines you get from a vendor are packaged as functions that can be called from C - so as long as the routines do what you want (on the processor you want to use), you might just need to assemble them and more or less forget about them - they're just library functions you call from C.
You can't deterministically convert assembly code to C. Interrupts, self-modifying code, and other low-level things have no representation in C other than inline assembly. An assembly-to-C process can only work to some extent. Not to mention that the resultant C code will probably be harder to understand than the assembly code itself... unless you are using it as a basis for reimplementing the assembly code in C, in which case it is somewhat useful. Check out the Hex-Rays plugin for IDA.
Yes, it's very possible to reverse-engineer assembler code to good quality C.
I work for MicroAPL, a company which produces a tool called Relogix to convert assembler code to C. It was mentioned in one of the other posts.
Please take a look at the examples on our web site:
http://www.microapl.co.uk/asm2c/index.html
No, it's not. Compilation loses information: there is less information in the final object code than in the C source code. A decompiler cannot magically create that information from nothing, and so true decompilation is impossible.
It isn't impossible, just very hard. A skilled assembly and C programmer could probably do it, or you could look at using a decompiler. Some of these do quite a good job of converting the asm to C, although you will probably have to rename some variables and functions.
Check out this site for a list of decompilers available for the x86 architecture.
Check out this: decompiler
Not easily possible.
One of the great advantages of C over ASM, apart from readability, is that it prevents "clever" programming tricks.
There are numerous things you can do in assembler that have no direct C equivalent, or that involve tortuous syntax in C.
The other problem is datatypes: most assemblers essentially have only two interchangeable datatypes, bytes and words. There may be some language constructs to define ints, floats, etc., but there is no attempt to check that the memory is used as defined. So it's very difficult to map ASM storage to C data types.
In addition, all assembler storage is essentially a "struct": storage is laid out in the order it is defined (unlike C, where separate variables are placed at the whim of the compiler and linker). Many ASM programs depend on the exact storage layout; to achieve the same effect in C you would need to define all storage as part of a single struct.
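The single-struct idea can be sketched as follows. C lays out the members of one struct in declaration order (padding may be inserted between them), whereas separate file-scope variables carry no ordering guarantee. The struct name and most field names here are assumptions; only AARGB2 and BARGB2 echo names from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RAM block: all storage the routine touches, in one struct. */
struct math_ram {
    uint8_t aargb0;
    uint8_t aargb1;
    uint8_t aargb2;
    uint8_t bargb0;
    uint8_t bargb1;
    uint8_t bargb2;
};

/* Byte offset of each field, listed in declaration order. */
const size_t math_ram_offsets[6] = {
    offsetof(struct math_ram, aargb0),
    offsetof(struct math_ram, aargb1),
    offsetof(struct math_ram, aargb2),
    offsetof(struct math_ram, bargb0),
    offsetof(struct math_ram, bargb1),
    offsetof(struct math_ram, bargb2),
};
```

Code that walks the bytes of a multi-byte value by address can then rely on the offsets increasing in the order written.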
Also, there are a lot of abused instructions (on old-world IBM mainframes, the LA (load address) instruction was regularly used to perform simple arithmetic, as it was faster and didn't need an overflow register).
While it may be technically possible to translate to C, the resulting C code would be less readable than the ASM code that was translated.
I can say with 99% certainty that there is no ready-made converter for this assembly language, so you need to write one. You can implement it simply by replacing each ASM instruction with a C function call.
This part is easy :)
Then you need to implement each function. You can declare the registers as globals to make things easy. Also, instead of functions you can use #defines, calling functions where needed. This will help with argument/result processing.
A special case is ASM directives/labels; I think these can only be converted with #defines.
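A minimal sketch of this approach, applied to the two instructions quoted in the question. The addresses are made up, the function names are invented, and the flag handling assumes that the PIC carry bit means "no borrow" after a subtract; check the data sheet for the actual part before relying on it.

```c
#include <stdint.h>

/* Labels become #defines; these addresses are made up for the sketch. */
#define AARGB2 0x20
#define BARGB2 0x30

/* Registers and data memory as globals. */
uint8_t W;          /* working register */
uint8_t ram[256];   /* file registers */
uint8_t carry;      /* assumed: 1 means "no borrow" after a subtract */

/* movf f,w : copy file register f into W. */
void movf_w(uint8_t f) { W = ram[f]; }

/* subwf f,f : f = f - W, result stored back in f. */
void subwf_f(uint8_t f) {
    carry = (uint8_t)(ram[f] >= W);
    ram[f] = (uint8_t)(ram[f] - W);
}

/* The two instructions quoted in the question, as function calls. */
void sub_low_byte(void) {
    movf_w(BARGB2);   /* movf  BARGB2,w */
    subwf_f(AARGB2);  /* subwf AARGB2,f */
}
```

Read this way, the pair amounts to AARGB2 = AARGB2 - BARGB2 on the lowest byte, with the carry flag left for the next byte of the subtraction.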
The fun starts when you reach CPU-specific features. These can be simple function calls with stack operations, or specific IO/memory operations. More fun are operations on the program counter register, used for calculations, or code that uses or counts ticks/latencies.
But there is another way if things get that hardcore. It's hardcore too :)
There is a technique named dynamic recompilation. It's used in many emulators.
You don't need to recompile your ASM, but the idea is almost the same. You can use all your #defines from the first step, but add support for the needed functionality to them (incrementing PC/ticks). You also need to add some virtual environment for your code, such as memory/IO managers, etc.
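One way the PC/tick bookkeeping might look with macros, as a sketch only. The macro names, the plain array standing in for a memory manager, and the cost of one cycle per instruction are all assumptions, and this is macro expansion at compile time rather than dynamic recompilation itself.

```c
#include <stdint.h>

uint8_t  W;          /* working register */
uint8_t  ram[256];   /* stand-in for a memory manager */
uint16_t pc;         /* program counter, counted in instructions */
uint32_t ticks;      /* elapsed instruction cycles */

/* Bookkeeping shared by every instruction macro. */
#define STEP(cycles) do { pc++; ticks += (cycles); } while (0)

/* Each instruction does its work, then advances PC and the tick count.
   One cycle per instruction is an assumption of this sketch. */
#define MOVF_W(f)  do { W = ram[(f)]; STEP(1); } while (0)
#define SUBWF_F(f) do { ram[(f)] = (uint8_t)(ram[(f)] - W); STEP(1); } while (0)

/* A translated routine is then just a sequence of macro invocations. */
void run_fragment(uint8_t a, uint8_t b) {
    MOVF_W(b);
    SUBWF_F(a);
}
```

Code that depends on timing can then read ticks, and code that depends on its own position can read pc.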
Good luck :)
I think it is easier to pick up a book on PIC assembly and learn to read it. Assembler is generally quite simple to learn, as it is so low level.
Check out asm2c