Identifying algorithms in binary files
Does anyone know a technique for identifying algorithms in already compiled files, e.g. by testing the disassembly for certain patterns?
The little information I have is that a library contains some (non-exported) code that decompresses the content of a Byte[], but I have no clue how it works.
I have some files that I believe are compressed in that unknown way, and they appear to come without any compression header or trailer. I assume there's no encryption, but as long as I don't know how to decompress them, they're worth nothing to me.
The library I have is an ARM9 binary for low-capacity targets.
EDIT:
It's a lossless compression, storing binary data or plain text.
Comments (5)
You could go in a couple of directions: static analysis with something like IDA Pro, or loading the library into GDB or an emulator and following the code that way. They may be XOR'ing the data to hide the algorithm, since there are already many good lossless compression techniques.
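If you want to test the XOR idea cheaply before touching the disassembly, something like the following might do. It is only a sketch under two assumptions that are not known to be true for your files: the mask is a single repeated byte, and the data underneath is a zlib stream. The helper names `xor_bytes` and `find_single_byte_xor_keys` are made up for this example.

```python
import zlib


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)


def find_single_byte_xor_keys(data: bytes) -> list:
    """Return every key for which the unmasked data is a valid zlib stream.

    Assumption: the masking is a one-byte XOR and the payload is zlib.
    """
    keys = []
    for key in range(256):
        try:
            zlib.decompress(xor_bytes(data, key))
        except zlib.error:
            continue  # not a valid zlib stream under this key
        keys.append(key)
    return keys
```

An empty result only rules out this particular combination; a longer key or a different compressor would need a different check.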
Decompression algorithms spend most of their time in tight loops, so you might start by looking for loops (decrement a register, jump backwards if it is not 0).
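As a rough first pass, the loop hunt can be automated. The sketch below assumes the code section is 32-bit ARM-mode instructions stored little-endian (an ARM9 library may well use Thumb instead, which this does not handle) and lists conditional `B` instructions whose target lies at or before the branch itself. The function name and the `base` parameter are invented for the example, and data words that happen to look like branches will show up as false hits.

```python
import struct


def find_backward_branches(code: bytes, base: int = 0):
    """Yield (address, target) for conditional ARM-mode B instructions
    that jump backwards, i.e. likely loop back-edges."""
    for offset in range(0, len(code) - 3, 4):
        (word,) = struct.unpack_from("<I", code, offset)
        cond = word >> 28                        # condition field, bits 31-28
        is_branch = ((word >> 25) & 0b111) == 0b101
        is_link = (word >> 24) & 1               # 1 means BL (a call)
        if not is_branch or is_link or cond >= 0xE:
            continue  # skip non-branches, BL, and unconditional encodings
        imm24 = word & 0xFFFFFF
        if imm24 & 0x800000:
            imm24 -= 1 << 24                     # sign-extend the 24-bit offset
        address = base + offset
        target = address + 8 + (imm24 << 2)      # PC reads as address + 8
        if target <= address:
            yield address, target
```

Addresses where several short backward branches cluster together are reasonable places to start reading the disassembly.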
Given that it's a small target, you have a good chance of decoding it by hand. It looks hard now, but once you dive in you'll find that you can identify the various programming structures yourself.
You might also consider decompiling it to a higher-level language, which would be easier to read than assembly, though still hard if you don't know how it was compiled.
http://www.google.com/search?q=arm%20decompiler
-Adam
The reliable way to do this is to disassemble the library and read the resulting assembly code for the decompression routine (and perhaps step through it in a debugger) to see exactly what it is doing.
However, you might be able to look at the magic number at the start of a compressed file and so figure out what kind of compression was used. If it's a zlib stream (DEFLATE in a zlib wrapper at the default level), for example, the first two bytes will be hexadecimal 78 9c; if bzip2, 42 5a; if gzip, 1f 8b. Raw DEFLATE data without a wrapper has no magic number.
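The magic-number check is easy to script. This is a minimal sketch: the table covers only the formats mentioned above plus the other common zlib header bytes, and `guess_format` is a name made up for the example.

```python
import bz2
import gzip
import zlib

# Leading bytes of a few common compressed formats.
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",  # hex 42 5a 68
    b"\x78\x01": "zlib (lowest compression)",
    b"\x78\x9c": "zlib (default compression)",
    b"\x78\xda": "zlib (best compression)",
}


def guess_format(data: bytes):
    """Return the name of the format whose magic number starts data, else None."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None


# Quick demonstration on freshly compressed samples and on unknown bytes.
for blob in (zlib.compress(b"demo"), gzip.compress(b"demo"),
             bz2.compress(b"demo"), b"\x00\x01\x02"):
    print(blob[:3].hex(), guess_format(blob))
```

Since the files in question reportedly have no header, `None` is the likely answer here, which would point towards raw or custom-framed data rather than rule compression out.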
In my experience, files like this are most often compressed with plain old Deflate. You can try opening them with zlib, starting from different offsets to skip any custom header. The problem is that zlib expects its own header. In Python (and I guess other implementations have this feature as well), you can pass -15 to zlib.decompress as the history buffer (window) size (i.e. zlib.decompress(data, -15)), which causes it to decompress raw deflated data without zlib's header.
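Trying every offset by hand gets tedious, so here is a small sketch of the same idea in a loop. The function name and the `max_offset` and `min_output` limits are arbitrary choices for the example; raw DEFLATE has no checksum, so short accidental matches are possible and the `min_output` filter is only a crude guard against them.

```python
import zlib


def scan_raw_deflate(data: bytes, max_offset: int = 64, min_output: int = 16):
    """Try raw-DEFLATE decompression at each offset; return (offset, output) hits."""
    hits = []
    for offset in range(min(max_offset, len(data))):
        # Negative wbits means raw DEFLATE: no zlib header or checksum expected.
        decoder = zlib.decompressobj(-15)
        try:
            output = decoder.decompress(data[offset:])
        except zlib.error:
            continue  # not a valid DEFLATE stream from this offset
        # Keep only streams that reached their end marker and produced real output.
        if decoder.eof and len(output) >= min_output:
            hits.append((offset, output))
    return hits
```

If one offset yields output that looks like the expected plain text or binary data, that offset is probably the size of the custom header.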
Reverse engineering by reading the assembly may raise copyright issues. In particular, doing this in order to write a decompression program is almost as bad, from a copyright standpoint, as just using the assembly yourself, and the latter is much easier. So, if your motivation is just to be able to write your own decompression utility, you might be better off just porting the assembly you have.