What tools are available to aid in decoding unknown binary data formats?
I know Hex Workshop and 010 Editor both support structures. These are okay to a limited extent for a known fixed format but get difficult to use with anything more complicated, especially for unknown formats. I guess I'm looking for a module for a scripting language or a scriptable GUI tool.
For example, I'd like to be able to find a structure within a block of data from limited known information, perhaps a magic number. Once I've found a structure, then follow known length and offset words to find other structures. Then repeat this recursively and iteratively where it makes sense.
In my dreams, perhaps even automatically identify possible offsets and lengths based on what I've already told the system!
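A minimal Python sketch of that workflow, assuming a purely hypothetical record layout (magic, then a little-endian u32 body length, then a u32 absolute offset of the next record, 0 meaning "end"). `find_all` locates the magic number, `walk_chain` follows the offset words, and `pointer_candidates` is a crude take on the "dream" part: it flags aligned u32 words whose value, read either as an absolute offset or relative to the end of the word, lands on an offset you already know about. All names here are illustrative, not from any existing tool.

```python
import struct

def find_all(data: bytes, magic: bytes) -> list[int]:
    """Return every offset at which `magic` occurs in `data` (overlaps included)."""
    hits, start = [], 0
    while (i := data.find(magic, start)) != -1:
        hits.append(i)
        start = i + 1
    return hits

# Hypothetical record layout, for illustration only:
#   magic | u32 body length | u32 absolute offset of next record (0 = end) | body
HEADER = struct.Struct("<II")

def walk_chain(data: bytes, start: int, magic: bytes) -> list[tuple[int, int]]:
    """Follow next-offset words from `start`, returning (offset, body_length) pairs.

    Stops at a zero next-offset, a pointer past the end of the data, a missing
    magic, or a pointer back to a record already visited (avoids infinite loops).
    """
    records, seen, pos = [], set(), start
    while pos not in seen and pos + len(magic) + HEADER.size <= len(data):
        if data[pos:pos + len(magic)] != magic:
            break
        seen.add(pos)
        length, nxt = HEADER.unpack_from(data, pos + len(magic))
        records.append((pos, length))
        if nxt == 0:
            break
        pos = nxt
    return records

def pointer_candidates(data: bytes, targets: set[int], align: int = 4):
    """Yield (field_offset, endian, kind, value) for aligned u32 words pointing at a target.

    kind is "abs" if the value itself is a target offset, or "rel" if the value
    counted from the end of the word lands on a target. Expect false positives.
    """
    for off in range(0, len(data) - 3, align):
        for endian, fmt in (("le", "<I"), ("be", ">I")):
            (value,) = struct.unpack_from(fmt, data, off)
            if value in targets:
                yield off, endian, "abs", value
            if off + 4 + value in targets:
                yield off, endian, "rel", value
```

Feeding the hits from `find_all` into `pointer_candidates` as `targets` gives a list of fields worth a closer look; small values such as 0 or 4 will match constantly, so in practice you would filter by alignment, plausibility, or repetition across several files. Recursing into child structures would follow the same pattern as `walk_chain`.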
Here are some tips that come to mind:
From my experience, interactive scripting languages (I use Python) can be a great help. You can write a simple framework to deal with binary streams and some simple algorithms. Then you can write scripts that will take your binary and check various things. For example:
Do some statistical analysis on various parts. Random-looking data, for example, suggests that a part is probably compressed or encrypted. Runs of zeros may mean padding between parts. Scattered zeros may mean integer values or Unicode strings, and so on. Try to spot various offsets. Try to convert parts of the binary into 2- or 4-byte integers or into floats, print them, and see if they make sense. Write some functions that search for repeating or very similar parts in the data; this way you can easily spot headers.
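A few of those helper functions, sketched in plain Python as one possible starting point (the function names are made up for this example). Shannon entropy per window is one common way to do the "random data" check; treat the thresholds as rules of thumb, since with small windows even truly random data scores noticeably below 8 bits per byte. `repeated_chunks` only finds exact repeats, not "very similar" parts.

```python
import math
import struct
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Bits per byte: near 8.0 looks compressed/encrypted, near 0.0 looks like padding."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in Counter(chunk).values())

def entropy_profile(data: bytes, window: int = 256) -> list[tuple[int, float]]:
    """Entropy of consecutive, non-overlapping windows as (offset, bits_per_byte)."""
    return [(off, shannon_entropy(data[off:off + window]))
            for off in range(0, len(data), window)]

def interpretations(data: bytes, offset: int) -> dict[str, object]:
    """Decode the bytes at `offset` as common 2- and 4-byte types in both byte orders."""
    out = {}
    for endian, prefix in (("le", "<"), ("be", ">")):
        for name, code in (("u16", "H"), ("i16", "h"), ("u32", "I"), ("i32", "i"), ("f32", "f")):
            fmt = prefix + code
            if offset + struct.calcsize(fmt) <= len(data):
                out[f"{name}_{endian}"] = struct.unpack_from(fmt, data, offset)[0]
    return out

def repeated_chunks(data: bytes, size: int = 8, min_count: int = 3) -> dict[bytes, int]:
    """Exact byte sequences of length `size` seen at least `min_count` times (overlapping)."""
    counts = Counter(data[i:i + size] for i in range(len(data) - size + 1))
    return {chunk: n for chunk, n in counts.items() if n >= min_count}
```

Printing `interpretations(data, off)` for a handful of offsets next to each other is often enough to see which reading "makes sense" (small round integers, plausible floats) and which byte order the format uses.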
Try to find as many strings as possible, trying different encodings (C strings, Pascal strings, UTF-8/16, etc.). There are some good tools for that (I think Hex Workshop has one). Strings can tell you a lot.
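If your hex editor lacks a string finder, a rough equivalent is easy to script. This sketch only looks for printable ASCII text stored three ways (plain runs like the Unix `strings` tool, UTF-16LE, and one-byte-length Pascal strings); it is not how Hex Workshop does it, and real UTF-8 or non-Latin text would need more work. The minimum length of 4 is an arbitrary choice to keep noise down.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Runs of printable ASCII (similar to the Unix `strings` tool), as (offset, text)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

def utf16le_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Printable ASCII characters stored as UTF-16LE (each char byte followed by 0x00)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [(m.start(), m.group().decode("utf-16-le")) for m in pattern.finditer(data)]

def pascal_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Length-prefixed strings: one length byte followed by that many printable bytes."""
    found = []
    for off in range(len(data)):
        n = data[off]
        body = data[off + 1:off + 1 + n]
        if n >= min_len and len(body) == n and all(0x20 <= b <= 0x7E for b in body):
            found.append((off, body.decode("ascii")))
    return found
```

The offsets matter as much as the text: a Pascal string hit tells you the byte just before it is a length, and a string that sits at a round offset often marks the start of a record.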
Good luck!
For Mac OS X, there's a great tool that's even better than my iBored: Synalyze It!
(http://www.synalysis.net/)
Compared to iBored, it is better suited for non-blocked files, while also giving full control over structures, including scriptability (with Lua). And it visualizes structures better, too.
Tupni; to my knowledge not directly available out of Microsoft Research, but there is a paper about this tool which can be of interest to someone wanting to write a similar program (perhaps open source):
Tupni: Automatic Reverse Engineering of Input Formats (@ ACM digital library)
My own tool "iBored", which I released just recently, can do parts of this. I wrote the tool to visualize and debug file system formats (UDF, HFS, ISO9660, FAT etc.), and implemented search, copy and later even structure and template support. The structure support is pretty straightforward, and the templates are a way to identify structures dynamically.
The entire thing is programmable in a Visual BASIC dialect, allowing you to test values, read specific blocks, and more.
The tool is free and works on all platforms (Win, Mac, Linux), but as it's a personal tool which I just released to the public to share it, it's not well documented.
However, if you want to give it a try and would like to give feedback, I might add more useful features.
I'd even open source it, but as it's written in REALbasic, I doubt many people will join such a project.
Link: iBored home page
I still occasionally use an old hex editor called A.X.E., Advanced Hex Editor. It seems to have largely disappeared from the Internet now, though Google should still be able to find it for you. The last version I know of was version 3.4, but I've really only used the free-for-personal-use version 2.1.
Its most interesting feature, and the one I've used most for deciphering various game and graphics formats, is its graphical view mode. That basically just shows you the file with each byte turned into a color-coded pixel. And as simple as that sounds, it has made my reverse-engineering attempts a lot easier at times.
I suppose doing it by eye is quite the opposite of doing automatic analysis, though, and the graphical mode won't be much use for finding and following offsets...
The later version has some features that sound like they could fit your needs (scripts, regularity finder, grammar generator), but I have no idea how good they are.
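Since A.X.E. is hard to find now, the basic idea of a byte-per-pixel view can be approximated with a few lines of Python that write a PPM image, a format most image viewers and editors can open. The palette below is invented for this sketch (it is not A.X.E.'s): zero bytes are black, 0xFF is white, printable ASCII is blue, and everything else is red.

```python
def byte_color(b: int) -> tuple[int, int, int]:
    """Hypothetical palette: 0x00 black, 0xFF white, printable ASCII blue, everything else red."""
    if b == 0x00:
        return (0, 0, 0)
    if b == 0xFF:
        return (255, 255, 255)
    if 0x20 <= b <= 0x7E:
        return (0, 0, 128 + b // 2)
    return (128 + b // 2, 0, 0)

def to_ppm(data: bytes, width: int = 256) -> bytes:
    """Render `data` as a binary PPM (P6) image, one pixel per byte, row by row."""
    height = max(1, (len(data) + width - 1) // width)
    pixels = bytearray()
    for i in range(width * height):
        # Pad the last row with dark grey so it is visibly not part of the file.
        pixels += bytes(byte_color(data[i])) if i < len(data) else b"\x40\x40\x40"
    header = f"P6 {width} {height} 255\n".encode("ascii")
    return header + bytes(pixels)
```

For example, `open("dump.ppm", "wb").write(to_ppm(open("file.bin", "rb").read(), width=64))`. It is worth trying several widths: when the width matches (or is a multiple of) a fixed record size, repeated records tend to line up into visible vertical stripes.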
There is Hachoir, which is a Python library for parsing any binary format into fields and then browsing the fields. It has lots of parsers for common formats, but you can also write your own parsers for your files (e.g., when working with code that reads or writes binary files, I usually write a Hachoir parser first to have a debugging aid). Looks like the project is pretty much inactive by now, though.
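To illustrate the "parse into fields, then browse them" idea without depending on Hachoir, here is a tiny declarative parser built only on the standard `struct` module. This is not Hachoir's API; the `SPEC` layout and the `"bytes:<field>"` convention for length-prefixed data are hypothetical and exist only for this sketch.

```python
import struct
from typing import NamedTuple

class Field(NamedTuple):
    offset: int
    name: str
    value: object

# Hypothetical layout: (name, struct format) pairs, or (name, "bytes:<field>") for
# a byte string whose length was stored in an earlier field.
SPEC = [
    ("magic", "4s"),
    ("version", "<H"),
    ("name_len", "<H"),
    ("name", "bytes:name_len"),
]

def parse_fields(data: bytes, spec, offset: int = 0) -> list[Field]:
    """Parse `data` according to `spec`, recording where each field starts."""
    fields, values = [], {}
    for name, fmt in spec:
        if fmt.startswith("bytes:"):
            size = values[fmt.split(":", 1)[1]]
            value = data[offset:offset + size]
            if len(value) != size:
                raise ValueError(f"field {name!r} is truncated at offset {offset}")
        else:
            size = struct.calcsize(fmt)
            (value,) = struct.unpack_from(fmt, data, offset)  # raises struct.error if short
        fields.append(Field(offset, name, value))
        values[name] = value
        offset += size
    return fields
```

Browsing is then just a loop such as `for f in parse_fields(data, SPEC): print(f"{f.offset:#06x} {f.name:10} {f.value!r}")`. Keeping the offset with every field is what makes a parser like this useful as a debugging aid next to a hex view.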
Kaitai is an open-source language for describing binary structures in data streams. It comes with a translator that can output parsing code for many programming languages, for inclusion in your own program code.
My project icebuddha.com supports this using python to describe the format in the browser.
A cut'n'paste of my answer to a similar question:
One tool is WinOLS, which is designed for interpreting and editing vehicle engine management computer binary images (mostly the numeric data in their lookup tables). It has support for various endian formats (though not PDP, I think) and viewing data at various widths and offsets, defining array areas (maps) and visualising them in 2D or 3D with all kinds of scaling and offset options. It also has a heuristic/statistical automatic map finder, which might work for you.
It's a commercial tool, but the free demo will let you do everything but save changes to the binary and use engine management features you don't need.
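WinOLS's map finder is proprietary and its algorithm isn't described here, but one simple heuristic in the same spirit can be scripted: the axes of lookup tables are often strictly increasing, so long runs of increasing 16-bit words are worth inspecting. The sketch below is that swapped-in heuristic, not WinOLS's method; the run length of 8 is an arbitrary assumption, and it only checks 2-byte-aligned words.

```python
import struct

def monotonic_runs(data: bytes, endian: str = "<", min_len: int = 8) -> list[tuple[int, int]]:
    """Find runs of strictly increasing 16-bit words, a rough hint for table axes.

    `endian` is a struct byte-order prefix ("<" little, ">" big). Returns
    (start_offset, word_count) pairs; words are read at 2-byte alignment.
    """
    words = [w for (w,) in struct.iter_unpack(endian + "H", data[:len(data) // 2 * 2])]
    runs, start = [], 0
    for i in range(1, len(words) + 1):
        # A run ends at the end of the data or where the sequence stops increasing.
        if i == len(words) or words[i] <= words[i - 1]:
            if i - start >= min_len:
                runs.append((start * 2, i - start))
            start = i
    return runs
```

Running it with both `"<"` and `">"` (and again on `data[1:]` to cover odd alignment) is a cheap way to get candidate regions; the real work is then viewing the data around each hit at different widths, which is exactly what a tool like WinOLS makes convenient.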