Binary template file
I've inherited a project that uses a legacy file format to store its data. I have access to the data going into that file format and to the resulting file, but I don't have access to the template, and I need to recreate it.
What's the best way to go about reverse engineering the binary file? How do I figure out what language/encryption is used, or do I even need to? Once I do, what's the best program (free, preferred) to get the information out? This is on a Windows system, but I run an OpenSUSE Linux box that I'm not opposed to using for help with the issue.
Comments (1)
The comment from about a year ago is the core of what I'd do, which is to reverse engineer the format behaviorally. Treat the legacy program as a black box. I'm assuming you can still run the legacy system somehow. The very first thing is to turn the whole legacy program into a subroutine that you can call programmatically. That may mean scripting it, running it inside a VM, simulating and/or mocking devices, etc.; whatever works. Ask a separate question for your particular situation if you don't already know how to do this. The goal, though, is to automate the use of the legacy software so that you can run probes and test suites against it.
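As a rough illustration of "turning the program into a subroutine", here is a minimal Python sketch. It assumes the legacy program can be driven from the command line with an input path and an output path, which may well not be true in your case (a GUI program would need UI scripting instead). The `run_legacy` helper, the `{in}`/`{out}` placeholders, and the `FAKE_LEGACY` stand-in are all made up for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_legacy(command, input_bytes, timeout=60):
    """Run the legacy program once and return the bytes of the file it writes.

    `command` is an argument list in which "{in}" and "{out}" are replaced
    by temporary file paths. This calling convention is an assumption made
    for illustration; adapt it to however the real program is invoked.
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "input.dat"
        out_path = Path(tmp) / "output.bin"
        in_path.write_bytes(input_bytes)
        argv = [
            arg.replace("{in}", str(in_path)).replace("{out}", str(out_path))
            for arg in command
        ]
        # check=True raises if the program exits with a non-zero status.
        subprocess.run(argv, check=True, timeout=timeout)
        return out_path.read_bytes()


# Stand-in for the real program (hypothetical): it writes a 3-byte header
# followed by the input reversed. Replace this list with the real command.
FAKE_LEGACY = [
    sys.executable,
    "-c",
    "import sys; d = open(sys.argv[1], 'rb').read(); "
    "open(sys.argv[2], 'wb').write(b'HDR' + d[::-1])",
    "{in}",
    "{out}",
]

if __name__ == "__main__":
    print(run_legacy(FAKE_LEGACY, b"abc"))
```

Once something like this works, every later probe is just a function call with different input bytes.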
You mention encryption may be used. Deal with this first. Strong ciphers have what's called the avalanche property: changing a single bit of the input changes, on average, about half of the bits of the output, in what's tantamount to a pseudorandom bit-flip. You want to use the avalanche property to (1) test for the existence of encryption, and (2) find out the encryption structure. For example, if a database encrypts one row at a time, then changing one bit anywhere in a stored row would change, on average, half of the bits of the encrypted row. Clearly, if changing one bit alters the whole file, you have a different kind of problem than if only a few bits change (e.g. the changed field plus a checksum). If you have encryption in any form, you might need to run the legacy program under a debugger and figure out the algorithm that way; this might not be worth it.
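A minimal sketch of the avalanche check, assuming you already have two output files produced from inputs that differ in a single bit. The function names are made up for the example, and SHA-256 is used only as a convenient stand-in for "something with strong avalanche behavior"; it is a hash, not the cipher your format might use.

```python
import hashlib


def bit_diff_fraction(a: bytes, b: bytes) -> float:
    """Fraction of bit positions that differ between two equal-length blobs."""
    if len(a) != len(b):
        raise ValueError("outputs differ in length; compare aligned regions instead")
    if not a:
        return 0.0
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))


def changed_span(a: bytes, b: bytes):
    """Return (first, last) offsets of differing bytes, or None if identical."""
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (offsets[0], offsets[-1]) if offsets else None


if __name__ == "__main__":
    # Plain, unencrypted storage: a one-bit input change stays local.
    plain_a, plain_b = b"name=alice;age=30", b"name=alicd;age=30"
    print(bit_diff_fraction(plain_a, plain_b), changed_span(plain_a, plain_b))

    # Avalanche-like behavior: a one-bit input change scrambles everything.
    mixed_a = hashlib.sha256(b"record-0").digest()
    mixed_b = hashlib.sha256(b"record-1").digest()
    print(bit_diff_fraction(mixed_a, mixed_b), changed_span(mixed_a, mixed_b))
```

A fraction near 0.5 over the whole file suggests whole-file encryption (or compression, which can look similar). A fraction near 0.5 confined to a short span suggests per-record or per-block encryption. A handful of changed bits points to plain storage, possibly with a checksum somewhere else in the file.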
As you can see, all this means lots of invocations of the legacy program to probe its behavior. You don't want to do this by hand; see the first paragraph. To address another concern: it's unlikely you're going to find off-the-shelf code to extract the data; it's a custom code job. So now that your automation is working, you want to set up unit testing, calling the legacy code to find out what output should be expected.
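One way to set up that kind of testing, sketched under assumptions: record input/output pairs from the legacy program, then compare your own encoder against them byte for byte. Everything here is hypothetical, including the `GOLDEN` cases, the `candidate_encode` function, and the toy "header plus reversed payload" format they describe.

```python
def first_mismatch(expected: bytes, actual: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(expected, actual)):
        if x != y:
            return i
    if len(expected) != len(actual):
        # One blob is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


def check_against_legacy(encode, golden_cases):
    """Compare `encode` with recorded legacy outputs.

    Returns a list of (case_name, first_mismatch_offset) for failing cases.
    """
    failures = []
    for name, (input_bytes, legacy_output) in golden_cases.items():
        offset = first_mismatch(legacy_output, encode(input_bytes))
        if offset is not None:
            failures.append((name, offset))
    return failures


# Hypothetical recorded cases: input bytes -> bytes the legacy program wrote.
GOLDEN = {
    "empty": (b"", b"HDR"),
    "short": (b"abc", b"HDRcba"),
}


def candidate_encode(data: bytes) -> bytes:
    """Current best guess at the format (made up for this example)."""
    return b"HDR" + data[::-1]


if __name__ == "__main__":
    print(check_against_legacy(candidate_encode, GOLDEN))
```

Reporting the first mismatching offset, rather than just pass/fail, tells you which part of the file your current guess gets wrong. The same check drops into `unittest` or `pytest` as one test per recorded case.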
This isn't a quick process, nor an easy one. Always compare the anticipated cost of succeeding at this with the cost of acquiring the data in some other way, including paying for manual data entry.