Using artificial intelligence techniques to reverse engineer file formats

Posted on 2024-08-13 14:46:14

This extends the question "Tools to help reverse engineer binary file formats".

Are there any publicly available tools that use clustering and/or data mining techniques to reverse engineer file formats?

For example, you would give the tool a collection of files that all share the same format, and its output would be the generic structure of that format.

Comments (1)

小镇女孩 2024-08-20 14:46:14

If a binary format is encoded truly efficiently (the compressed data inside a ZIP file is an example), then the information content of each bit is high. Essentially, the data looks like a stream of random bits.

You can't infer anything from that without additional knowledge.
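To make the "looks random" point concrete, here is a minimal sketch that measures the Shannon entropy of the byte distribution before and after compression. The sample data and the name `byte_entropy` are made up for illustration, and `zlib` stands in for whatever compression a real format might use.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample: text-like records with a fixed layout and random values.
rng = random.Random(0)
plain = "".join(
    f"id={i};value={rng.randrange(10**6)}\n" for i in range(20000)
).encode("ascii")

# The same content after DEFLATE compression (zlib is an assumption here).
packed = zlib.compress(plain, 9)

print(f"plain:  {byte_entropy(plain):.2f} bits/byte")
print(f"packed: {byte_entropy(packed):.2f} bits/byte")
```

A value close to 8 bits per byte means the byte histogram is nearly flat, so simple frequency statistics have very little left to work with. A low value does not reveal the structure by itself; it only says that some redundancy is there to be found.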

If the binary encoding isn't efficient, then in theory you have some faint chance of seeing structure. But this still sounds really hard: how do you even begin guessing where the field boundaries are?
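One naive way to start guessing boundaries, sketched here under strong assumptions: take many files of the same format, assume fields sit at fixed offsets, and count how many distinct byte values appear at each offset. The toy format and the names `distinct_per_offset` and `segment` are invented for this example; this is not an established tool or algorithm from the question.

```python
import random
import struct
from itertools import groupby


def distinct_per_offset(samples):
    """For each byte offset shared by all samples, count the distinct values seen."""
    width = min(len(s) for s in samples)
    return [len({s[i] for s in samples}) for i in range(width)]


def segment(distinct):
    """Group adjacent offsets into (start, end, kind) runs; end is exclusive."""
    runs, pos = [], 0
    for is_const, group in groupby(distinct, key=lambda d: d == 1):
        n = len(list(group))
        runs.append((pos, pos + n, "constant" if is_const else "variable"))
        pos += n
    return runs


# Hypothetical toy format, invented for this sketch:
# 4-byte magic, uint16 version, uint32 little-endian length, 4 payload bytes.
rng = random.Random(0)
samples = [
    struct.pack(
        "<4sHI4s",
        b"FMT1",
        1,
        rng.randrange(1, 200),
        bytes(rng.randrange(256) for _ in range(4)),
    )
    for _ in range(200)
]

for start, end, kind in segment(distinct_per_offset(samples)):
    print(f"bytes {start}..{end - 1}: {kind}")
```

On this toy data the magic and the version merge into a single constant run, and the length field splits into one variable byte followed by three constant zero bytes. So even in the easiest case the statistical runs do not line up with the real field boundaries, which is the difficulty described above. Variable-length fields would break the fixed-offset assumption entirely.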

The AI and machine learning types will tell you that you can't learn anything unless you already "almost" know it. Often they succeed by encoding the problem with problem-specific tokens that you can at least reason about.

I don't think you can do this without providing more information. Do you know anything about the file formats? Are field sizes always less than N bits? Are only ASCII strings encoded, or the opposite?
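As a small illustration of how one such piece of prior knowledge becomes usable, the sketch below assumes the format embeds printable ASCII strings and pulls out every run of at least `min_len` printable bytes with its offset. The function name `ascii_runs` and the sample blob are hypothetical.

```python
import re


def ascii_runs(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


# Hypothetical blob: binary fields mixed with two ASCII strings.
blob = b"\x00\x01HDR1\x00\x10\x00\x00title=demo\x00\xff\xfe"

for offset, text in ascii_runs(blob):
    print(offset, text)
```

The offsets of such runs give candidate anchor points for fields. The approach only helps if the ASCII assumption actually holds for the format at hand, and random binary data will occasionally produce short false matches.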
