Metamorphic generators

Posted 2024-08-15 00:20:14

I am trying to find references on different designs of metamorphic generators. Can someone point me in the right direction? I have gone through some ACM papers but couldn't find what I am looking for.

Comments (2)

情仇皆在手 2024-08-22 00:20:14

If you mean metamorphic engines, I unfortunately don't know of any good references. I think this is because the subject is still taboo, given how it is usually used by virus writers. I think that is unjustified, though, as the technique is interesting on its own merits. I've always been fascinated by self-modifying and self-repairing systems, and one could also say it is loosely related to the AI field.

For the uninformed, a metamorphic engine is an executable that rewrites every byte and instruction of itself, so that the new file content differs from the previous generation while the overall algorithm stays the same. Anti-virus vendors had major trouble identifying viruses when the technique first appeared, because matching by signature is ineffective when each generation is different. The introduction of polymorphic and metamorphic viruses marked the era in which anti-virus software switched from signature matching to heuristics. That is, instead of looking at the exact code or byte stream, you try to deduce what the code does.

Anyone implementing such a thing will run into several problems, which depend on the executable format and the CPU architecture:
1) Some RISC architectures can't encode a full 32-bit immediate in one instruction, so the code segment will inevitably hold data pools for immediates, which are fetched with a double lookup. That is a serious show-stopper, because you need a way to separate code from data unambiguously, and some data values are also valid instruction encodings, and vice versa.
2) If your program links against dynamic libraries such as the C runtime, you also need to recalculate the relocation information, which is non-trivial.
3) The biggest problem is that such programs tend to grow exponentially in size with each new generation. If the initial "simplifier" algorithm (described below) does a poor job, more and more garbage code is added. "Poor job" here means that it fails to reduce the code exactly back to its original form, so any extra bloat from the previous generation accumulates.
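To make the size problem concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under assumptions of my own: the function name and the `expansion` and `recovery` parameters are invented for illustration and do not come from any real engine.

```python
def generation_sizes(base_size, expansion, recovery, generations):
    """Toy model of code size per generation (hypothetical parameters).

    base_size  -- size of the bare algorithm (the "skeleton"), in instructions
    expansion  -- factor by which the obfuscation pass multiplies the code size
    recovery   -- fraction of the previous generation's bloat that the
                  simplifier manages to remove (1.0 means a perfect simplifier)
    """
    sizes = []
    size = float(base_size)
    for _ in range(generations):
        bloat = size - base_size                       # garbage left over from before
        simplified = base_size + bloat * (1.0 - recovery)
        size = simplified * expansion                  # obfuscate again
        sizes.append(round(size))
    return sizes


# A perfect simplifier keeps every generation the same size.
print(generation_sizes(100, 3, 1.0, 4))   # [300, 300, 300, 300]
# A simplifier that only removes half of the bloat lets the size run away.
print(generation_sizes(100, 3, 0.5, 4))   # [300, 600, 1050, 1725]
```

In this toy model the size settles only while `expansion * (1 - recovery)` stays below 1. Above that, each generation is larger than the last by a roughly constant factor, which is the exponential growth described above.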

The general technique works as follows:
The application has to read itself and parse the executable format (ELF, COFF, a.out, PE). Then, for each group of N instructions, it tries to simplify the algorithm. For example, adding a value X and then subtracting X is effectively a no-op and can be removed. a*b+a*c can be simplified to a*(b+c), saving one instruction. In this way the simplifier recovers the bare skeleton of the overall algorithm, undoing the metamorphism applied in the previous generation.
Following that, you obfuscate the code again by doing the reverse: take N instructions and replace them with something else that does the same thing. Other stages involve splitting immediates into several parts, obfuscating strings, splitting code into several new functions, and moving the code around. All of this is done while keeping track of code and data references. Finally, the code is assembled and linked back into an executable file.
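The simplify-then-expand round trip can be illustrated on a toy instruction list. This is only a sketch: the single-accumulator "instruction set" and the `run`, `simplify` and `expand` helpers are hypothetical names made up for this example, and nothing here parses or rewrites a real executable.

```python
import random


def run(program, x):
    """Interpret a toy accumulator program on the input x."""
    for op, n in program:
        if op == "add":
            x += n
        elif op == "sub":
            x -= n
        elif op == "mul":
            x *= n
        else:
            raise ValueError(f"unknown op: {op}")
    return x


def simplify(program):
    """Fold each run of add/sub into one net instruction and drop no-ops."""
    out = []
    net = 0  # pending net addition from the current add/sub run

    def flush():
        nonlocal net
        if net > 0:
            out.append(("add", net))
        elif net < 0:
            out.append(("sub", -net))
        net = 0

    for op, n in program:
        if op == "add":
            net += n
        elif op == "sub":
            net -= n
        else:
            flush()
            out.append((op, n))
    flush()
    return out


def expand(program, rng):
    """Replace each add with a longer add/sub pair that has the same effect."""
    out = []
    for op, n in program:
        if op == "add":
            k = rng.randint(1, 99)
            out.append(("add", n + k))
            out.append(("sub", k))
        else:
            out.append((op, n))
    return out


skeleton = [("add", 5), ("mul", 3), ("sub", 2)]
bloated = expand(skeleton, random.Random(0))
assert run(bloated, 7) == run(skeleton, 7) == 34   # same algorithm, different code
assert simplify(bloated) == skeleton               # simplifier recovers the skeleton
```

The point of the sketch is the invariant in the last two lines: the expanded program computes the same result, and a simplifier that is an exact inverse of the expander returns the original skeleton, so no bloat carries over to the next generation.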

It's mind-bogglingly complex. For true hardcore assembly coders only. You have been warned.

原来是傀儡 2024-08-22 00:20:14

Look for engines written by virus writers:

1) z0mbie
2) mental driller (metaph0r)
3) vecna

Also search Google for "Project Bukowski".
