What is a code cave, and is there any legitimate use for one?
I encountered this word for the first time in the StackOverflow question "C# Theoretical: Write a JMP to a codecave in asm." I see that according to Wiktionary, a code cave is:
an unused block of memory that someone, typically a software cracker, can use to inject
custom programming code to modify the behavior of a program.
Did I find the correct definition? If so, is there any legitimate use for a code cave?
One might wish to intentionally create a code cave as a part of using self-modifying code.
Assuming, of course, that one is insane.
I've used them, although I'd never heard the term code cave until today. The Wiktionary definition suggests that a code cave is something the cracker finds in the executable he or she is attempting to crack. The question you cite doesn't use it that way. Instead, it suggests the code cave is being allocated with VirtualAllocEx to create a brand new block of memory in the target process. That removes the need to search for unused space in the target, and it guarantees you'll have enough space to put all your new code.
Ultimately, I think a "code cave" is just a place to store run-time-generated code. There doesn't have to be any nefarious purpose to that code. And at that point, the question of what a code cave is becomes entirely uninteresting. The interesting parts are what reasons there are for generating code at run time, and what techniques there are for making sure that new code gets run when you want it.
Code caves are usually created by compilers for alignment and are often located between functions in copious amounts. There should also be code caves between structures and jumps (in some architectures), but usually not in any significant amounts.
You also might search for a block of zeroed memory, but there's no guarantee that the program won't use them.
I suppose theoretically, if you lost your source code, you could patch your buggy program by using them, and your program wouldn't grow in size.
Edit
To those of you suggesting code caves are only for run-time generated code: that is an incomplete definition. Many times I have written a data structure in a "code cave" and updated pointers to point there, and I suspect I am not the only person to do so.
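The search this answer describes, looking for alignment padding or zeroed runs, can be sketched roughly as follows. This is a minimal illustration on a made-up byte buffer, not a tool for real executables. The helper name find_caves, the filler bytes treated as padding (0x00, 0x90, 0xCC), and the minimum size are all assumptions for the example; actual padding depends on the compiler and architecture. As the answer warns, a run of zeros is not guaranteed to be unused.

```python
from typing import Iterator, Tuple

# Filler bytes often seen as padding in x86 binaries (an assumption; it varies
# by toolchain): 0x00 zero fill, 0x90 NOP, 0xCC INT3.
PADDING_BYTES = (0x00, 0x90, 0xCC)


def find_caves(image: bytes, min_size: int = 16) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, length, fill_byte) for each run of a single padding
    byte that is at least min_size bytes long."""
    i = 0
    n = len(image)
    while i < n:
        fill = image[i]
        j = i + 1
        # Extend the run while the same byte value repeats.
        while j < n and image[j] == fill:
            j += 1
        if fill in PADDING_BYTES and j - i >= min_size:
            yield i, j - i, fill
        i = j


# Toy "image": two fake function bodies separated by INT3 padding,
# followed by a zero-filled tail. Purely hypothetical data.
image = (
    b"\x55\x8b\xec\xc3"          # fake function 1
    + b"\xcc" * 12               # alignment padding
    + b"\x55\x8b\xec\x5d\xc3"    # fake function 2
    + b"\x00" * 32               # zeroed tail
)

caves = list(find_caves(image, min_size=8))
for offset, length, fill in caves:
    print(f"candidate cave at offset {offset}: {length} bytes of 0x{fill:02x}")
```

A scan like this only proposes candidates. Whether a given run is really free to overwrite still has to be established some other way.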
Some legitimate uses: patching live OS binaries without a reboot (MS does this), hooking low-level OS functionality (filesystem, network) for firewalls and antivirus, and extending an application when you don't have the source code (like scraping low-level OS calls to DrawText so you can read them aloud for blind people).
The way it's described here reminds me of patchpoints -- a legit use.
Unfamiliar with the term but hot-patching mechanisms could use reserved space to store code patches. You hook into the defective function and redirect it to the new-improved function. It can be done on-the-fly without taking down critical equipment (large telecom switches).
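The redirect step described here usually comes down to overwriting the start of the old function with a jump to the new code. Below is a minimal sketch of the byte arithmetic, assuming 32-bit x86 and the near JMP rel32 encoding (opcode 0xE9 followed by a 4-byte little-endian displacement measured from the end of the jump). The function names encode_jmp_rel32 and hook are hypothetical, and the bytearray stands in for process memory, with buffer offsets playing the role of addresses.

```python
import struct

JMP_REL32_OPCODE = 0xE9
JMP_REL32_SIZE = 5  # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(source: int, target: int) -> bytes:
    """Encode an x86 near JMP placed at `source` that lands on `target`."""
    # The displacement is relative to the address of the *next* instruction.
    displacement = target - (source + JMP_REL32_SIZE)
    if not -(2**31) <= displacement < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return struct.pack("<Bi", JMP_REL32_OPCODE, displacement)


def hook(image: bytearray, func_offset: int, cave_offset: int) -> bytes:
    """Overwrite the first 5 bytes at func_offset with a jump to cave_offset
    and return the bytes that were displaced, so they can be restored."""
    end = func_offset + JMP_REL32_SIZE
    original = bytes(image[func_offset:end])
    image[func_offset:end] = encode_jmp_rel32(func_offset, cave_offset)
    return original


# Hypothetical buffer: a 7-byte fake function followed by 16 bytes of padding
# that stands in for the reserved patch space.
memory = bytearray(b"\x55\x8b\xec\x83\xec\x10\xc3" + b"\xcc" * 16)
saved = hook(memory, func_offset=0, cave_offset=7)
print(memory[:5].hex(), saved.hex())
```

A real hot-patcher has more to handle than this sketch shows: the overwritten bytes must end on an instruction boundary, the page must be writable, and no thread may be executing the bytes while they change.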
It can be used to inject code at runtime. It can be used to write self-modifying code in static languages assuming that the OS lets you (NX bit not set, etc). There are uses for it, but it's not something you should be thinking about in your typical business app.
That sounds like the correct definition to me.
As for a legitimate use, let me say this: Don't do it unless you are simply experimenting for the sake of experimenting, and are willing to accept the consequences.
There is no way that this type of thing should ever go into production code:
Self-modifying code should not be considered lightly, but can sometimes bring big performance gains. If you've been programming for very long, you've probably used it without realizing it.
Prior to the widespread use of the 486 and higher, many PCs did not include hardware floating-point support. This left people writing programs involving floating point with a dilemma. If they compiled their program to use in-line floating-point instructions, it would run fast on a machine with a floating-point processor, and not at all on machines without one. If they compiled their program with software floating-point emulation, it would run on all machines, but slowly even on machines with hardware floating point.
Many compiler libraries used an interesting trick with self-modifying code. The default behavior was to put a trap instruction where a floating-point operation was needed. The trap handler would either emulate the instruction in software or, if it detected it was running on a machine with floating-point hardware, modify the code by replacing the trap instruction with the appropriate hardware floating-point instruction and execute it. The result was software that ran on all machines, and ran almost as fast on a machine with floating-point hardware as if the code had been compiled to use floating-point hardware directly (since most floating-point-intensive operations occur in loops that are executed many times).
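The trap-and-patch idea can be imitated in a toy interpreter to show why the cost is paid only once per call site. This is a simulation under invented assumptions, not how any particular compiler library worked: the opcode names TRAP_FADD and FADD, the run function, and the has_fpu flag are all made up for illustration.

```python
from typing import List, Tuple


def run(program: List[list], has_fpu: bool, iterations: int) -> Tuple[float, int]:
    """Execute a toy program `iterations` times.

    Each instruction is a mutable [opcode, operand] pair. Returns the
    accumulated sum and the number of times the trap handler ran."""
    acc = 0.0
    traps = 0
    for _ in range(iterations):
        for instr in program:
            opcode, operand = instr
            if opcode == "TRAP_FADD":
                # Trap handler: count the (slow) trap.
                traps += 1
                if has_fpu:
                    # Self-modification: rewrite this call site so later
                    # passes take the fast path and never trap again.
                    instr[0] = "FADD"
                # Either way, produce the result for this execution
                # (software emulation, or the freshly patched instruction).
                acc += operand
            elif opcode == "FADD":
                # Fast path: the "hardware" instruction.
                acc += operand
            else:
                raise ValueError(f"unknown opcode {opcode!r}")
    return acc, traps


def fresh_program() -> List[list]:
    return [["TRAP_FADD", 1.5], ["TRAP_FADD", 2.0]]


with_fpu = fresh_program()
result_fpu, traps_fpu = run(with_fpu, has_fpu=True, iterations=100)

without_fpu = fresh_program()
result_soft, traps_soft = run(without_fpu, has_fpu=False, iterations=100)

print(traps_fpu, traps_soft)
```

With the simulated FPU, each call site traps once and is then patched. Without it, every execution goes through the handler, which mirrors the loop argument made above.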