Subset of x86 without the %gs register: binary patching code that uses %gs instead of trap-and-emulate?
For reasons too complicated to explain here, I need to run an x86 GCC-compiled Linux program on a platform that is a subset of x86. This platform does not have the %gs register, which means it has to be emulated, because GCC relies on the presence of the %gs register.
Currently I have a wrapper which catches the exceptions raised when the program attempts to access the %gs register, and emulates the access. But this is dog slow. Is there a way to patch the opcodes in the ELF ahead of time with equivalent instructions, so that the trap-and-emulate is avoided?
Comments (2)
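Have you tried compiling your code with the -mno-tls-direct-seg-refs option? According to my GCC man page (i686-apple-darwin10-gcc-4.2.1), it controls whether TLS variables may be accessed with offsets from the TLS segment register (%gs for 32-bit, %fs for 64-bit), or whether the thread base pointer must be added.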
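As a rough illustration of what that option affects, here is a minimal C sketch. The file name tls_demo.c and the names counter and bump are made up for this example. Compiling it with something like gcc -m32 -O2 -S tls_demo.c, once with and once without -mno-tls-direct-seg-refs, and comparing the generated assembly should show the difference. I would still expect the thread base pointer itself to be loaded through %gs in the second case, so this probably reduces rather than eliminates %gs accesses.

```c
/* tls_demo.c (hypothetical file name): a thread-local variable, which is
 * a typical reason for GCC to emit %gs-relative accesses on 32-bit Linux. */
__thread int counter;

/* Increment the calling thread's private copy of counter and return it. */
int bump(void)
{
    return ++counter;
}
```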
(This assumes Adam Rosenfield's solution is not applicable. It, or a similar approach, is probably a better way to solve the problem.)
You haven't stated how you're emulating the %gs register, but it's probably going to be tough to patch every usage in general unless you have some special knowledge about the program, because otherwise you only have 2 bytes (in the worst, and common, case) to modify with your patch. Of course, if you're using something like %es = %gs, it should be relatively straightforward.
Assuming this can somehow be made to work in your case, the strategy is to scan the executable sections of the ELF file and patch any instruction that uses or modifies the GS register. That is at least the following instructions:
- Any instruction prefixed with the GS segment override prefix (65, except for branch instructions, in which case the prefix indicates something else)
- push gs (0F A8)
- pop gs (0F A9)
- mov r/m16, gs (8C /r)
- mov gs, r/m16 (8E /r)
- mov gs, r/m64 (REX.W 8E /r) (if you support 64-bit mode)
And any other instructions that allow segment registers (I don't think there are many more, but I'm not 100% sure).
This is all coming from the Intel® 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes 2A and 2B: Instruction Set Reference, A-Z. Be aware that the instructions are sometimes preceded by other prefixes and sometimes not, so you should probably use a library to do the instruction decoding rather than blindly searching for byte sequences.
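To make the opcode list concrete, here is a deliberately naive sketch in C of the matching step. The names gs_kind and classify_gs are hypothetical. It assumes 32-bit mode, that p already points at an instruction boundary, and that no other prefix comes first, which is exactly why a real decoder library is the better choice. It also does not handle the branch-instruction exception for the 65 prefix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of the GS-related encodings listed above. */
enum gs_kind { GS_NONE = 0, GS_PREFIX, GS_PUSH, GS_POP, GS_MOV_FROM, GS_MOV_TO };

/* Classify the bytes at p[0..n). ASSUMPTION: p is the first byte of an
 * instruction in 32-bit mode and no other prefix precedes the match. */
enum gs_kind classify_gs(const uint8_t *p, size_t n)
{
    if (n >= 1 && p[0] == 0x65)
        return GS_PREFIX;                      /* GS segment override */
    if (n >= 2 && p[0] == 0x0F && p[1] == 0xA8)
        return GS_PUSH;                        /* push gs */
    if (n >= 2 && p[0] == 0x0F && p[1] == 0xA9)
        return GS_POP;                         /* pop gs */
    /* For 8C /r and 8E /r the ModRM reg field (bits 5..3) selects the
     * segment register; GS is encoded as 5. */
    if (n >= 2 && (p[0] == 0x8C || p[0] == 0x8E) && ((p[1] >> 3) & 7) == 5)
        return p[0] == 0x8C ? GS_MOV_FROM : GS_MOV_TO;
    return GS_NONE;
}
```

A scanner built on this would only be trustworthy if something else (a disassembler, or symbol and relocation information) supplies the instruction boundaries.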
Some of the above instructions should be relatively straightforward to turn into call my_patch or similar, but you're probably going to have trouble finding something that fits in two bytes and works in general. int XX (CD XX) might be a good candidate if you can set up an interrupt vector, but I'm not sure it's going to be faster than the method you're currently using. You will of course need to record which instruction was patched out and have the interrupt handler (or whatever) react differently depending on the return address that your handler receives.
You might be able to set up a trampoline if you can find room within -128..127 bytes and use JMP rel8 (EB cb) to jump to the trampoline (usually another JMP, but this time with more room for the target address), which then handles the instruction emulation and jumps back to the instruction following the patched-out %gs usage.
Lastly, I'd recommend keeping the trap-and-emulate code running to catch any cases you might not have thought of (self-modifying or injected code, for instance). This way you can also log any unhandled cases and add them to your solution.
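The two-byte JMP rel8 patch could be sketched in C roughly as follows. The names patch_record and patch_jmp_rel8 are hypothetical, and the code assumes the executable section has already been copied into a writable buffer, with site and trampoline given as offsets into that buffer. It also saves the overwritten bytes, since something has to remember what was patched out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one patched instruction. */
struct patch_record {
    size_t  site;     /* offset of the patched instruction */
    uint8_t orig[2];  /* the two bytes that were overwritten */
};

/* Overwrite the two bytes at code[site] with JMP rel8 (EB cb) targeting
 * code[trampoline]. Returns 0 on success, or -1 without touching the
 * buffer if the trampoline is outside the -128..127 range. */
int patch_jmp_rel8(uint8_t *code, size_t site, size_t trampoline,
                   struct patch_record *rec)
{
    /* rel8 is measured from the end of the 2-byte JMP instruction. */
    ptrdiff_t disp = (ptrdiff_t)trampoline - (ptrdiff_t)(site + 2);
    if (disp < -128 || disp > 127)
        return -1;

    rec->site = site;
    rec->orig[0] = code[site];
    rec->orig[1] = code[site + 1];

    code[site] = 0xEB;                      /* JMP rel8 opcode */
    code[site + 1] = (uint8_t)(int8_t)disp; /* signed 8-bit displacement */
    return 0;
}
```

If the original instruction is longer than two bytes, its remaining bytes are left behind as dead data, so the trampoline has to jump back past the whole original instruction rather than to site + 2.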