ebp + 6 instead of +8 in a JIT compiler
I'm implementing a simplistic JIT compiler in a VM I'm writing for fun (mostly to learn more about language design) and I'm getting some weird behavior, maybe someone can tell me why.
First I define a JIT "prototype" both for C and C++:
#ifdef __cplusplus
typedef void* (*_JIT_METHOD) (...);
#else
typedef (*_JIT_METHOD) ();
#endif
I have a compile() function that will compile stuff into ASM and stick it somewhere in memory:
void* compile (void* something)
{
    // grab some memory
    unsigned char* buffer = (unsigned char*) malloc (1024);

    // xor eax, eax
    // inc eax
    // inc eax
    // inc eax
    // ret -> eax should be 3
    /* WORKS!
    buffer[0] = 0x67;
    buffer[1] = 0x31;
    buffer[2] = 0xC0;
    buffer[3] = 0x67;
    buffer[4] = 0x40;
    buffer[5] = 0x67;
    buffer[6] = 0x40;
    buffer[7] = 0x67;
    buffer[8] = 0x40;
    buffer[9] = 0xC3; */

    // xor eax, eax
    // mov eax, 9
    // ret 4 -> eax should be 9
    /* WORKS!
    buffer[0] = 0x67;
    buffer[1] = 0x31;
    buffer[2] = 0xC0;
    buffer[3] = 0x67;
    buffer[4] = 0xB8;
    buffer[5] = 0x09;
    buffer[6] = 0x00;
    buffer[7] = 0x00;
    buffer[8] = 0x00;
    buffer[9] = 0xC3; */

    // push ebp
    // mov ebp, esp
    // mov eax, [ebp + 6] ; wtf? shouldn't this be [ebp + 8]!?
    // mov esp, ebp
    // pop ebp
    // ret -> eax should be the first value sent to the function
    /* WORKS! */
    buffer[0] = 0x66;
    buffer[1] = 0x55;
    buffer[2] = 0x66;
    buffer[3] = 0x89;
    buffer[4] = 0xE5;
    buffer[5] = 0x66;
    buffer[6] = 0x66;
    buffer[7] = 0x8B;
    buffer[8] = 0x45;
    buffer[9] = 0x06;
    buffer[10] = 0x66;
    buffer[11] = 0x89;
    buffer[12] = 0xEC;
    buffer[13] = 0x66;
    buffer[14] = 0x5D;
    buffer[15] = 0xC3;

    // mov eax, 5
    // add eax, ecx
    // ret -> eax should be 50
    /* WORKS!
    buffer[0] = 0x67;
    buffer[1] = 0xB8;
    buffer[2] = 0x05;
    buffer[3] = 0x00;
    buffer[4] = 0x00;
    buffer[5] = 0x00;
    buffer[6] = 0x66;
    buffer[7] = 0x01;
    buffer[8] = 0xC8;
    buffer[9] = 0xC3; */

    return buffer;
}
And finally I have the main chunk of the program:
int main (int argc, char **args)
{
    DWORD oldProtect = (DWORD) NULL;
    int i = 667, j = 1, k = 5, l = 0;

    // generate some arbitrary function
    _JIT_METHOD someFunc = (_JIT_METHOD) compile(NULL);

    // windows only
#if defined _WIN64 || defined _WIN32
    // set memory permissions and flush CPU code cache
    VirtualProtect(someFunc,1024,PAGE_EXECUTE_READWRITE, &oldProtect);
    FlushInstructionCache(GetCurrentProcess(), someFunc, 1024);
#endif

    // this asm just for some debugging/testing purposes
    __asm mov ecx, i

    // run compiled function (from wherever *someFunc is pointing to)
    l = (int)someFunc(i, k);

    // did it work?
    printf("result: %d", l);

    free (someFunc);
    _getch();
    return 0;
}
As you can see, the compile() function has a couple of tests I ran to make sure I get expected results, and pretty much everything works, but I have a question...
On most tutorials or documentation resources, to get the first value passed to a function (in the case of ints) you do [ebp+8], the second [ebp+12], and so forth. For some reason, I have to do [ebp+6], then [ebp+10], and so forth. Could anyone tell me why?
Comments (2)
Your opcodes look suspicious: they're full of 0x66 and 0x67 operand-size/address-size override prefixes, which (in a 32-bit code segment) turn 32-bit operations into 16-bit ones. For example, 66 55 is push bp rather than push ebp (which seems to explain the observed behaviour: pushing bp decrements the stack pointer by 2 instead of 4).

Your problem is the 66 and 67 bytes: operand size override and address size override, respectively.
Since you're running this code in 32-bit mode, these bytes tell the processor that you want 16-bit operands and addresses instead of 32-bit ones. The 66 55 disassembles to PUSH BP, which pushes only 2 bytes instead of 4, hence your addresses being off by 2.
The 67 bytes in the first two examples are also unnecessary, but because you're only accessing registers and not memory, they have no effect and don't break anything (yet). Those bytes should also be removed.
It looks like you're using a framework designed for 16-bit code, or perhaps there's a way you can tell it you want 32-bit code.
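To make the off-by-2 arithmetic concrete, here is a minimal sketch in C, not the asker's actual code. It assumes a 32-bit cdecl-style call where the return address occupies 4 bytes. The names first_arg_offset and emit_load_first_arg are hypothetical helpers made up for illustration. The first one computes where the first argument sits relative to the frame pointer, and the second writes the third test sequence from the question with every 66 prefix dropped and the displacement changed to 8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size in bytes of the return address pushed by a near CALL in 32-bit mode. */
enum { RETURN_ADDRESS_SIZE = 4 };

/*
 * Offset of the first stack argument relative to the frame pointer,
 * given how many bytes the prologue's push actually stored:
 * push ebp (55) stores 4 bytes, push bp (66 55) stores only 2.
 */
int first_arg_offset(int saved_frame_pointer_size)
{
    return saved_frame_pointer_size + RETURN_ADDRESS_SIZE;
}

/*
 * Hypothetical helper: copies the prefix-free 32-bit encoding of
 *   push ebp / mov ebp, esp / mov eax, [ebp + 8] /
 *   mov esp, ebp / pop ebp / ret
 * into buf. Returns the number of bytes written, or 0 if buf is too small.
 */
size_t emit_load_first_arg(unsigned char *buf, size_t capacity)
{
    static const unsigned char code[] = {
        0x55,             /* push ebp            */
        0x89, 0xE5,       /* mov  ebp, esp       */
        0x8B, 0x45, 0x08, /* mov  eax, [ebp + 8] */
        0x89, 0xEC,       /* mov  esp, ebp       */
        0x5D,             /* pop  ebp            */
        0xC3              /* ret                 */
    };

    assert(buf != NULL);
    if (capacity < sizeof code)
        return 0;
    memcpy(buf, code, sizeof code);
    return sizeof code;
}
```

With a 2-byte push bp the first helper gives 6, and with a 4-byte push ebp it gives 8, which matches both the observed [ebp+6] and the textbook [ebp+8]. The emitted bytes are only meaningful when executed as 32-bit x86 code, so the sketch fills a buffer and does not try to run it.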