FPU instructions crash and asm code does not work
Sorry for my weak English.
I'm trying to improve my asm skills, and I have found an easy entry point for working on them: calling machine code routines from C code.
I use it like this:
char asmRoutineData[] =
{
0xC8, 0x00, 0x00, 0x00, // enter 0, 0
0xB8, 0xFF, 0x00, 0x00, 0x00, // mov eax, 255
0xC9, // leave
0xC3 // ret
};
int (*asmRoutine)(void) = (int (*)(void)) asmRoutineData;
int ret = asmRoutine();
This works very well for some routines, such as the one above.
Some others do not work:
1) I cannot read a value passed on the stack.
A procedure like this:
char asmRoutine_body[] =
{
0xC8, 0x00, 0x00, 0x00, //enter
0x8B, 0x45, 0x08, // mov eax, [ebp+8]
0xC9, //leave
0xC3
};
and
int ( *asmRoutine)(int, int, int) = ( int (*)(int, int, int)) asmRoutine_body;
int ret = asmRoutine(77,66,55);
should work as far as I know, but it does not.
I looked at the asm generated by the compiler and it seems to be correct:
mov eax,offset _asmRoutineData
push 55
push 66
push 77
call eax
add esp,12
_asmRoutineData label byte
db 200 //enter
db 0
db 0
db 0
db 139 // mov eax, dword [ebp+8H] ; 8B. 45, 08
db 69
db 8
db 201 //leave
db 195 //ret
I do not know what is wrong: it returns values other than the expected 77 (or 66 or 55 for [ebp+12] and [ebp+16]).
2) The second problem is that this way of calling machine code works for me with arithmetic instructions, but it crashes the application (some kind of system exception) on FPU or SSE instructions.
Why? And what should I do to make it work? (I would love to write assembly routines this way.)
fir
//EDIT
This is an SSE routine that should take float4* vectors a and b, compute their dot product, and put the result into float4* c
(float4 is a struct or array of 4 floats).
(This is strange, because it should only take two vectors and return a float via eax, but I probably got it from the internet and have had no time to test and rewrite it.)
/*
enter 0, 0 ; 0034 _ C8, 0000, 00
mov eax, dword [ebp+8H] ; 0038 _ 8B. 45, 08
mov ebx, dword [ebp+0CH] ; 003B _ 8B. 5D, 0C
mov ecx, dword [ebp+10H] ; 003E _ 8B. 4D, 10
movups xmm0, oword [eax] ; 0041 _ 0F 10. 00
movups xmm1, oword [ebx] ; 0044 _ 0F 10. 0B
mulps xmm0, xmm1 ; 0047 _ 0F 59. C1
movhlps xmm1, xmm0 ; 004A _ 0F 12. C8
addps xmm1, xmm0 ; 004D _ 0F 58. C8
movaps xmm0, xmm1 ; 0050 _ 0F 28. C1
shufps xmm1, xmm1, 1 ; 0053 _ 0F C6. C9, 01
addss xmm0, xmm1 ; 0057 _ F3: 0F 58. C1
movss dword [ecx], xmm0 ; 005B _ F3: 0F 11. 01
leave ; 005F _ C9
ret ; 0060 _ C3
*/
char asmDot_body[] =
{
0xC8, 0x00, 0x00, 0x00,
0x8B, 0x45, 0x08,
0x8B, 0x5D, 0x0C,
0x8B, 0x4D, 0x10,
0x0F, 0x10, 0x00,
0x0F, 0x10, 0x0B,
0x0F, 0x59, 0xC1,
0x0F, 0x12, 0xC8,
0x0F, 0x58, 0xC8,
0x0F, 0x28, 0xC1,
0x0F, 0xC6, 0xC9, 0x01,
0xF3, 0x0F, 0x58, 0xC1,
0xF3, 0x0F, 0x11, 0x01,
0xC9,
0xC3
};
void (*asmAddSSE)(float4*, float4*, float4*) = (void (*)(float4*, float4*, float4*)) asmDot_body;
float4 a = {1,2,1,0};
float4 b = {1,2,3,0};
float4 c = {0,0,0,0};
asmAddSSE(&a,&b,&c);
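For checking what the byte array is supposed to compute, the same instruction sequence can be written with SSE intrinsics. This is only a sketch under assumptions: the `float4` field names and the helper name `dot4` are hypothetical (the post only says float4 holds 4 floats), and a scalar fallback is used when SSE is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define HAVE_SSE 1
#else
#  define HAVE_SSE 0
#endif

/* Assumed layout: four consecutive floats. */
typedef struct { float x, y, z, w; } float4;

/* Hypothetical helper mirroring the listing: writes dot(a, b) into out->x. */
void dot4(const float4 *a, const float4 *b, float4 *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if HAVE_SSE
    __m128 va   = _mm_loadu_ps((const float *)a);  /* movups xmm0, [eax]    */
    __m128 vb   = _mm_loadu_ps((const float *)b);  /* movups xmm1, [ebx]    */
    __m128 prod = _mm_mul_ps(va, vb);              /* mulps  xmm0, xmm1     */
    __m128 hi   = _mm_movehl_ps(vb, prod);         /* movhlps xmm1, xmm0    */
    __m128 sum  = _mm_add_ps(hi, prod);            /* addps  xmm1, xmm0     */
    __m128 shuf = _mm_shuffle_ps(sum, sum, 1);     /* shufps xmm1, xmm1, 1  */
    __m128 tot  = _mm_add_ss(sum, shuf);           /* addss  xmm0, xmm1     */
    _mm_store_ss(&out->x, tot);                    /* movss  [ecx], xmm0    */
#else
    /* Scalar fallback for targets without SSE. */
    out->x = a->x * b->x + a->y * b->y + a->z * b->z + a->w * b->w;
#endif
}
```

With the a and b values above, `dot4(&a, &b, &c)` leaves 8 in `c.x` (1*1 + 2*2 + 1*3 + 0*0). The loads are the unaligned variant, matching `movups` in the listing, so this particular sequence does not require 16-byte aligned inputs. One thing worth noting about the hand-written version: it overwrites ebx, which cdecl callers expect a routine to preserve.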
//EDIT L8R
FOUND IT! It now works extremely well (passing arguments, and also FPU and even SSE). I'm happy.
Thanks, necrolis, for stating that it was working on your system.
I began experimenting with compiler switches to set up alignment and to disable some of them, and the culprit was -pr (use fastcall), which was enabled; I had to turn it off.
(I have two compile.bat files: one for normal compilation and a second one that also generates assembly. The second has no -pr switch, so the asm listing I posted above is okay, but my normal compile.bat generated fastcall calls, and it went boom!)
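Since the cause was the default calling convention, one way to make the call site independent of compiler switches is to spell the convention out in the function pointer type. A minimal sketch, assuming a compiler that accepts `__cdecl` (MSVC, Borland) or GCC's `__attribute__((cdecl))` on 32-bit x86. The names `CDECL`, `three_int_fn`, `first_arg`, and `call_with_cdecl` are hypothetical, and a plain C function stands in for the machine code array so that the sketch also builds on other targets.

```c
#include <assert.h>
#include <stddef.h>

/* CDECL is a hypothetical helper macro, not something from the post. */
#if defined(_MSC_VER) || defined(__BORLANDC__)
#  define CDECL __cdecl
#elif defined(__GNUC__) && defined(__i386__)
#  define CDECL __attribute__((cdecl))
#else
#  define CDECL /* nothing to pin down on this target */
#endif

/* The pointer type itself carries the convention, so a switch such as
   -pr can no longer change how arguments are passed through it. */
typedef int (CDECL *three_int_fn)(int, int, int);

/* Stand-in for the routine that returns [ebp+8], i.e. the first argument. */
int CDECL first_arg(int a, int b, int c)
{
    (void)b;
    (void)c;
    return a;
}

/* Calls fn the same way the post does: fn(77, 66, 55). */
int call_with_cdecl(three_int_fn fn)
{
    assert(fn != NULL);
    return fn(77, 66, 55);
}
```

With the byte array from the question, the cast would then target `three_int_fn` instead of a plain `int (*)(int, int, int)`, so arguments are pushed on the stack as the routine expects.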
Comments (1)
Your very first problem is that you assume the code is executable. If you are lucky, DEP is off and you can execute code from your stack, but generally (99.99% of the time) you need to allocate executable memory to do this. Secondly, writing out pure machine code like you are doing is horrible and prone to bugs. If you feel you cannot use the inline assembler provided by your compiler, use something like AsmJIT instead (or any other in-memory assembler).
Your code does work, however (when called using __cdecl), once those issues are addressed, though it is still unsafe. (I ran it and got the expected result of 77 after putting it in executable memory.) You will likely run into problems down the road with fixing up virtual and absolute calls/long jumps, which will make this ever more complex.
Your crashes on FPU and SSE instructions are most likely alignment problems, but it's impossible to tell without a system code, your assembly, or what CPU you are using. In cases like this, it's best to use a debugger, such as ollydbg (which is free), and step through the code.
the semi-corrected code:
outputs:
77
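To illustrate the executable-memory point, here is a minimal sketch. It assumes a POSIX system with `mmap` and `mprotect` (on Windows the analogous call would be `VirtualAlloc` with `PAGE_EXECUTE_READWRITE`), and it is not the code from this answer. The names `run_routine_from_exec_memory` and `ON_X86` are hypothetical. It uses the argument-free routine from the question, whose bytes happen to be valid in both 32-bit and 64-bit x86 mode.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

#if defined(__i386__) || defined(__x86_64__)
#  define ON_X86 1
#else
#  define ON_X86 0
#endif

/* enter 0,0 ; mov eax,255 ; leave ; ret */
static const unsigned char routine_bytes[] = {
    0xC8, 0x00, 0x00, 0x00,
    0xB8, 0xFF, 0x00, 0x00, 0x00,
    0xC9,
    0xC3
};

/* Copies the routine into freshly mapped memory, makes it executable,
   calls it, and returns its result; returns -1 on failure or when the
   target is not x86. */
int run_routine_from_exec_memory(void)
{
#if ON_X86
    size_t len = sizeof routine_bytes;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, routine_bytes, len);

    /* Switch from writable to executable instead of mapping RWX. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Data-to-function pointer cast: a common extension, not strict ISO C. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, len);
    return result;
#else
    return -1;
#endif
}
```

On an x86 machine where the mapping succeeds, the call returns 255. The routine that reads [ebp+8] would additionally need a 32-bit build and a cdecl call, since the stack layout it assumes does not exist in 64-bit code.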