Calling native code with hand-written assembly
I'm trying to call a native function from a managed assembly. I've done this with pre-compiled libraries and everything has gone well. Now I'm building my own library, and I can't get it to work.
The native DLL source is the following:
#define DERM_SIMD_EXPORT __declspec(dllexport)
#define DERM_SIMD_API __cdecl

extern "C" {
    DERM_SIMD_EXPORT void DERM_SIMD_API Matrix4x4_Multiply_SSE(float *result, float *left, float *right);
}

void DERM_SIMD_API Matrix4x4_Multiply_SSE(float *result, float *left, float *right) {
    __asm {
        ....
    }
}
Below is the managed code, which loads the library and creates a delegate from a function pointer.
public unsafe class Simd
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate void MatrixMultiplyDelegate(float* result, float* left, float* right);

    public static MatrixMultiplyDelegate MatrixMultiply;

    public static void LoadSimdExtensions()
    {
        string assemblyPath = "Derm.Simd.dll";
        IntPtr address = GetProcAddress.GetAddress(assemblyPath, "Matrix4x4_Multiply_SSE");
        if (address != IntPtr.Zero) {
            MatrixMultiply = (MatrixMultiplyDelegate)Marshal.GetDelegateForFunctionPointer(address, typeof(MatrixMultiplyDelegate));
        }
    }
}
Using the sources above, the code runs without errors (the function pointer is obtained, and the delegate is actually created).
The problem arises when I call the delegate: it is executed (and I can even debug it!), but at function exit the managed application raises a System.ExecutionEngineException (when it doesn't simply exit without any exception).
The actual problem is the function implementation: it contains an asm block with SSE instructions; if I remove the asm block, the code works perfectly.
I suspect I am missing some register save/restore assembly, but I'm completely ignorant in this area.
The strange thing is that if I change the calling convention to __stdcall, the debug version "seems" to work, while the release version behaves as if the __cdecl calling convention were used.
(And while we are here, can you clarify whether the calling convention matters?)
OK, thanks to David Heffernan's comment I found out that the instructions causing the problem are the following:
movups result[ 0], xmm4;
movups result[16], xmm5;
The movups instruction moves 16 bytes to (unaligned) memory.
The function is called by the following code:
unsafe {
    float* prodFix = (float*)prod.MatrixBuffer.AlignedBuffer.ToPointer();
    float* m1Fix = (float*)m2.MatrixBuffer.AlignedBuffer.ToPointer();
    float* m2Fix = (float*)m1.MatrixBuffer.AlignedBuffer.ToPointer();
    if (Simd.Simd.MatrixMultiply == null) {
        // ... unsafe C# code
    } else {
        Simd.Simd.MatrixMultiply(prodFix, m1Fix, m2Fix);
    }
}
Where MatrixBuffer is a class of mine; its member AlignedBuffer is allocated in the following way:
// Allocate unmanaged buffer
mUnmanagedBuffer = Marshal.AllocHGlobal(new IntPtr((long)(size + alignment - 1)));
// Align buffer pointer
long misalignment = mUnmanagedBuffer.ToInt64() % alignment;
if (misalignment != 0)
    mAlignedBuffer = new IntPtr(mUnmanagedBuffer.ToInt64() + misalignment);
else
    mAlignedBuffer = mUnmanagedBuffer;
Maybe the error is caused by Marshal.AllocHGlobal or IntPtr black magic?
This is the minimal source to spot the error:
void Matrix4x4_Multiply_SSE(float *result, float *left, float *right)
{
    __asm {
        movups xmm0, right[ 0];
        movups result, xmm0;
    }
}

int main(int argc, char *argv[])
{
    float r0[16];
    float m1[16], m2[16];

    m1[ 0] = 1.0f; m1[ 4] = 0.0f; m1[ 8] = 0.0f; m1[12] = 0.0f;
    m1[ 1] = 0.0f; m1[ 5] = 1.0f; m1[ 9] = 0.0f; m1[13] = 0.0f;
    m1[ 2] = 0.0f; m1[ 6] = 0.0f; m1[10] = 1.0f; m1[14] = 0.0f;
    m1[ 3] = 0.0f; m1[ 7] = 0.0f; m1[11] = 0.0f; m1[15] = 1.0f;

    m2[ 0] = 1.0f; m2[ 4] = 0.0f; m2[ 8] = 0.0f; m2[12] = 0.0f;
    m2[ 1] = 0.0f; m2[ 5] = 1.0f; m2[ 9] = 0.0f; m2[13] = 0.0f;
    m2[ 2] = 0.0f; m2[ 6] = 0.0f; m2[10] = 1.0f; m2[14] = 0.0f;
    m2[ 3] = 0.0f; m2[ 7] = 0.0f; m2[11] = 0.0f; m2[15] = 1.0f;

    r0[ 0] = 0.0f; r0[ 4] = 0.0f; r0[ 8] = 0.0f; r0[12] = 0.0f;
    r0[ 1] = 0.0f; r0[ 5] = 0.0f; r0[ 9] = 0.0f; r0[13] = 0.0f;
    r0[ 2] = 0.0f; r0[ 6] = 0.0f; r0[10] = 0.0f; r0[14] = 0.0f;
    r0[ 3] = 0.0f; r0[ 7] = 0.0f; r0[11] = 0.0f; r0[15] = 0.0f;

    Matrix4x4_Multiply_SSE(r0, m1, m2);
    Matrix4x4_Multiply_SSE(r0, m1, m2);

    return (0);
}
In practice, after the second movups the value of result (stored on the stack) has changed, and the contents of xmm0 have been stored at the modified (and wrong) address held in result.
After stepping out of Matrix4x4_Multiply_SSE, the original memory has not been modified.
What am I missing?
Comments (3)
The alignment correction is wrong. When misalignment is non-zero, you need to add
alignment - misalignment
to the unmanaged address to correct the alignment, not misalignment itself.
That said, I would recommend that you test the function in a native setting first. Once you know it works there, you can move to the managed setting and know that any remaining problems are due to the managed code.
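The alignment correction described in this answer can be sketched in plain C. This is only an illustration under stated assumptions: the function names align_buggy and align_fixed are hypothetical and do not appear in the question, and addresses are modelled as uintptr_t values rather than IntPtr.

```c
#include <assert.h>
#include <stdint.h>

/* Rule used in the question: adds the remainder itself, which in general
   does not land on a multiple of the alignment. */
uintptr_t align_buggy(uintptr_t address, uintptr_t alignment)
{
    assert(alignment > 0);
    uintptr_t misalignment = address % alignment;
    return misalignment != 0 ? address + misalignment : address;
}

/* Rule from the answer: adds the distance to the next multiple of the
   alignment, i.e. alignment - misalignment. */
uintptr_t align_fixed(uintptr_t address, uintptr_t alignment)
{
    assert(alignment > 0);
    uintptr_t misalignment = address % alignment;
    return misalignment != 0 ? address + (alignment - misalignment) : address;
}
```

For example, with address 1004 and alignment 16 the remainder is 12, so the first rule yields 1016 (still misaligned) while the second yields 1008. Because the corrected rule adds at most alignment - 1 bytes, the size + alignment - 1 allocation in the question remains large enough.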
Your assembly was flawed. There is a difference between writing to the storage location of a pointer and writing to the memory location it points to.
Your instructions did not write to the memory location the pointer points to but to the storage location of the pointer itself. Since the parameter lives on the stack, the movups instruction overwrote your stack. You can see this in the debugger window.
With mov [x],10 you do not store 10 in the memory x points to; you write into your stack.
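The distinction this answer draws can be imitated in portable C without any assembly. The sketch below is an illustration only: struct Frame is a hypothetical stand-in for the argument area of the stack frame, and the two helper functions are invented names that mimic the two kinds of store with memcpy. It assumes 4-byte floats.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the argument slots of a stack frame. */
struct Frame {
    float *result;
    float *left;
    float *right;
    float *spare;   /* keeps the struct at least 16 bytes on 32-bit targets */
};

/* Mimics a 16-byte store to the pointer's own storage location:
   the bytes land on the pointer slots, not on the array. */
void store_to_pointer_slot(struct Frame *frame, const float value[4])
{
    assert(sizeof *frame >= 4 * sizeof(float));
    memcpy(frame, value, 4 * sizeof(float));
}

/* Mimics reading the pointer first and storing through it:
   the bytes land in the array that result points to. */
void store_through_pointer(struct Frame *frame, const float value[4])
{
    assert(frame->result != NULL);
    memcpy(frame->result, value, 4 * sizeof(float));
}
```

After store_to_pointer_slot the target array is untouched and the pointer slots hold float bit patterns, which matches the observation in the question that the original memory is not modified. After store_through_pointer the array holds the four values and the pointer slots are intact.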
I found a solution: load the pointer value into a CPU register, and use the register to address the memory.
With that change the code works as expected.
But the question is still not completely solved, since the movups instruction can take a memory address as its first argument; so if someone knows what's going on, I'll be pleased to accept the best answer.
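A likely explanation for the remaining puzzle: in MSVC inline assembly, a C variable name used as an operand denotes the variable's own storage, so movups result, xmm0 does take a memory address, but it is the address of the stack slot that holds the pointer, not the address the pointer contains. Loading the pointer into a register first gives the store the address that was intended. The same operation can also be written with SSE intrinsics, which leave the addressing to the compiler. The sketch below swaps inline assembly for intrinsics, so it is not the code from this answer; copy_first_row is a hypothetical name, and the memcpy branch is only a fallback for targets without SSE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Copies the first four floats of right into result, which is what the
   minimal example in the question intends to do. */
void copy_first_row(float *result, const float *right)
{
    assert(result != NULL && right != NULL);
#if HAVE_SSE
    __m128 row = _mm_loadu_ps(right);  /* unaligned 16-byte load from the pointee */
    _mm_storeu_ps(result, row);        /* unaligned 16-byte store to the pointee */
#else
    memcpy(result, right, 4 * sizeof(float));  /* portable fallback */
#endif
}
```

Called as copy_first_row(r0, m2) with the arrays from the minimal example, this writes r0[0] through r0[3] and leaves the rest of r0 and the caller's stack untouched.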