Strange problem calling a .DLL from C#
I'm trying to call the HtmlTidy library DLL from C#. There are a few examples floating around on the net but nothing definitive... and I'm having no end of trouble. I'm pretty certain the problem is with the P/Invoke declaration... but danged if I know where I'm going wrong.
I got the libtidy.dll from http://www.paehl.com/open_source/?HTML_Tidy_for_Windows which seems to be a current version.
Here's a console app that demonstrates the problem I'm having:
using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;

namespace ConsoleApplication5
{
    class Program
    {
        [StructLayout(LayoutKind.Sequential)]
        public struct TidyBuffer
        {
            public IntPtr bp;       // Pointer to bytes
            public uint size;       // # bytes currently in use
            public uint allocated;  // # bytes allocated
            public uint next;       // Offset of current input position
        };

        [DllImport("libtidy.dll")]
        public static extern int tidyBufAlloc(ref TidyBuffer tidyBuffer, uint allocSize);

        static void Main(string[] args)
        {
            Console.WriteLine(CleanHtml("<html><body><p>Hello World!</p></body></html>"));
        }

        static string CleanHtml(string inputHtml)
        {
            byte[] inputArray = Encoding.UTF8.GetBytes(inputHtml);
            byte[] inputArray2 = Encoding.UTF8.GetBytes(inputHtml);

            TidyBuffer tidyBuffer2;
            tidyBuffer2.size = 0;
            tidyBuffer2.allocated = 0;
            tidyBuffer2.next = 0;
            tidyBuffer2.bp = IntPtr.Zero;

            //
            // tidyBufAlloc overwrites inputArray2... why? how? seems like
            // tidyBufAlloc is stomping on the stack a bit too much... but
            // how? I've tried changing the calling convention to cdecl and
            // stdcall but no change.
            //
            Console.WriteLine((inputArray2 == null ? "Array2 null" : "Array2 not null"));
            tidyBufAlloc(ref tidyBuffer2, 65535);
            Console.WriteLine((inputArray2 == null ? "Array2 null" : "Array2 not null"));

            return "did nothing";
        }
    }
}
All in all I'm a bit stumped. Any help would be appreciated!
Comments (3)
You are working with an old definition of the TidyBuffer structure. The new structure is larger so when you call the allocate method it is overwriting the stack location for inputArray2. The new definition is:
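The definition itself did not survive in this copy of the answer. As a rough sketch only, assuming the DLL was built from a Tidy release whose buffio.h puts a `TidyAllocator* allocator` pointer ahead of `bp` (worth checking against the header that matches your libtidy.dll), the two layouts can be compared without calling into the native library at all. The names `TidyBufferLayouts`, `OldTidyBuffer`, `NewTidyBuffer` and `ExtraBytes` are made up for this illustration:

```csharp
using System;
using System.Runtime.InteropServices;

public static class TidyBufferLayouts
{
    // Layout declared in the question (matches older Tidy headers).
    [StructLayout(LayoutKind.Sequential)]
    public struct OldTidyBuffer
    {
        public IntPtr bp;         // Pointer to bytes
        public uint size;         // # bytes currently in use
        public uint allocated;    // # bytes allocated
        public uint next;         // Offset of current input position
    }

    // ASSUMED newer layout: one extra pointer-sized field at the front.
    [StructLayout(LayoutKind.Sequential)]
    public struct NewTidyBuffer
    {
        public IntPtr allocator;  // Pointer to custom allocator (assumption)
        public IntPtr bp;         // Pointer to bytes
        public uint size;         // # bytes currently in use
        public uint allocated;    // # bytes allocated
        public uint next;         // Offset of current input position
    }

    // How many more bytes the native side would expect than the old
    // managed struct provides.
    public static int ExtraBytes()
    {
        return Marshal.SizeOf(typeof(NewTidyBuffer))
             - Marshal.SizeOf(typeof(OldTidyBuffer));
    }

    public static void Main()
    {
        Console.WriteLine("Old layout: {0} bytes", Marshal.SizeOf(typeof(OldTidyBuffer)));
        Console.WriteLine("New layout: {0} bytes", Marshal.SizeOf(typeof(NewTidyBuffer)));
        Console.WriteLine("Difference: {0} bytes", ExtraBytes());
    }
}
```

If the native code fills in the larger structure through a reference to the smaller one, it writes that many bytes past the end of the managed local, which would be consistent with a neighbouring local such as inputArray2 being clobbered.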
For what it's worth, we tried Tidy at work and switched to HtmlAgilityPack.
Try changing your tidyBufAlloc declaration to:
Note the CharSet.Ansi addition and the "int allocSize" (instead of uint).
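The declaration is missing from this copy of the answer. Going only by the note above, it presumably looked something like the sketch below: the question's declaration with `CharSet = CharSet.Ansi` added and `allocSize` changed to `int`. The wrapper class name `TidyNative` is invented here, and the return type is simply carried over from the question, so treat both as assumptions:

```csharp
using System;
using System.Runtime.InteropServices;

public static class TidyNative
{
    // Same struct as in the question; this sketch leaves it unchanged.
    [StructLayout(LayoutKind.Sequential)]
    public struct TidyBuffer
    {
        public IntPtr bp;         // Pointer to bytes
        public uint size;         // # bytes currently in use
        public uint allocated;    // # bytes allocated
        public uint next;         // Offset of current input position
    }

    // Reconstructed from the note: CharSet.Ansi added, allocSize is int.
    // Return type kept as in the question (assumption).
    [DllImport("libtidy.dll", CharSet = CharSet.Ansi)]
    public static extern int tidyBufAlloc(ref TidyBuffer tidyBuffer, int allocSize);
}
```

This only declares the import; nothing is loaded from libtidy.dll until tidyBufAlloc is actually called.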
Also, see this sample code for an example of using HTML Tidy in C#.
In your example, if inputHTML is large, say 50K, inputArray and inputArray2 will also be 50K each.
You are then also trying to allocate 65K in the tidyBufAlloc call.
If a pointer is not initialised correctly, it is quite possible that a random .NET heap address is being used, so part or all of a seemingly unrelated variable or buffer gets overwritten. It is probably just luck, or the fact that you have already allocated large buffers, that you are not overwriting a code block, which would likely cause an invalid memory access error.