Weird problem calling a .DLL from C#

Posted 2024-07-19 02:08:52

I'm trying to call the HtmlTidy library DLL from C#. There are a few examples floating around on the net, but nothing definitive... and I'm having no end of trouble. I'm pretty certain the problem is with the P/Invoke declaration... but danged if I know where I'm going wrong.

I got the libtidy.dll from http://www.paehl.com/open_source/?HTML_Tidy_for_Windows which seems to be a current version.

Here's a console app that demonstrates the problem I'm having:

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;

namespace ConsoleApplication5
{
    class Program
    {
        [StructLayout(LayoutKind.Sequential)]
        public struct TidyBuffer
        {
            public IntPtr bp;         // Pointer to bytes
            public uint size;         // # bytes currently in use
            public uint allocated;    // # bytes allocated
            public uint next;         // Offset of current input position
        };

        [DllImport("libtidy.dll")]
        public static extern int tidyBufAlloc(ref TidyBuffer tidyBuffer, uint allocSize);


        static void Main(string[] args)
        {
            Console.WriteLine(CleanHtml("<html><body><p>Hello World!</p></body></html>"));
        }

        static string CleanHtml(string inputHtml)
        {
            byte[] inputArray = Encoding.UTF8.GetBytes(inputHtml);
            byte[] inputArray2 = Encoding.UTF8.GetBytes(inputHtml);

            TidyBuffer tidyBuffer2;
            tidyBuffer2.size = 0;
            tidyBuffer2.allocated = 0;
            tidyBuffer2.next = 0;
            tidyBuffer2.bp = IntPtr.Zero;

            //
            // tidyBufAlloc overwrites inputArray2... why? how? seems like
            // tidyBufAlloc is stomping on the stack a bit too much... but
            // how? I've tried changing the calling convention to cdecl and
            // stdcall but no change.
            //
            Console.WriteLine((inputArray2 == null ? "Array2 null" : "Array2 not null"));
            tidyBufAlloc(ref tidyBuffer2, 65535);
            Console.WriteLine((inputArray2 == null ? "Array2 null" : "Array2 not null"));
            return "did nothing";
        }
    }
}

All in all, I'm a bit stumped. Any help would be appreciated!

Answers (3)

北凤男飞 2024-07-26 02:08:52

You are working with an old definition of the TidyBuffer structure. The new structure is larger (it starts with an extra allocator pointer), so when you call tidyBufAlloc it writes past the end of your struct and overwrites the stack location for inputArray2. The new definition is:

    [StructLayout(LayoutKind.Sequential)]        
    public struct TidyBuffer        
    {
        public IntPtr allocator;  // Pointer to custom allocator            
        public IntPtr bp;         // Pointer to bytes            
        public uint size;         // # bytes currently in use            
        public uint allocated;    // # bytes allocated            
        public uint next;         // Offset of current input position        
    };        
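A minimal sketch, not tested against the paehl.com build, that puts the two layouts side by side. It prints `Marshal.SizeOf` for the question's layout (renamed `OldTidyBuffer` here, a hypothetical name) and for the corrected one; they differ by exactly one pointer (`IntPtr.Size`), so native code that expects the larger layout writes past the end of the smaller one. It then calls `tidyBufAlloc` and `tidyBufFree` only if `libtidy.dll` can be loaded. Assumptions: both functions return `void` as declared in `buffio.h` (the question's `int` return is unused anyway), `tidyBufFree` is exported by the same DLL, and the default calling convention matches the DLL; if it does not, add `CallingConvention = CallingConvention.Cdecl`.

```csharp
using System;
using System.Runtime.InteropServices;

// The question's layout (older headers, no allocator field). Hypothetical name.
[StructLayout(LayoutKind.Sequential)]
public struct OldTidyBuffer
{
    public IntPtr bp;
    public uint size;
    public uint allocated;
    public uint next;
}

// The corrected layout from the answer: allocator pointer comes first.
[StructLayout(LayoutKind.Sequential)]
public struct TidyBuffer
{
    public IntPtr allocator;  // Pointer to custom allocator
    public IntPtr bp;         // Pointer to bytes
    public uint size;         // # bytes currently in use
    public uint allocated;    // # bytes allocated
    public uint next;         // Offset of current input position
}

public static class Program
{
    // Assumption: both exports return void and use the default calling convention.
    [DllImport("libtidy.dll")]
    public static extern void tidyBufAlloc(ref TidyBuffer buf, uint allocSize);

    // Assumption: tidyBufFree is exported by the same DLL (declared in buffio.h).
    [DllImport("libtidy.dll")]
    public static extern void tidyBufFree(ref TidyBuffer buf);

    public static void Main()
    {
        // The sizes differ by one pointer: 4 bytes in a 32-bit process, 8 in a 64-bit one.
        Console.WriteLine($"old layout: {Marshal.SizeOf<OldTidyBuffer>()} bytes");
        Console.WriteLine($"new layout: {Marshal.SizeOf<TidyBuffer>()} bytes");

        var buf = new TidyBuffer(); // every field starts zeroed
        try
        {
            tidyBufAlloc(ref buf, 65535);
            Console.WriteLine($"allocated = {buf.allocated}");
            tidyBufFree(ref buf); // release the native allocation
        }
        catch (Exception ex) when (ex is DllNotFoundException
                                   || ex is EntryPointNotFoundException
                                   || ex is BadImageFormatException)
        {
            // Missing DLL, missing export, or 32/64-bit mismatch: skip the native part.
            Console.WriteLine($"native calls skipped: {ex.GetType().Name}");
        }
    }
}
```

Because the struct holds only `IntPtr` and `uint` fields it is blittable, so `ref TidyBuffer` is passed as a pointer to the caller's own local rather than to a marshaled copy. That is why a size mismatch lands on neighboring locals such as `inputArray2`.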

傲影 2024-07-26 02:08:52

For what it's worth, we tried Tidy at work and switched to HtmlAgilityPack.

数理化全能战士 2024-07-26 02:08:52

Try changing your tidyBufAlloc declaration to:

[DllImport("libtidy.dll", CharSet = CharSet.Ansi)]
private static extern int tidyBufAlloc(ref TidyBuffer Buffer, int allocSize);

Note the CharSet.Ansi addition and the "int allocSize" (instead of uint).

Also, see this sample code for an example of using HTML Tidy in C#.

In your example, if inputHTML is large, say 50K, inputArray and inputArray2 will also be 50K each.

You are then also trying to allocate 65K in the tidyBufAlloc call.

If a pointer is not initialised correctly, it is quite possible that a random .NET heap address is being used, which is how part or all of a seemingly unrelated variable or buffer ends up overwritten. It is probably just luck, or the fact that you have already allocated large buffers, that you are not overwriting a code block, which would likely cause an invalid memory access error.
