Unicode causes closing message boxes to terminate program
I'm developing a Win32 API Wrapper. To make it Unicode-compliant, I do the following:
#ifndef UNICODE
#define gchar char
#define gstrcpy strcpy
#define gstrncpy strncpy
#define gstrlen strlen
#define gstrcat strcat
#define gstrncat strncat
#define gstrcmp strcmp
#define gstrtok strtok
#else
#define gchar wchar_t
#define gstrcpy lstrcpy
#define gstrncpy lstrncpy
#define gstrlen lstrlen
#define gstrcat lstrcat
#define gstrncat lstrncat
#define gstrcmp lstrcmp
#define gstrtok lstrtok
#endif
I also provide
#define uni(s) TEXT(s)
My test consisted of a window that creates a message box via
msg (uni("Left-click"));
whenever the user left-clicks the window. The problem is that, when I #define UNICODE, no matter how many message boxes are created, after 4 or 5 of them are closed, the next message box shown (whether it is a new one or the one under the last one closed) causes the program to return 0xC0000005. Not defining UNICODE makes this work perfectly. My msg function is as follows:
dword msg (cstr = uni(""), cstr = uni(""), hwin = null, dword = 0);
...
dword msg (cstr lpText, cstr lpCaption, hwin hWnd, dword uType)
{
return MessageBox (hWnd, lpText, lpCaption, uType);
}
where dword is DWORD, cstr is pchar, which is gchar *, which can be char * or wchar_t *, hwin is HWND, and null is 0.
It probably isn't the message box doing this, but I haven't done any other text-related stuff with the testing so I'll see if it crashes some other way too.
Does anyone know why this would happen? The difference between MB characters and unicode shouldn't cause the program to repeatedly crash. I can upload the headers and the test too, if needed.
Edit:
I just found out that creating one message and then closing the actual window results in the same crash. Here's the link for the source (SOURCE CODE). Please keep in mind:
a) I only took one first-year programming course, ever (C++).
b) My wrapper's purpose is to make writing win32 apps as easy as possible.
c) I like to make things of my own (string class etc).
Also forgot this (duh), I'm using Code::Blocks (MinGW).
Edit:
I didn't realize before, but the program is trying to access memory at 0x00000000. This is what's causing the problem, but I have no idea why it would be trying to do this. I believe the instruction trying to access it is located somewhere in winnt.dll, but having never learned how to debug, I'm still trying to figure out how to find the information I need.
Edit:
Now, without changing it, but running it on a different computer, it's referencing 0x7a797877 instead of 0.
Edit:
Changing the window procedure to include WM_LBUTTONDOWN and call msg() inside, rather than calling the added procedure, makes the program work perfectly. Something with the way addmsg() and the window procedure are coded causes _lpWindowName and _lpClassName to have corrupted data after a while, but non-pointer members are still preserved.
Edit:
After all of this mayhem I finally found out I was missing a single character in all of my source code. When I defined msgparams as Window, UINT, WPARAM, LPARAM, and likewise msgfillparams (except with names), I forgot to pass a reference. I was passing the Window by value! I'd still like to thank everyone who posted, as I did get my butt kicked debugger-wise and ended up learning a lot more about Unicode as well.
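For readers wondering why one missing `&` matters, here is a minimal sketch of the difference. The `Window` struct below is a hypothetical stand-in that only counts copies; it is not the wrapper's real class, and the function names are made up for illustration. One plausible mechanism for the corruption (an assumption, not confirmed by the posted source) is that each by-value call made a temporary copy sharing the same `_lpWindowName` and `_lpClassName` pointers, and the copy's cleanup invalidated them, while plain non-pointer members survived.

```cpp
#include <cassert>

// Hypothetical stand-in for the wrapper's Window class; it only counts copies.
struct Window {
    static int copies;
    Window() = default;
    Window(const Window&) { ++copies; }
};
int Window::copies = 0;

// By value: every call constructs (and later destroys) a temporary Window.
void handle_by_value(Window w) { (void)w; }

// By reference: the handler works on the caller's own object.
void handle_by_reference(Window& w) { (void)w; }

// Number of copies made by a single by-value call.
int copies_for_value_call() {
    Window w;
    const int before = Window::copies;
    handle_by_value(w);
    return Window::copies - before;
}

// Number of copies made by a single by-reference call.
int copies_for_reference_call() {
    Window w;
    const int before = Window::copies;
    handle_by_reference(w);
    return Window::copies - before;
}

// Self-check: one copy per by-value call, none per by-reference call.
void check_copy_counts() {
    assert(copies_for_value_call() == 1);
    assert(copies_for_reference_call() == 0);
}
```

If a class owns raw pointers, either pass it by reference or give it a proper copy constructor and assignment operator so that copies do not share ownership.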
Comments (2)
You should do your homework before asking questions on SO. My impression is that you have almost no idea how Unicode works on Windows, and it would take many pages to explain.
Porting an application from ANSI to Unicode is a big deal on Windows. It may seem reasonable to pay someone with experience to do this.
Mainly, everything that worked with char will have to work with wchar_t. The entire API has other functions, but you should start by using the Windows support for this rather than writing your own macros. The first step is to use _T, not W, so you'll be able to start changing code and still be able to compile in both Unicode and ANSI.
Why are you even bothering with ANSI in the first place? All the TCHAR support dates back to a time when Win95 was commonplace, so developers had to write code that could compile as ANSI (for Win95) or UNICODE (for NT-based Windows). Now that Win95 is long obsolete, there's no need to bother with TCHAR: just go all-UNICODE, using L"Unicode strings" instead of TEXT() and wcs-versions of the CRT rather than the _t-versions.
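As a rough sketch of what "all-UNICODE" code looks like, the snippet below uses only `wchar_t`, `L"..."` literals, `std::wstring` and the `wcs` functions, with no TCHAR-style macros. The helper names `make_label` and `label_length` are hypothetical and exist only for this illustration; the code is portable C++ and does not call any Win32 function.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: joins a caption and a text using wide characters only.
std::wstring make_label(const wchar_t* caption, const wchar_t* text) {
    assert(caption != nullptr && text != nullptr);
    std::wstring label(caption);
    label += L": ";
    label += text;
    return label;
}

// Hypothetical helper: wcslen is the wide counterpart of strlen and
// returns a count of wchar_t elements, not bytes.
std::size_t label_length(const wchar_t* s) {
    assert(s != nullptr);
    return std::wcslen(s);
}
```

On Windows, the resulting `label.c_str()` could then be handed to the W version of an API such as `MessageBoxW`.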
Having said that, here are some common sources of errors with ANSI/UNICODE code that could explain some of what you are seeing:
One possibility is that there's a bug somewhere that's corrupting the stack - uninitialized variable, stack overrun, and the like. In unicode, any chars or strings on the stack may take up a different amount of space compared to the ANSI version, so variables will end up in different places relative to one another. Chances are you are 'getting lucky' in the ANSI build, and whatever is being corrupted isn't important data; but on the UNICODE build, something important on the stack is getting nuked. (For example, if you overflow a buffer on the stack, you could end up overwriting the return address that's also on the stack, likely causing a crash at the next function return.)
--
Watch out for cases where you are mixing up character counts and byte counts: with ANSI, you can use 'sizeof()' almost interchangeably with a character count (depending on whether you're counting the terminating NUL or not); but with UNICODE, you can't, and if you get them mixed up, you can get a buffer overrun very easily.
For example, on Windows, use ARRAYSIZE(...) instead of sizeof() to get the number of elements in an array rather than the byte size of the array.
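The byte-count versus element-count distinction can be sketched in portable C++ as follows. `element_count` is a hypothetical template that plays the role ARRAYSIZE plays in the Windows headers, and `copy_bounded` is likewise a made-up name. The buffer size of 16 is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Element count of a fixed-size array (hypothetical stand-in for ARRAYSIZE).
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// Copies src into a 16-element wide buffer and returns the copied length.
// The bound passed to wcsncpy is an element count; passing sizeof(buffer)
// instead would overstate the capacity whenever sizeof(wchar_t) > 1.
std::size_t copy_bounded(const wchar_t* src) {
    assert(src != nullptr);
    wchar_t buffer[16];
    const std::size_t capacity = element_count(buffer);  // 16 elements
    std::wcsncpy(buffer, src, capacity - 1);
    buffer[capacity - 1] = L'\0';  // wcsncpy does not terminate on truncation
    return std::wcslen(buffer);
}
```

With a `char` buffer the two counts coincide, which is why this kind of mistake tends to stay hidden in an ANSI build.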
--
Another thing to watch for is any strings where you have used casts to "force" them into CHAR or WCHAR to avoid compiler errors.
Casting a wide string so it can be passed where a narrow string is expected typically results in just the first character of the string showing.
Casting a narrow string so it can be passed where a wide string is expected is trickier: you may get a string of garbage, or could get an error of some kind.
These cases are easy to spot since there are casts. Generally speaking, casts should be viewed as a 'red flag' and avoided at all costs. Don't use them to avoid a compiler error; instead, fix the issue that the compiler is warning about.
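The "first character only" symptom can be reproduced without any Windows API. The sketch below assumes a little-endian machine (true of every platform Windows runs on today) and uses a hypothetical function name. It shows what a narrow-string function sees when a wide string is forced through a cast: the byte after an ASCII character's low byte is zero, so the string appears to end there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Length that a narrow-string function "sees" when a wide string is
// forced through a cast. Assumes a little-endian platform.
std::size_t narrow_length_after_cast(const wchar_t* wide) {
    assert(wide != nullptr);
    // This compiles, but it only silences the compiler; the data is still wide.
    const char* forced = reinterpret_cast<const char*>(wide);
    return std::strlen(forced);
}
```

The opposite cast, from narrow to wide, is not shown as runnable code because the wide functions would read past the end of the narrow buffer while looking for a wide terminator.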
Also watch out for cases where you can get these mixed up but where the compiler won't necessarily warn you, e.g. with printf, scanf and friends: the arguments are variadic, so the compiler is not required to check them against the format string.
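A small sketch of the format-string case, using a hypothetical `format_wide` helper: the conversion specifier has to match the character type of the argument. In the narrow printf family, `%ls` is the standard way to say "this argument is a `wchar_t*`". Passing a wide string to a plain `%s` there would compile on many setups and then misbehave at run time. Note also that Microsoft's wide printf functions have historically treated `%s` differently from standard C, so it is worth checking your runtime's documentation.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a wide string into a narrow buffer; %ls tells snprintf that the
// argument is a wchar_t string. Returns an empty string on conversion failure.
std::string format_wide(const wchar_t* wide) {
    assert(wide != nullptr);
    char buffer[64];
    // sizeof is fine here only because buffer is an array of char.
    const int written = std::snprintf(buffer, sizeof(buffer), "[%ls]", wide);
    if (written < 0) {
        return std::string();
    }
    return std::string(buffer);
}
```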