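How can I prevent a non-Unicode application from converting the character set of its resources when they are loaded on a machine with a different locale?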
We have a non-Unicode C++ application, written with Visual Studio, that was originally written for machines using the codepage 1252 character set.
Our application performs many post-processing steps on the contents of the resources after reading them, including looking up resource strings in some files.
Now people in China are starting to use the application, and their machines use the PRC locale (which sets the default codepage for non-Unicode applications to 936, a multibyte character set).
It appears that CString::LoadString will perform some conversion. This breaks further processing because the content that we are looking for in the other files is not the same.
The same goes for CMenu::GetMenuString or CWnd::GetWindowText.
Worse, we cannot simply run iconv on our files, because LoadString, GetMenuString and GetWindowText behave this way:
- some characters which are valid in codepage 1252 are not valid in codepage 936 (e.g. î, û, ñ, œ) and get replaced with question marks
- some characters which are valid in codepage 1252 are not valid in codepage 936 (e.g. É) but get replaced with an alternate character (É => é)
- some characters exist in both codepages but do not have the same representation, often with two bytes in CP936
- some characters (including all ASCII characters) match in both codepages.
I would like the three functions that load resource contents to return the binary content without performing any character set conversion. I have tried modifying the .rc file with LANGUAGE LANG_INVARIANT, SUBLANG_NEUTRAL, but this did not change anything.
The resource file also includes a #pragma code_page(1252); can this be safely removed? What is that pragma for?
Thank you for your answers.
Comments (2)
Maybe you can use BOOL SetThreadLocale( LCID Locale );
MSDN:
SetThreadLocale affects the selection of resources with a LANGUAGE statement. The statement affects such functions as CreateDialog, DialogBox, LoadMenu, LoadString, and FindResource. It sets the code page implied by CP_THREAD_ACP, but does not affect FindResourceEx. For more information, see Code Page Identifiers.
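A minimal sketch of how that suggestion might be tried, not a confirmed fix: whether changing the thread locale actually stops the conversion seen with LoadString is untested here. The names makeLcid and ScopedThreadLocale are hypothetical helpers introduced only for illustration; SetThreadLocale and GetThreadLocale are the real Win32 calls. The Win32 part is guarded so the file still compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Portable mirror of the MAKELANGID / MAKELCID macros (hypothetical helper),
// so the value handed to SetThreadLocale can be inspected on any platform.
constexpr std::uint32_t makeLcid(std::uint32_t primaryLang,
                                 std::uint32_t subLang,
                                 std::uint32_t sortId = 0) {
    return (sortId << 16) | (subLang << 10) | primaryLang;
}

#ifdef _WIN32
// Hypothetical RAII helper: switches the calling thread to `locale` and
// restores the previous thread locale when it goes out of scope.
class ScopedThreadLocale {
public:
    explicit ScopedThreadLocale(LCID locale)
        : previous_(GetThreadLocale()),
          changed_(SetThreadLocale(locale) != FALSE) {}
    ~ScopedThreadLocale() {
        if (changed_) SetThreadLocale(previous_);
    }
    ScopedThreadLocale(const ScopedThreadLocale&) = delete;
    ScopedThreadLocale& operator=(const ScopedThreadLocale&) = delete;

    // True when SetThreadLocale accepted the requested locale.
    bool changed() const { return changed_; }

private:
    LCID previous_;
    bool changed_;
};
#endif
```

A caller could write ScopedThreadLocale guard(makeLcid(0x09, 0x01)); (LANG_ENGLISH, SUBLANG_ENGLISH_US) around the resource-loading code and then compare the loaded bytes with the expected CP1252 bytes to see whether the locale change has any effect.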
For LoadString, the obvious thing to do would be to call the Win32 API function LoadStringW() directly, which gives you the Unicode string. It might even work to use the CStringW form of CString (not tested!).
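As a rough sketch of the LoadStringW() route, assuming the post-processing expects the original CP1252 bytes: load the string as UTF-16, then encode it to CP1252 explicitly instead of letting the system ANSI code page (936 on a PRC machine) decide. The names encodeCp1252 and loadStringAsCp1252 are hypothetical; the hand-written encoder is a portable stand-in for calling WideCharToMultiByte with code page 1252, and it replaces anything CP1252 cannot represent with a question mark.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Code points for CP1252 bytes 0x80..0x9F; 0 marks bytes CP1252 leaves undefined.
static const char16_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

// Encodes UTF-16 text as CP1252 bytes without consulting the system ANSI
// code page. Every UTF-16 code unit yields exactly one byte; units that
// CP1252 cannot represent (including surrogates) become '?'.
std::string encodeCp1252(const std::u16string& text) {
    std::string bytes;
    bytes.reserve(text.size());
    for (char16_t unit : text) {
        if (unit < 0x80 || (unit >= 0xA0 && unit <= 0xFF)) {
            // ASCII and 0xA0..0xFF have the same value in CP1252.
            bytes.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
            continue;
        }
        char encoded = '?';
        for (int i = 0; i < 32; ++i) {
            if (kCp1252High[i] != 0 && kCp1252High[i] == unit) {
                encoded = static_cast<char>(static_cast<unsigned char>(0x80 + i));
                break;
            }
        }
        bytes.push_back(encoded);
    }
    assert(bytes.size() == text.size());
    return bytes;
}

#ifdef _WIN32
// Hypothetical helper: reads a string resource as UTF-16 and returns CP1252 bytes.
std::string loadStringAsCp1252(HINSTANCE module, UINT id) {
    const wchar_t* text = nullptr;
    // With a buffer size of 0, LoadStringW stores a read-only pointer to the
    // resource text in the "buffer" argument and returns its length.
    const int length = LoadStringW(module, id, reinterpret_cast<LPWSTR>(&text), 0);
    if (length <= 0 || text == nullptr) return std::string();
    std::u16string utf16;
    utf16.reserve(static_cast<std::size_t>(length));
    for (int i = 0; i < length; ++i) utf16.push_back(static_cast<char16_t>(text[i]));
    return encodeCp1252(utf16);
}
#endif
```

The point of the design is that the target code page is named explicitly, so the result should not depend on the locale of the machine running the application.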
The menu and window functions will give more problems. It should work to call the Unicode form of the Win32 API GetMenuStringW() directly. The window function GetWindowText() is the really awkward one: you can, of course, call the Win32 function GetWindowTextW(), but what that returns will depend on whether the window you call it on has an ANSI or Unicode window procedure. If the underlying window is a Windows control then it's usually possible to get at the underlying window procedure and call that directly, but it's not pretty and it's not much fun.
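A minimal sketch of calling the Unicode forms directly, under the assumption that the usual "ask for the length, then fetch" pattern is acceptable. fetchWide, menuTextW, windowTextW and hasUnicodeProc are hypothetical names; GetMenuStringW, GetWindowTextW, GetWindowTextLengthW and IsWindowUnicode are the real Win32 calls. The sketch does not solve the ANSI window procedure case; it only lets the caller detect it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Two-call pattern. `fetch(buffer, capacity)` is a caller-supplied callable:
// with a null buffer it returns the text length (excluding the terminator),
// otherwise it fills the buffer and returns the number of characters copied.
template <class Fetch>
std::wstring fetchWide(Fetch fetch) {
    const int length = fetch(static_cast<wchar_t*>(nullptr), 0);
    if (length <= 0) return std::wstring();
    std::vector<wchar_t> buffer(static_cast<std::size_t>(length) + 1, L'\0');
    const int copied = fetch(buffer.data(), static_cast<int>(buffer.size()));
    assert(copied < static_cast<int>(buffer.size()));
    if (copied <= 0) return std::wstring();
    return std::wstring(buffer.data(), static_cast<std::size_t>(copied));
}

#ifdef _WIN32
// Menu item text as UTF-16, bypassing the ANSI entry point.
std::wstring menuTextW(HMENU menu, UINT item, UINT flags) {
    return fetchWide([=](wchar_t* buffer, int capacity) {
        return GetMenuStringW(menu, item, buffer, capacity, flags);
    });
}

// Reports whether the window has a Unicode window procedure; the result of
// windowTextW is only expected to be trustworthy when this returns true.
bool hasUnicodeProc(HWND window) {
    return IsWindowUnicode(window) != FALSE;
}

// Window text as UTF-16 via GetWindowTextW.
std::wstring windowTextW(HWND window) {
    return fetchWide([=](wchar_t* buffer, int capacity) {
        return buffer ? GetWindowTextW(window, buffer, capacity)
                      : GetWindowTextLengthW(window);
    });
}
#endif
```

For example, menuTextW(menu, itemId, MF_BYCOMMAND) would return the item text as UTF-16 (itemId being whatever command ID the application uses), which can then be compared or re-encoded without involving code page 936.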
Any chance of more detail on how you're trying to use it? It's worth noting that you list these functions as if all 3 access resources, but that's not true: only LoadString() does that. The other two operate directly on the menu or window that exists in the running process, not on resources.
As an example of how it's possible to get around the GetWindowTextW() problems, have a look at the UnicodeEdit class from this project. This is an ANSI application that needed to work on Windows 9X, but also needed to be able to get Unicode text from an edit control if possible. The trick is that the class remembers whether the window procedure before subclassing was Unicode or ANSI, and if Unicode, calls that directly in its GetWindowText(). Depending on what you need, this sort of approach might help.