How to find whether a character belongs to a particular code page using C++ or a WinAPI call

Posted 2024-08-25 03:08:26

How can we find out whether a character belongs to a particular code page?
Alternatively, how can we determine whether a character fits the IME that is currently active for an application?


Comments (4)

听,心雨的声音 2024-09-01 03:08:26

Use the WC_ERR_INVALID_CHARS flag and WideCharToMultiByte will fail outright if any invalid characters are encountered (note that this flag is documented as applying only to CP_UTF8 and code page 54936). If you want to know whether any characters cannot be represented in the target code page, use the lpDefaultChar and lpUsedDefaultChar parameters.

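LPCWSTR pszUtf16; // null-terminated UTF-16 string, converted from the UTF-8 source character
UINT nTargetCP = CP_ACP;
BOOL fBadCharacter = FALSE;
// cchWideChar is -1 because the input is null-terminated; the NULL/0 output
// buffer means we only ask for the required size and the default-char flag.
if (WideCharToMultiByte(nTargetCP, WC_NO_BEST_FIT_CHARS, pszUtf16, -1, NULL, 0, NULL, &fBadCharacter))
{
  if (fBadCharacter)
  {
    // at least one character in the string was not represented in nTargetCP
  }
}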
鹊巢 2024-09-01 03:08:26

The two previous answers have correctly suggested using MultiByteToWideChar then WideCharToMultiByte to translate your UTF-8 character to UTF-16, then to the current Windows codepage (CP_ACP). Check the result of WideCharToMultiByte to see if the conversion was successful.

What wasn't clear from the original question is that you are having a particular issue with Hindi. For this language, your question is meaningless because there is no Windows ANSI code page for Hindi, as Chris Becke pointed out. Therefore, you can never convert a Hindi character to CP_ACP: the conversion with WideCharToMultiByte will never succeed, and the character is replaced by the default character instead.

To use Hindi on Windows, as far as I understand it, you must be a Unicode app that calls Unicode APIs.

街角迷惘 2024-09-01 03:08:26

Using the Windows functions WideCharToMultiByte and MultiByteToWideChar, you can convert between UTF-8 and UTF-16 (16-bit Unicode) characters. The functions take arguments that specify the code page and the behavior when an invalid character is encountered.

尴尬癌患者 2024-09-01 03:08:26

Thanks, Chris. I am running the following code:

#include <windows.h>
#include <stdio.h>

#define CP_HINDI 0        /* note: 0 is CP_ACP, the system's active ANSI code page */
#define CP_JAPANESE 932
#define CP_ENGLISH 1252

/* The L prefix makes these wide-character literals. */
wchar_t wcsStringJapanese = L'あ';   /* U+3042 */
wchar_t wcsStringHindi = L'र';       /* U+0930 */
wchar_t wcsStringEnglish = L'A';

int main()
{
    BOOL usedDefaultCharacter = FALSE;

    /* Each input is a single wchar_t with no null terminator,
       so the length passed is 1 rather than -1. */

    /* Test for English */
    WideCharToMultiByte(CP_ENGLISH,
                        0,
                        &wcsStringEnglish,
                        1,
                        NULL,
                        0,
                        NULL,
                        &usedDefaultCharacter);
    printf("usedDefaultCharacters for English? %d \n", usedDefaultCharacter);

    /* Test for Japanese */
    usedDefaultCharacter = FALSE;

    WideCharToMultiByte(CP_JAPANESE,
                        0,
                        &wcsStringJapanese,
                        1,
                        NULL,
                        0,
                        NULL,
                        &usedDefaultCharacter);
    printf("usedDefaultCharacters for Japanese? %d \n", usedDefaultCharacter);

    /* Test for Hindi */
    usedDefaultCharacter = FALSE;

    WideCharToMultiByte(CP_HINDI,
                        0,
                        &wcsStringHindi,
                        1,
                        NULL,
                        0,
                        NULL,
                        &usedDefaultCharacter);
    printf("usedDefaultCharacters for Hindi? %d \n", usedDefaultCharacter);

    return 0;
}

The above code returns:

usedDefaultCharacters for English? 0

usedDefaultCharacters for Japanese? 0

usedDefaultCharacters for Hindi? 1

The third line looks incorrect to me: the code page for Hindi is 0 and the string passed consists of a Hindi character, yet usedDefaultChar is still set to 1, which should not be the case.
