Wide characters and locale settings

Posted on 2025-01-16 21:01:02

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main()
{
    setlocale(LC_CTYPE,"C");
    wprintf(L"大\n");
    
    return 0;
}

// #1 result: ?

#include <stdio.h>
#include <locale.h>

int main()
{
    setlocale(LC_CTYPE,"C");
    printf("大\n");
    
    return 0;
}

// #2 result: 大

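The only difference between #1 and #2 is the printing function.

I would expect that if a wide character cannot be printed in a given locale, a multibyte character should not be printable in that locale either.

Why is the multibyte string printed (#2), while the wide-character string is not (#1)?

I know that the wide character prints fine when the locale is not "C". But why? What exactly does the locale do?

+) I thought multibyte character encoding was locale-dependent, yet the multibyte character prints fine regardless of the locale. How does the computer determine the multibyte character encoding?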

谜兔 2025-01-23 21:01:02

If you are working with the Windows console and want to print wide strings, use the _setmode function to switch the translation mode of stdout to Unicode.

For example:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <fcntl.h>
#include <io.h>

int main()
{
    setlocale(LC_CTYPE,"C");
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"大\n");
    
    return 0;
}

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170

始终不够 2025-01-23 21:01:02

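L"大\n" is a wchar_t array of 3 elements.
array[0] == L'大' (0x5927, the code point U+5927, assuming wchar_t stores Unicode values)
array[1] == newline (0xA)
array[2] == null (0x0)

Before wprintf can write anything, it has to convert each wide character to the multibyte encoding of the current locale. The "C" locale only guarantees the basic character set, so it has no byte sequence for array[0], and you typically get ? (or no output) instead.

"大\n" creates a char (byte) array of 5 elements, assuming the source file is saved as UTF-8.
array[0] == (0xE5)
array[1] == (0xA4)
array[2] == (0xA7)
array[3] == newline (0xA)
array[4] == null (0x0)

The second one prints because it is just an array of narrow chars. printf does not convert the bytes of a string: it sends each byte to the output until it reaches the null character.

It shows up as 大 on your screen because your terminal decodes that byte sequence as UTF-8. The program itself never determines the encoding: the compiler stored the bytes, and the terminal interprets them.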

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* The three UTF-8 bytes of 大, parsed from hex text. */
    char first_byte = (char)strtol("0xE5", NULL, 16);
    char second_byte = (char)strtol("0xA4", NULL, 16);
    char third_byte = (char)strtol("0xA7", NULL, 16);
    printf("%c%c%c\n", first_byte, second_byte, third_byte);

    return 0;
}

Output: 大
