Wide characters and locale
#1
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_CTYPE, "C");
    wprintf(L"大\n");   /* wide-character string */
    return 0;
}
//result : ?
#2
#include <stdio.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_CTYPE, "C");
    printf("大\n");     /* multibyte (char) string */
    return 0;
}
//result : 大
The only difference between #1 and #2 is the printing function.
I expected that if a wide character cannot be printed in a certain locale, then the multibyte character should not be printed in that locale either.
Why is the multibyte string printed (#2), whereas the wide character string is not (#1)?
I know that if the locale is not "C", the wide character prints fine. But why? What exactly does the locale do?
+) I thought multibyte character encoding is locale dependent, but the multibyte character prints fine regardless of locale. How does the computer determine the multibyte character encoding?
Comments (2)
If you work with the Windows console and want to use wide strings, you should use the _setmode function to change the default translation mode of stdout to Unicode. See the documentation for an example:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170
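A minimal sketch of what that could look like. The helper name prepare_wide_stdout is made up for illustration, and the non-Windows branch (adopting the environment's locale) is my own assumption rather than part of the advice above. Only _setmode, _fileno, _O_U16TEXT and setlocale are real library names here.

```c
#include <stdio.h>
#include <locale.h>
#ifdef _WIN32
#include <io.h>     /* _setmode */
#include <fcntl.h>  /* _O_U16TEXT */
#endif

/* Hypothetical helper: prepares stdout for wide-character output.
   Returns 1 on success, 0 on failure. */
int prepare_wide_stdout(void)
{
#ifdef _WIN32
    /* Switch stdout to UTF-16 text mode; _setmode returns -1 on error. */
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    /* Assumption: elsewhere, use the locale named by the environment
       so that wide characters can be converted on output. */
    return setlocale(LC_CTYPE, "") != NULL;
#endif
}
```

Call it once at the start of main, before anything is written to stdout, and then use wprintf(L"大\n") as in #1. As far as I know, once a stream is in Unicode mode on Windows you should stick to the wide print functions on it and not mix in printf.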
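L"大\n" is a wchar_t array of 3 elements.
array[0] == L'大' (one wide character; where wchar_t holds Unicode code points this is 0x5927, and its UTF-8 encoding is the three bytes 0xE5 0xA4 0xA7)
array[1] == newline (0xA)
array[2] == null (0x0)
To write it, wprintf has to convert each wide character to a multibyte sequence using the current locale, and the C locale has no multibyte representation for the character at position 0, so that conversion fails.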
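You can check this conversion step yourself without printing anything. The sketch below uses the standard wcrtomb function; the helper name convertible_in_locale is hypothetical, and L"\u5927" is used instead of L"大" so the result does not depend on how the source file is encoded.

```c
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switches LC_CTYPE to locale_name and reports whether
   every wide character in ws can be converted to a multibyte sequence.
   Returns 1 if all convert, 0 if one does not, -1 if the locale is unavailable. */
int convertible_in_locale(const char *locale_name, const wchar_t *ws)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    for (; *ws != L'\0'; ws++) {
        /* wcrtomb returns (size_t)-1 when the character has no
           representation in the current locale's encoding. */
        if (wcrtomb(buf, *ws, &st) == (size_t)-1)
            return 0;
    }
    return 1;
}
```

With "C" as the locale name, plain ASCII such as L"A\n" should convert, while L"\u5927\n" should not. With a UTF-8 locale name (the exact spelling, for example "en_US.UTF-8", depends on what is installed on your system) both should convert, which is why #1 works once the locale is changed.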
"大\n" creates a char (byte) array of 5 elements.
array[0] == (0xE5)
array[1] == (0xA4)
array[2] == (0xA7)
array[3] == newline (0xA)
array[4] == null (0x0)
(These byte values assume the source file is saved as UTF-8.)
The second one printed because it is just an array of plain chars (single bytes). When you print the string, printf sends each byte to the output unchanged until it reaches the null character, so no locale conversion is involved.
It shows up as 大 on your screen because your terminal (or OS console) decodes that byte sequence as that character.
Output: 大
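To see the bytes that printf passes through, you can dump them in hex. This is only a sketch: hex_dump is a name I made up, and the string is written with explicit escapes so it does not depend on the encoding of the source file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes each byte of s into out as two uppercase hex
   digits separated by spaces. Returns the number of bytes in s, not counting
   the terminating null. Output is truncated if out is too small. */
size_t hex_dump(const char *s, char *out, size_t outsz)
{
    size_t n = strlen(s);
    size_t pos = 0;

    if (outsz > 0)
        out[0] = '\0';
    /* Each step needs room for up to 3 characters plus the terminator. */
    for (size_t i = 0; i < n && pos + 3 < outsz; i++) {
        pos += (size_t)snprintf(out + pos, outsz - pos, "%s%02X",
                                i ? " " : "",
                                (unsigned)(unsigned char)s[i]);
    }
    return n;
}
```

For example, hex_dump("\xE5\xA4\xA7\n", buf, sizeof buf) with a 32-byte buf returns 4 and leaves "E5 A4 A7 0A" in buf. Those are the same four bytes printf writes in #2, whatever the locale is.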