char vs. wchar_t

Posted 2024-12-06 05:29:32

I'm trying to print a wchar_t* string.
Here is the code:

#include <stdio.h>
#include <string.h>
#include <wchar.h>

char *ascii_ = "中日友好";  //line-1
wchar_t *wchar_ = L"中日友好";  //line-2

int main()
{
    printf("ascii_: %s\n", ascii_);  //line-3
    wprintf(L"wchar_: %s\n", wchar_);  //line-4
    return 0;
}

//Output
ascii_: 中日友好

Questions:

  1. Apparently I should not assign CJK characters to a char* pointer as in line-1, but I did, and the output of line-3 is correct. Why? How can printf() in line-3 print non-ASCII characters? Does it somehow know the encoding?

  2. I assume the code in line-2 and line-4 is correct, so why don't I get any output from line-4?

Comments (3)

_蜘蛛 2024-12-13 05:29:32

First of all, it's usually not a good idea to use non-ASCII characters in source code. What's probably happening is that the Chinese characters are being encoded as UTF-8, which is compatible with ASCII.

Now, as for why wprintf() isn't working: this has to do with stream orientation. Each stream can be set to either normal (byte) or wide orientation, and once set, the orientation cannot be changed. It is set the first time the stream is used (here to byte orientation, because printf() comes first). After that, wprintf() will not work because the orientation is wrong.

In other words, once you use printf() you need to keep on using printf(). Similarly, if you start with wprintf(), you need to keep using wprintf().

You cannot intermix printf() and wprintf() (except on Windows).
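To see orientation in action, here is a minimal sketch (not from the answer) built on the standard fwide() function, which with a mode argument of 0 only reports a stream's orientation: negative for byte, positive for wide, 0 if not yet decided. It uses a scratch stream from tmpfile() instead of stdout so the experiment does not fix stdout's own orientation, and the function name byte_then_request_wide is hypothetical. Only fwide() queries are shown, because actually calling a wide output function on a byte-oriented stream is undefined behavior.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: opens a scratch stream, performs one byte-oriented
 * write, then asks for wide orientation. fwide(f, 0) only queries:
 *   *before: orientation of the fresh stream (0 = not yet decided)
 *   *after:  orientation after fputs() (< 0 = byte-oriented)
 * Returns fwide(f, 1), which stays < 0 because fwide() never changes
 * an orientation that has already been set. */
int byte_then_request_wide(int *before, int *after)
{
    int result = 0;
    FILE *f = tmpfile();        /* scratch file, removed by fclose() */

    *before = *after = 0;
    if (f == NULL)
        return 0;

    *before = fwide(f, 0);      /* query only, changes nothing */
    fputs("abc", f);            /* first I/O makes the stream byte-oriented */
    *after = fwide(f, 0);
    result = fwide(f, 1);       /* request wide: ignored, still byte */

    fclose(f);
    return result;
}
```

On glibc one would expect `*before` to be 0 and both `*after` and the return value to be negative, which matches the claim that the first operation fixes the orientation for good.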

EDIT:

As for why the wprintf() line doesn't work even on its own: it's probably because the code is being compiled so that the UTF-8 bytes of 中日友好 are stored in wchar_. However, wchar_t expects a 4-byte Unicode encoding (2 bytes on Windows).
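To make the size difference concrete, here is a sketch comparing how the same four characters are stored in a narrow literal and in a correctly decoded wide literal. It spells 中日友好 with universal character names (\u4E2D and so on) so the result does not depend on how the compiler decodes the source file. It assumes a UTF-8 execution character set for narrow strings (the GCC and Clang default); the helper names are hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* 中日友好 written with universal character names. */
static const char    k_narrow_text[] = "\u4E2D\u65E5\u53CB\u597D";
static const wchar_t k_wide_text[]   = L"\u4E2D\u65E5\u53CB\u597D";

/* Bytes in the narrow literal: 12 (3 per character) when the
 * execution character set is UTF-8. */
size_t narrow_bytes(void)
{
    return strlen(k_narrow_text);
}

/* wchar_t elements in the wide literal: 4, one per character, because
 * all four are in the Basic Multilingual Plane (this holds for both a
 * 4-byte and a 2-byte UTF-16 wchar_t). */
size_t wide_units(void)
{
    return wcslen(k_wide_text);
}

/* Storage used by those elements, terminator excluded:
 * 16 bytes with a 4-byte wchar_t, 8 with a 2-byte one. */
size_t wide_bytes(void)
{
    return wcslen(k_wide_text) * sizeof(wchar_t);
}
```

If the compiler had instead copied the 12 UTF-8 bytes into the wide literal one byte per element, wide_units() would report 12 rather than 4, which is the kind of mismatch the answer suspects.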

So there are two options I can think of:

  1. Don't bother with wchar_t, and just stick with multi-byte chars. This is the easy way, but it may break if the user's system is not set to the Chinese locale.
  2. Use wchar_t, but encode the Chinese characters using Unicode escape sequences. This obviously makes them unreadable in the source code, but it will work on any machine that can print Chinese character fonts, regardless of the locale.
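Option 2 could look roughly like the sketch below. Separately from the escape sequences, note that line-4 of the question passes a wchar_t* to %s: in the wprintf() family, %s expects a char* argument and %ls expects a wchar_t*, so %ls is used here. The setlocale(LC_ALL, "") call, which makes the C library encode wide output using the user's environment locale, and the helper names print_wide_line and print_greeting are assumptions of this sketch, not part of the answer.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Option 2: 中日友好 spelled with Unicode escape sequences. */
static const wchar_t k_greeting[] = L"\u4E2D\u65E5\u53CB\u597D";

/* Writes one wide line to `out`. %ls takes a wchar_t* argument;
 * %s in a wide format would expect a char* instead.
 * Returns the number of wide characters written, or a negative value
 * on error (e.g. if the current locale cannot encode the text). */
int print_wide_line(FILE *out, const wchar_t *text)
{
    return fwprintf(out, L"%ls\n", text);
}

/* Hypothetical use from main(): select the user's locale first, since
 * the default "C" locale may be unable to encode non-ASCII characters,
 * and use only wide output on stdout from then on. */
int print_greeting(void)
{
    setlocale(LC_ALL, "");
    return print_wide_line(stdout, k_greeting);
}
```

Calling print_greeting() from main() before any printf() to stdout should print 中日友好 in a terminal whose locale can encode Chinese characters.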
浅忆流年 2024-12-13 05:29:32

Line 1 is not ASCII; it's whatever multibyte encoding your compiler uses at compile time. On modern systems that's probably UTF-8. printf does not know the encoding. It just sends bytes to stdout, and as long as the encodings match, everything is fine.
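A small sketch of what "just sending bytes" means: the hypothetical helper below formats the exact bytes that printf("%s") would pass through unchanged. Assuming a UTF-8 execution character set, 中 (U+4E2D) is stored as the three bytes e4 b8 ad; printf never interprets them, and it is the terminal that decodes them back into a character.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes each byte of s as two lowercase hex digits,
 * separated by single spaces (e.g. "e4 b8 ad").
 * out must have room for at least 3 * strlen(s) + 1 bytes. */
void hex_dump(const char *s, char *out)
{
    size_t n = strlen(s);

    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        sprintf(out + 3 * i, "%02x ", (unsigned char)s[i]);
    if (n > 0)
        out[3 * n - 1] = '\0';  /* drop the trailing space */
}
```

For example, hex_dump("\u4E2D", buf) fills buf with "e4 b8 ad" under the UTF-8 assumption, while hex_dump("A", buf) gives "41".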

One problem you should be aware of is that lines 3 and 4 together invoke undefined behavior. You cannot mix byte-oriented and wide-character I/O on the same FILE (stdout). After the first operation, the FILE has an "orientation" (either byte or wide), and any later attempt to perform an operation of the opposite orientation results in UB.
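Because the orientation belongs to each FILE, the restriction is per stream rather than per program. The sketch below, using two scratch streams from tmpfile() and a hypothetical function name, shows one stream becoming byte-oriented and another wide-oriented in the same program without conflict.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if two separate streams end up with
 * opposite orientations, showing that orientation is per FILE. */
int independent_orientations(void)
{
    FILE *bytes = tmpfile();
    FILE *wide = tmpfile();
    int ok = 0;

    if (bytes != NULL && wide != NULL) {
        fprintf(bytes, "narrow\n");   /* `bytes` becomes byte-oriented */
        fwprintf(wide, L"wide\n");    /* `wide` becomes wide-oriented */
        ok = fwide(bytes, 0) < 0 && fwide(wide, 0) > 0;
    }
    if (bytes != NULL)
        fclose(bytes);
    if (wide != NULL)
        fclose(wide);
    return ok;
}
```

Each stream is only ever used with one family of functions, so neither call mixes orientations on a single FILE.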

不寐倦长更 2024-12-13 05:29:32

You are missing a step, and so you're thinking about this the wrong way.

You have a C file on disk, containing bytes. You have an "ASCII" string and a wide string.

The ASCII string takes the bytes exactly as they appear in line 1 and outputs them unchanged.
This works as long as the encoding of the user's side is the same as the one on the programmer's side.

For the wide string, the given bytes are first decoded into Unicode code points, which are stored in the program; maybe this step goes wrong on your side. On output, they are encoded again according to the encoding on the user's side. This ensures that the characters are emitted as intended, not in the byte form in which they were entered.

Either your compiler assumes the wrong encoding, or your output terminal is set up the wrong way.
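The re-encoding step on output can be imitated by hand with the standard wcstombs(), which converts a wide string into the multibyte encoding of the current LC_CTYPE locale. This is a sketch under the assumption that wide output performs an equivalent conversion internally; the helper names are hypothetical. In the default "C" locale only the basic character set is guaranteed to convert, while after setlocale(LC_CTYPE, "") the bytes depend on the user's environment (for example, UTF-8 bytes under a UTF-8 locale).

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: encodes ws in the current LC_CTYPE locale.
 * Returns the number of bytes stored in buf (terminator excluded),
 * or (size_t)-1 if some character cannot be represented. If the result
 * equals size, buf is not null-terminated. */
size_t encode_for_locale(const wchar_t *ws, char *buf, size_t size)
{
    return wcstombs(buf, ws, size);
}

/* Same conversion, but first switch LC_CTYPE to the locale named by the
 * user's environment (LANG, LC_ALL, ...), i.e. "the user's side". */
size_t encode_for_user_locale(const wchar_t *ws, char *buf, size_t size)
{
    if (setlocale(LC_CTYPE, "") == NULL)  /* environment names an unavailable locale */
        return (size_t)-1;
    return wcstombs(buf, ws, size);
}
```

The same wide string can therefore produce different bytes on different machines, which is why a mismatch between the compiler's assumed source encoding and the terminal's encoding shows up only on the wide path or only on the narrow one.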
