Unicode vs. Multi-byte

Posted 2024-08-20 16:57:00

I'm really confused by this unicode vs multi-byte thing.

Say I'm compiling my program in Unicode (but ultimately, I want a solution that is independent of the character set used).

1) Will all 'char' be interpreted as wide characters?

2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?

3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

Thank you.

Regards,
Rayne

Comments (3)

二手情话 2024-08-27 16:57:00

First, if you're compiling with UNICODE/_UNICODE and don't intend to target other platforms, you can avoid using the TCHAR business and use WCHAR (or wchar_t) and W functions everywhere.

1) Will all 'char' be interpreted as wide characters?

By definition, char in C is 1 byte. (Technically this doesn't preclude it from being a "wide character" on platforms where wchar_t is also 1 byte, but given that you're using MSVC and targeting Windows, that's not going to be the case.)

So for practical purposes, the answer to this is: no.
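A minimal sketch of that point in portable C. The helper names here are made up for illustration; only `sizeof`, `char`, and `wchar_t` come from the language itself. The size of `wchar_t` is implementation-defined (2 bytes with MSVC, commonly 4 bytes with GCC or Clang on Linux), so the code only relies on the relationship between the sizes.

```c
#include <stddef.h>
#include <wchar.h>

/* sizeof(char) is 1 by definition, whatever UNICODE/_UNICODE are set to. */
size_t narrow_unit_size(void) { return sizeof(char); }

/* wchar_t is a separate type whose size is implementation-defined. */
size_t wide_unit_size(void) { return sizeof(wchar_t); }

/* A narrow literal stays an array of char: 2 characters + terminator. */
size_t narrow_literal_bytes(void) { return sizeof("Hi"); }

/* Only the L prefix produces an array of wchar_t. */
size_t wide_literal_bytes(void) { return sizeof(L"Hi"); }
```

Compiling this with or without UNICODE defined should give the same numbers, since the macro does not change what `char` or a plain string literal means.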

2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?

If you're printing ASCII string literals, you can continue using printf.

If you're printing arbitrary strings that could lie outside of the ASCII range, you should use _tprintf (or wprintf).
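To illustrate the two families side by side, here is a small sketch using only standard C (`snprintf` and `swprintf`), so it does not depend on Windows headers. The function names are hypothetical. With MSVC, `_tprintf` from `<tchar.h>` expands to `printf` or `wprintf` depending on `_UNICODE`, so it ends up on one of these two paths.

```c
#include <stdio.h>
#include <wchar.h>

/* Narrow path: fine when the text is ASCII held in char strings. */
int format_narrow(char *out, size_t n, const char *name)
{
    return snprintf(out, n, "Hello %s\n", name);
}

/* Wide path: the format string is a wide literal, and %ls is the
   standard conversion for a wchar_t string argument. */
int format_wide(wchar_t *out, size_t n, const wchar_t *name)
{
    return swprintf(out, n, L"Hello %ls\n", name);
}
```

Formatting into buffers rather than printing directly also sidesteps mixing narrow and wide output on the same stream.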

3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

What is "the default format"?

When you're reading in an external file, you should read in the first few bytes first to check for a UTF-16 or UTF-8 BOM, and then base your decisions around that.
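A rough sketch of such a BOM check, assuming the first bytes of the file have already been read into a buffer. The `bom_kind` enum and `detect_bom` function are illustrative names, not part of any library. Files without a BOM (which is common for UTF-8) come back as `BOM_NONE`, and the caller still has to decide how to treat them.

```c
#include <stddef.h>

typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Inspect the first bytes of a buffer for a UTF-8 or UTF-16 byte order mark.
   Note: a UTF-32 LE BOM (FF FE 00 00) also starts with FF FE; this sketch
   does not try to tell the two apart. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```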

煞人兵器 2024-08-27 16:57:00

1) Will all 'char' be interpreted as wide characters?

No. But when UNICODE is defined, every TCHAR becomes a wchar_t.

Consider how winnt.h would probably specify this:

#ifdef UNICODE
 typedef WCHAR TCHAR;
#else
 typedef CHAR TCHAR;
#endif

When you call SomeApi(), it resolves to either SomeApiA(char *arg) or SomeApiW(wchar_t *arg). (In your own code the arguments will be TCHARs, but you get the point.)

So your source code will be "independent" in the sense that it can be compiled into either an "ANSI" or a wide-character version. For this to work, you need to use TCHAR instead of the primitive types.
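The mechanism can be imitated in portable C. Everything named `My...` or `MY_...` below is a hypothetical stand-in invented for this sketch; the real typedefs and macros live in the Windows headers and differ in detail.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Two concrete functions, one per character type (hypothetical names). */
size_t MyLengthA(const char *s)    { return strlen(s); }
size_t MyLengthW(const wchar_t *s) { return wcslen(s); }

/* One switch picks the character type, the literal prefix and the function. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(x)  L##x
#define MyLength MyLengthW
#else
typedef char MY_TCHAR;
#define MY_T(x)  x
#define MyLength MyLengthA
#endif

/* This source compiles unchanged in either mode. */
size_t greeting_length(void)
{
    const MY_TCHAR *greeting = MY_T("Hello");
    return MyLength(greeting);
}
```

Building once with `-DMY_UNICODE` and once without gives the two variants from the same source, which is the sense of "independent" used above.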

2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?

I don't know the _tprintf family well, but I would guess it works the same way as the definitions above. That is, _tprintf takes TCHAR arguments and, depending on the Unicode setting, treats them as either char or wchar_t strings. (For the C runtime's tchar.h routines, the controlling macro is _UNICODE rather than UNICODE.)

3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

Which character encoding a file's contents use is entirely up to the file itself and has nothing to do with TCHAR. TCHARs are for file names and similar strings that you pass to Win32 API calls.
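As a sketch of what walking a file buffer with a plain char pointer looks like, the function below counts code points in a byte buffer under the assumption that the bytes are valid UTF-8. That encoding is an assumption for the example, not something the question states, and the function name is made up. The point is that each `char` is one byte of the file, which is not always one whole character.

```c
#include <stddef.h>

/* Step through the buffer one char (byte) at a time. In UTF-8, bytes of
   the form 10xxxxxx continue a previous character, so only the other
   bytes start a new code point. Assumes the input is valid UTF-8. */
size_t count_utf8_code_points(const char *p, size_t len)
{
    const char *end = p + len;
    size_t count = 0;
    for (; p < end; ++p) {
        unsigned char b = (unsigned char)*p;
        if ((b & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For pure ASCII content the count equals the byte count, which is why incrementing a char pointer "just works" there.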

小草泠泠 2024-08-27 16:57:00

Say I'm compiling my program in Unicode (but ultimately, I want a solution that is independent of the character set used).

This is going to depend on your language - as in programming language rather than human-spoken language. What do you mean by 'compiling my program in Unicode'?

  1. Will all 'char' be interpreted as wide characters?

    • It depends on the language and the options chosen. For example, Java uses 16-bit characters (storing UTF-16 or UCS-2 - once upon a long time ago it was UCS-2 but I assume it is now UTF-16). In C, you will have to work rather hard to get the basic 'char' type interpreted as anything other than an 8-bit quantity - at least on the Unix-based compilers.
  2. If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?

    • This requires some understanding of the platform you are working on, since it is far from being standard. I suspect this is MSVC...which makes it more difficult for me to be authoritative since I don't use MSVC. However, the ISO C99 standard (which is signally not supported by MSVC) provides functions such as fwprintf() to print strings of wide characters. If you need information about your specific compiler, tag your question with the correct information.
  3. If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

    • Again, TCHAR is not standard - it is highly specific to MSVC. In standard C, a file stream acquires an 'orientation' (wide-oriented or byte-oriented) when you apply appropriate functions to it. It stays in that orientation until it is closed (or reopened with freopen()).
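Stream orientation can be observed with the standard `fwide()` function, which reports the current orientation when called with a mode of 0. The two helper functions below are hypothetical names for this sketch; they expect a freshly opened stream that has not been read or written yet.

```c
#include <stdio.h>
#include <wchar.h>

/* After a byte I/O call on a fresh stream, fwide(f, 0) returns a
   negative value: the stream is now byte-oriented. */
int orientation_after_byte_output(FILE *f)
{
    fputs("bytes\n", f);
    return fwide(f, 0);
}

/* After a wide I/O call on a fresh stream, fwide(f, 0) returns a
   positive value: the stream is now wide-oriented. */
int orientation_after_wide_output(FILE *f)
{
    fputws(L"wide\n", f);
    return fwide(f, 0);
}
```

Before any I/O, `fwide(f, 0)` returns 0, meaning the stream has no orientation yet.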