Unicode vs. multi-byte
I'm really confused by this Unicode vs. multi-byte thing.
Say I'm compiling my program in Unicode (but ultimately, I want a solution that is independent of the character set used).
1) Will all 'char' be interpreted as wide characters?
2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?
3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?
Thank you.
Regards,
Rayne
Comments (3)
First, if you're compiling with UNICODE/_UNICODE and don't intend to target other platforms, you can avoid the TCHAR business and use WCHAR (or wchar_t) and the W functions everywhere.
1) char in C is, by definition, 1 byte. (This doesn't technically preclude it from being a "wide character" on platforms where wchar_t is also 1 byte, but given that you're using MSVC and are targeting Windows platforms, that's not going to be the case.) So for practical purposes, the answer to this is: no.
2) If you're printing ASCII string literals, you can continue using printf. If you're printing arbitrary strings that could lie outside of the ASCII range, you should use _tprintf (or wprintf).
3) What is "the default format"? When you're reading in an external file, you should read in the first few bytes first to check for a UTF-16 or UTF-8 BOM, and then base your decisions around that.
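A minimal sketch of the BOM check described above, in C. The names bom_kind and detect_bom are hypothetical; the byte values are the standard UTF-8 mark (EF BB BF) and the UTF-16 marks (FF FE little-endian, FE FF big-endian). It deliberately ignores UTF-32, whose little-endian BOM also begins with FF FE, and a file without any BOM simply reports as unknown.

```c
#include <stddef.h>

/* Hypothetical result type: which byte order mark, if any, was found. */
typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16_LE, ENC_UTF16_BE } bom_kind;

/* Inspect the first bytes of a buffer and report which BOM they form.
   buf holds the start of the file; len is how many bytes were read. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;   /* no BOM: the encoding has to be decided some other way */
}
```

A caller would typically fread the first three bytes into an unsigned char buffer, pass them to detect_bom, and then choose char-based or wchar_t-based handling for the rest of the file.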
No. But all TCHARs will be interpreted as wchar_t.
Consider how winnt.h would probably specify this:
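As a rough illustration of the mechanism, and not the actual contents of winnt.h or tchar.h, such definitions might look something like the sketch below. Every DEMO_* and demo_* name is hypothetical, chosen so that the snippet also compiles outside Windows; as far as I know the real headers provide TCHAR, _T and _tprintf through the same kind of conditional definition.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* All DEMO_* / demo_* names are hypothetical stand-ins for this sketch. */
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;        /* wide-character build */
#define DEMO_T(s)    L##s          /* "abc" becomes L"abc" */
#define demo_tprintf wprintf
#define demo_tcslen  wcslen
#else
typedef char DEMO_TCHAR;           /* narrow ("ANSI") build */
#define DEMO_T(s)    s
#define demo_tprintf printf
#define demo_tcslen  strlen
#endif

/* The same source line works in either build; only the character type changes. */
size_t demo_greeting_length(void)
{
    const DEMO_TCHAR *greeting = DEMO_T("Hello World\n");
    return demo_tcslen(greeting);
}

/* Prints through whichever printf variant the build selected. */
void demo_print_greeting(void)
{
    demo_tprintf(DEMO_T("Hello World\n"));
}
```

Compiling with -DDEMO_UNICODE switches every one of these to its wide variant without touching the two functions, which is the sense in which the source is "independent" of the character set.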
When you call SomeApi(), it will wrap to either SomeApiA(char *arg) or SomeApiW(wchar_t *arg). (The arguments will in reality be TCHARs, but you get the point.)
So your source code will be "independent" in the sense that it can be compiled into either an "ANSI" or a wide-character version. For this to work you need to use TCHARs instead of the primitive types.
I don't know the tprintf family, but I would speculate that they work in the same way as the defines above. That is, tprintf takes TCHARs as arguments and, depending on the UNICODE setting, treats them as either chars or wchar_ts.
What character encoding the contents of a file uses is entirely up to the file itself and has nothing to do with TCHARs. TCHARs are for filenames and the like that you use in Win32 API calls.
This is going to depend on your language, as in programming language rather than human-spoken language. What do you mean by 'compiling my program in Unicode'?
Will all 'char' be interpreted as wide characters?
If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?
[...] fwprintf() to print strings of wide characters. If you need information about your specific compiler, tag your question with the correct information.
If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?
[...] freopen()).