What does Microsoft use as the data type for Unicode strings?
I am in the process of learning C++ and came across an article on MSDN here:
http://msdn.microsoft.com/en-us/magazine/dd861344.aspx
In the first code example, the line of code my question relates to is the following:
VERIFY(SetWindowText(L"Direct2D Sample"));
More specifically, the L prefix. I did a little reading, and correct me if I am wrong :-), but this is to allow for Unicode strings, i.e. to prepare for a wide character set. During my reading I came across another article, on Advanced String Techniques in C, here: http://www.flipcode.com/archives/Advanced_String_Techniques_in_C-Part_I_Unicode.shtml
It says there are a few options, including adding one of these defines:
#define UNICODE
or
#define _UNICODE
in C. Again, point out if I am wrong; I appreciate your feedback. Further, it shows the data type suitable for these Unicode strings as:
wchar_t
It also throws into the mix a macro and a kind of hybrid data type, the macro being:
_TEXT(t)
which simply prefixes the string with L, and the hybrid data type being:
TCHAR
which, it points out, allows for Unicode if the define is present and ASCII if not. Now my question, or rather an assumption I would like to confirm: does Microsoft use this more flexible TCHAR data type, or is there any benefit to committing to wchar_t?
Also, when I ask whether Microsoft uses this, I mean more specifically in the ATL and WTL libraries, for example. Do any of you have a preference or some advice regarding this?
Cheers,
Andrew
Comments (4)
For all new software you should define UNICODE and use wchar_t directly. Using ANSI strings will come back to haunt you.
You should just use wchar_t and the wide versions of all the CRT functions (e.g. wcscmp instead of strcmp). The TEXT macros, TCHAR and so on exist only for code that needs to work in both ANSI and UNICODE environments, which I feel code rarely needs to do.
When you create a new Windows application in Visual Studio, UNICODE is defined automatically and, by default, wchar_t is treated as a built-in type.
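To make the "wide versions of the CRT functions" advice concrete, here is a minimal sketch that uses only standard headers, so it does not depend on the Windows SDK. The helper names same_title and title_length are hypothetical and exist only for illustration; wcscmp and wcslen are the wide counterparts of strcmp and strlen.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: compares two wide C strings with wcscmp,
// the wchar_t counterpart of strcmp. Returns true when they are equal.
bool same_title(const wchar_t* a, const wchar_t* b) {
    assert(a != nullptr && b != nullptr);
    return std::wcscmp(a, b) == 0;
}

// Hypothetical helper: length in wchar_t units (not bytes) via wcslen,
// the wchar_t counterpart of strlen.
std::size_t title_length(const wchar_t* s) {
    assert(s != nullptr);
    return std::wcslen(s);
}
```

Calling same_title(L"Direct2D Sample", L"Direct2D Sample") compares the same kind of wide literal that the question passes to SetWindowText. The comparison is case-sensitive, just as with strcmp.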
Short answer: the hybrid infrastructure with the TCHAR type, the _TEXT() macro and the various _t* functions (_tcscpy comes to mind) is a throwback to the times when Microsoft had two platforms coexisting: the Windows 9x line, whose APIs used ANSI strings, and the Windows NT line, whose APIs used Unicode strings.
String representation here means that all the Windows APIs that expected strings from your app, or returned strings to it, used one or the other representation. COM added even more confusion, as it was available on both platforms and expected Unicode strings on both!
In those days you were encouraged to write "portable" code: you were told to use the hybrid infrastructure for your strings so that you could compile for both models just by defining or undefining UNICODE and/or _UNICODE for your app.
As the Windows 9x line is no longer relevant (for the vast majority of apps, anyway), you can safely ignore the ANSI world and use Unicode strings directly.
Beware, though, that Unicode has multiple representations today. The convention implied by wchar_t on Windows is 16-bit code units: originally UCS-2 (every character encoded in a single 16-bit word), and UTF-16 on current Windows, where characters outside the Basic Multilingual Plane take two 16-bit words. There are other widely used representations, such as UTF-8, whose code units are a different size.
On Windows it's wchar_t with UTF-16 encoding (2-byte code units).
Source: http://www.firstobject.com/wchar_t-string-on-linux-osx-windows.htm
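As a rough illustration of what "UTF-16 with 2-byte code units" implies, the sketch below uses the standard char16_t type rather than wchar_t so that it behaves the same on every platform. On Windows, where wchar_t is 16 bits wide, the same reasoning applies to wide strings. The function count_code_points is a hypothetical helper, and it assumes well-formed UTF-16 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-16 string.
// Assumes the input is well-formed UTF-16. A code point outside the Basic
// Multilingual Plane is stored as a surrogate pair (two 16-bit units), so
// every unit except a low surrogate (0xDC00..0xDFFF) starts a new code point.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (char16_t unit : s) {
        const bool is_low_surrogate = (unit >= 0xDC00 && unit <= 0xDFFF);
        if (!is_low_surrogate) {
            ++count;
        }
    }
    return count;
}
```

For a string such as u"A\U0001F600", size() reports three 16-bit units while count_code_points reports two characters. That difference is why string length in code units and the number of characters should not be treated as the same thing.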
TCHAR changes its type depending on whether UNICODE is defined, and should be used when you want code that can be compiled for both UNICODE and non-UNICODE builds.
If you only want to process UNICODE data explicitly, feel free to use wchar_t.
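The switching mechanism can be sketched without the Windows headers. In the sketch below, demo_tchar, DEMO_TEXT and DEMO_UNICODE are hypothetical stand-ins for TCHAR, _TEXT and UNICODE/_UNICODE. The real definitions live in the Windows SDK and <tchar.h> and are more elaborate, so treat this as an illustration of the idea rather than a copy of those headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for TCHAR and _TEXT. Defining DEMO_UNICODE before
// this point selects the wide variants; leaving it undefined selects char.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_TEXT(s) L##s   // pastes an L prefix onto the string literal
#else
typedef char demo_tchar;
#define DEMO_TEXT(s) s      // leaves the narrow literal unchanged
#endif

// Written once against demo_tchar, so it compiles for either build.
// Returns the number of characters before the terminating zero.
std::size_t demo_length(const demo_tchar* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// The same source line yields "Direct2D Sample" or L"Direct2D Sample".
const demo_tchar* demo_title() {
    return DEMO_TEXT("Direct2D Sample");
}
```

Code written this way has to stay correct for both character types at once, which is the maintenance cost the answers above suggest avoiding by committing to wchar_t.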