Portable wchar_t in C++

Posted 2024-07-13 02:49:05

Is there a portable wchar_t in C++? On Windows, it's 2 bytes; on most other platforms (Linux, Mac OS X) it's 4 bytes. I would like to use wstring in my application, but this will cause problems if I decide to port it down the line.
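A small illustration of why the size difference matters, as a sketch assuming a C++11 compiler (units_for_emoji is a hypothetical helper name): the same wide literal holding U+1F600, a character outside the Basic Multilingual Plane, has a different size() depending on the platform's wchar_t.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t code units used to store U+1F600.
// Where wchar_t is 2 bytes (UTF-16, e.g. Windows) this is a surrogate pair;
// where it is 4 bytes (UTF-32, e.g. Linux/glibc) it is a single unit.
std::size_t units_for_emoji() {
    std::wstring s = L"\U0001F600";
    return s.size();
}
```

So code that indexes or slices a std::wstring by position can behave differently after a port, even though it compiles unchanged.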


清音悠歌 2024-07-20 02:49:05

If you're only using it internally within the program, don't worry about it; a wchar_t in class A is the same as in class B.

If you're planning to transfer data between the Windows and Linux/Mac OS X versions, you've got more than wchar_t to worry about, and you need to come up with a way to handle all the details.

You could define a character type that is four bytes everywhere and implement your own strings etc. on top of it (since most text handling in C++ is templated), but I don't know how well that would work for your needs.

Something like:

typedef int32_t my_char;   // exactly 32 bits; plain int usually is, but that isn't guaranteed
typedef std::basic_string<my_char> my_string;
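For comparison, C++11 later added char32_t, which has the same size as uint_least32_t and for which the standard library does provide std::char_traits. So std::u32string (std::basic_string<char32_t>) gives a one-element-per-code-point string on every platform without custom traits. A minimal sketch, assuming C++11; make_sample is a hypothetical name:

```cpp
#include <string>

// std::u32string is std::basic_string<char32_t>. U"..." literals are UTF-32,
// so each code point occupies exactly one element on every platform.
std::u32string make_sample() {
    return U"a\U0001F600";   // 'a' followed by U+1F600
}
```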

小嗷兮 2024-07-20 02:49:05

What do you mean by "portable wchar_t"? There is a uint16_t type that is exactly 16 bits wide wherever it exists, and it is widely available. But that of course doesn't make a string yet. A string has to know its encoding to make sense of functions like length(), substring() and so on (so that it doesn't cut a character in the middle of a code point when using UTF-8 or UTF-16). There are some Unicode-aware string classes I know of that you can use. All of them can be used in commercial programs for free (the Qt one will be free for commercial programs in a couple of months, when Qt 4.5 is released).

Glib::ustring from the gtkmm project (it lives in glibmm). If you program with gtkmm or use glibmm, that should be the first choice; it uses UTF-8 internally. Qt also has a string class, called QString, which is encoded in UTF-16. ICU is another project that provides portable Unicode string classes; its UnicodeString class uses UTF-16 internally, like Qt. I haven't used that one, though.
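To make the length()/substring() point concrete: with UTF-8 stored in a plain std::string, size() counts bytes, and cutting at an arbitrary byte offset can split a code point. A minimal sketch, assuming well-formed UTF-8 input; utf8_length and utf8_truncate are hypothetical helpers, not part of any of the libraries above:

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Number of code points: count every byte that is not a continuation byte.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if (!is_continuation(b)) ++n;
    return n;
}

// Cut s to at most max_bytes bytes without splitting a multi-byte sequence.
std::string utf8_truncate(const std::string& s, std::size_t max_bytes) {
    if (max_bytes >= s.size()) return s;
    std::size_t end = max_bytes;
    // If s[end] is a continuation byte, its sequence started before `end`;
    // back up to that sequence's lead byte so it is dropped whole.
    while (end > 0 && is_continuation(static_cast<unsigned char>(s[end]))) --end;
    return s.substr(0, end);
}
```

Note that utf8_length counts code points, which is still not always the same as user-visible characters.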

蓝颜夕 2024-07-20 02:49:05

The proposed C++0x standard will have char16_t and char32_t types (they were adopted in C++11). Until then, you'll have to fall back on integer types for the character type that wchar_t doesn't cover.

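#include <stdint.h>   // uint16_t, uint32_t for the fallback typedefs

// Decide what the platform's wchar_t holds.
#if defined(__STDC_ISO_10646__)
    #define WCHAR_IS_UTF32   // wchar_t values are ISO 10646 code points
#elif defined(_WIN32) || defined(_WIN64)
    #define WCHAR_IS_UTF16   // Windows: wchar_t is a 16-bit UTF-16 code unit
#endif

// char16_t/char32_t are built-in types from C++11 (C++0x) on;
// __STDC_UTF_16__/__STDC_UTF_32__ signal that they hold UTF-16/UTF-32.
#if defined(__STDC_UTF_16__)
    typedef char16_t CHAR16;
#elif defined(WCHAR_IS_UTF16)
    typedef wchar_t CHAR16;
#else
    typedef uint16_t CHAR16;
#endif

#if defined(__STDC_UTF_32__)
    typedef char32_t CHAR32;
#elif defined(WCHAR_IS_UTF32)
    typedef wchar_t CHAR32;
#else
    typedef uint32_t CHAR32;
#endif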

According to the standard, you'll need to specialize char_traits for the integer types. But on Visual Studio 2005, I've gotten away with std::basic_string<CHAR32> with no special handling.
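If you want to follow the standard strictly: a program may only add a specialization to namespace std when it depends on a user-defined type, so rather than specializing std::char_traits for a built-in integer, you can pass your own traits class as the second template argument of std::basic_string. A sketch assuming C++11 (u32_traits and u32_string are hypothetical names); it implements the members listed in the character-traits requirements:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>

// Hypothetical traits class for 32-bit unsigned code units, used instead of
// specializing std::char_traits for a built-in type.
struct u32_traits {
    typedef std::uint32_t char_type;
    typedef std::uint_least64_t int_type;  // wider than char_type, so eof() is distinct
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& r, const char_type& a) { r = a; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    // Length of a zero-terminated sequence.
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (!eq(s[n], char_type())) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
    // move() must handle overlapping ranges; copy() may assume they don't overlap.
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memmove(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memcpy(d, s, n * sizeof(char_type));
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) d[i] = c;
        return d;
    }

    static int_type eof() { return static_cast<int_type>(-1); }  // no char_type maps here
    static int_type not_eof(int_type c) { return eq_int_type(c, eof()) ? 0 : c; }
    static char_type to_char_type(int_type c) { return static_cast<char_type>(c); }
    static int_type to_int_type(char_type c) { return c; }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
};

typedef std::basic_string<std::uint32_t, u32_traits> u32_string;
```

This only covers std::basic_string; iostreams over such a type would additionally need locale facets.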

Regarding your comment, "I plan to use a SQLite database":

Then you'll need to use UTF-16 with SQLite's UTF-16 functions (such as sqlite3_open16()), not wchar_t.

The SQLite API also has a UTF-8 version. You may want to use that instead of dealing with the wchar_t differences.
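If the text already lives in a std::wstring, one way to sidestep the wchar_t difference is to convert it to UTF-8 once and pass the result to SQLite's UTF-8 functions (for example sqlite3_open() and sqlite3_prepare_v2(), rather than sqlite3_open16() and sqlite3_prepare16_v2()). A sketch assuming wchar_t holds UTF-16 when it is 2 bytes and UTF-32 otherwise; wide_to_utf8 is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode a std::wstring as UTF-8.
// Assumes wchar_t is UTF-16 when 2 bytes (Windows) and UTF-32 otherwise.
// Unpaired surrogates and values above U+10FFFF become U+FFFD.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size() &&
            static_cast<char32_t>(in[i + 1]) >= 0xDC00 &&
            static_cast<char32_t>(in[i + 1]) <= 0xDFFF) {
            // UTF-16 surrogate pair -> one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) +
                 (static_cast<char32_t>(in[i + 1]) - 0xDC00);
            ++i;
        } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;  // lone surrogate or value outside Unicode
        }
        // Emit 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Usage would then look something like sqlite3_open(wide_to_utf8(path).c_str(), &db), where path and db are your own variables.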

零度° 2024-07-20 02:49:05

My suggestion: use UTF-8 and std::string. Wide strings would not bring you much added value, because you can't interpret a single wide character as a letter anyway: some characters are made up of several Unicode code points.
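A small illustration of that point, as a sketch assuming C++11 (the variable names are hypothetical): "é" can be written either as one precomposed code point or as "e" followed by a combining accent, and even a UTF-32 string counts the second form as two elements.

```cpp
#include <string>

// 'e' + U+0301 COMBINING ACUTE ACCENT: one visible letter, two code points.
const std::u32string decomposed_e_acute = U"e\u0301";
// The precomposed form U+00E9 is a single code point.
const std::u32string precomposed_e_acute = U"\u00E9";
```

The two strings also compare unequal unless you normalize them first, which is the kind of job a natural-language library handles for you.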

So use UTF-8 everywhere, and use a good library for dealing with natural language, for example Boost.Locale.

Defining something like typedef uint32_t mychar; is a bad idea, because you can't use iostreams with it: for example, you can't create a stringstream based on this character type, as you would not be able to write into it.

For example, this would not work:

std::basic_ostringstream<unsigned> ss;   // the standard provides no char_traits<unsigned>
ss << 10;                                // and the locale has no num_put facet for unsigned

It would not give you a string.
