New unicode characters in C++0x

Posted on 2024-07-19 07:25:00

I'm building an API that allows me to fetch strings in various encodings, including utf8, utf16, utf32 and wchar_t (which may be utf32 or utf16 depending on the OS).

  1. The new C++ standard introduces the new types char16_t and char32_t, which do not have this sizeof ambiguity and should be used in the future, so I would like to support them as well. The question is: would they interfere with the normal uint16_t, uint32_t and wchar_t types, preventing overloading because they may refer to the same type?

    #include <cstdint>
    #include <string>

    class some_class {
    public:
        void set(std::string);  // utf8 string
        void set(std::wstring); // wchar string, utf16 or utf32 according
                                // to sizeof(wchar_t)
        void set(std::basic_string<uint16_t>);
                                // wchar independent utf16 string
        void set(std::basic_string<uint32_t>);
                                // wchar independent utf32 string
    
    #ifdef HAVE_NEW_UNICODE_CHARRECTERS
        void set(std::basic_string<char16_t>);
                                // new standard utf16 string
        void set(std::basic_string<char32_t>);
                                // new standard utf32 string
    #endif
    };
    

    So I can just write:

    foo.set(U"Some utf32 String");
    foo.set(u"Some utf16 string");
    
  2. What are the typedefs for std::basic_string<char16_t> and std::basic_string<char32_t>, analogous to the one that exists today:

    typedef basic_string<wchar_t> wstring;
    

    I can't find any reference.

    Edit: according to the headers of gcc-4.4, which introduced these new types:

    typedef basic_string<char16_t> u16string;
    typedef basic_string<char32_t> u32string;
    

    I just want to make sure that this is an actual standard requirement and not a gcc-ism.

Comments (1)

_畞蕅 2024-07-26 07:25:00

1) char16_t and char32_t will be distinct new types, so overloading on them will be possible.

Quote from ISO/IEC JTC1 SC22 WG21 N2018:

Define char16_t to be a typedef to a distinct new type, with the name _Char16_t that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a typedef to a distinct new type, with the name _Char32_t that has the same size and representation as uint_least32_t.
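
The two properties named in that quote (same size as the least-width integer type, yet a distinct type) can be checked at compile time. This is only a minimal sketch, assuming a C++11 compiler; the helper `distinct_with_same_size` is a hypothetical name introduced here for illustration and does not come from N2018.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: true when NewChar is a different type from Integer
// but occupies the same number of bytes.
template <typename NewChar, typename Integer>
constexpr bool distinct_with_same_size() {
    return !std::is_same<NewChar, Integer>::value &&
           sizeof(NewChar) == sizeof(Integer);
}

// The new character types share their size with the least-width integers
// without being the same type.
static_assert(distinct_with_same_size<char16_t, std::uint_least16_t>(),
              "char16_t should be distinct from uint_least16_t");
static_assert(distinct_with_same_size<char32_t, std::uint_least32_t>(),
              "char32_t should be distinct from uint_least32_t");

// They are also distinct from wchar_t, whatever its size is on this OS.
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t should be distinct from wchar_t");
static_assert(!std::is_same<char32_t, wchar_t>::value,
              "char32_t should be distinct from wchar_t");
```

If the file compiles, all four checks hold for the compiler in use.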

Further explanation (from a devx.com article "Prepare Yourself for the Unicode Revolution"):

You're probably wondering why the _Char16_t and _Char32_t types and keywords are needed in the first place when the typedefs uint_least16_t and uint_least32_t are already available. The main problem that the new types solve is overloading. It's now possible to overload functions that take _Char16_t and _Char32_t arguments, and create specializations such as std::basic_string<_Char16_t> that are distinct from std::basic_string<wchar_t>.
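
To make the overloading point concrete, here is a small sketch that should compile with a C++11 compiler. The functions `encoding_of`, `unit_of` and `check_overloads` are hypothetical names invented for this example (they are not part of the question's `some_class` or of any library); each overload just returns a label so the selected one can be observed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical string-level overloads, one per character type.
inline std::string encoding_of(const std::string&)    { return "utf8"; }
inline std::string encoding_of(const std::wstring&)   { return "wchar"; }
inline std::string encoding_of(const std::u16string&) { return "utf16"; }
inline std::string encoding_of(const std::u32string&) { return "utf32"; }

// Hypothetical character-level overloads: char16_t and uint_least16_t can
// coexist in one overload set because they are different types.
inline std::string unit_of(char16_t)            { return "char16_t"; }
inline std::string unit_of(std::uint_least16_t) { return "uint_least16_t"; }

// Each literal prefix selects exactly one overload.
inline void check_overloads() {
    assert(encoding_of("plain")  == "utf8");
    assert(encoding_of(L"wide")  == "wchar");
    assert(encoding_of(u"utf16") == "utf16");
    assert(encoding_of(U"utf32") == "utf32");
    assert(unit_of(u'a') == "char16_t");
    assert(unit_of(std::uint_least16_t(97)) == "uint_least16_t");
}
```

Each string literal converts to only one of the four string types, so none of the calls is ambiguous.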

2) u16string and u32string are indeed part of C++0x and not just GCC-isms, as they are mentioned in various standard draft papers. They will be included in the new <string> header. Quote from the same article:

The Standard Library will also provide _Char16_t and _Char32_t typedefs, in analogy to the typedefs wstring, wcout, etc., for the following standard classes:

filebuf, streambuf, streampos, streamoff, ios, istream, ostream, fstream, ifstream, ofstream, stringstream, istringstream, ostringstream, string
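
For the string typedefs specifically, a compiler that implements the C++11 `<string>` header can confirm that `u16string` and `u32string` name the expected `basic_string` specializations. The sketch below assumes such a compiler; `utf16_text`, `utf32_text` and `make_utf16` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <type_traits>

// The typedefs from <string> are plain aliases for the specializations.
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string should be basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string should be basic_string<char32_t>");

// U+00E9 fits in one UTF-16 code unit; U+1F600 needs a surrogate pair.
constexpr char16_t utf16_text[] = u"\u00e9\U0001F600";
constexpr char32_t utf32_text[] = U"\u00e9\U0001F600";

// Code-unit counts, excluding the terminating null character.
static_assert(sizeof(utf16_text) / sizeof(char16_t) - 1 == 3,
              "two code points take three UTF-16 code units here");
static_assert(sizeof(utf32_text) / sizeof(char32_t) - 1 == 2,
              "two code points take two UTF-32 code units");

// Hypothetical helper: builds a u16string from the literal above.
inline std::u16string make_utf16() { return utf16_text; }
```

The code-unit counts show why a fixed-width `char32_t` string and a variable-width `char16_t` string can have different lengths for the same text.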
