C++: How to create an unsigned char from a UTF-8 code point

Posted on 2024-12-22 04:41:21

I'm working with a C++ library, and need to create an unsigned char from a UTF-8 code point. For example, if the code point is decimal 610 (a 'latin letter small capital G'), how would I create this in C++?

In JavaScript, I can do the following:

var temp = String.fromCharCode(610);
console.log(temp); // Outputs a small 'G' (correct)
var codePoint = temp.charCodeAt(0);
console.log(codePoint); // Outputs 610 (correct)

In C++ I have tried:

unsigned char temp = (unsigned char)610;
// compiles, but
Debug::WriteLine((int)temp); // outputs 98 (??)

Please provide a code example in C++ that does the same as the JavaScript example above.

The environment is managed C++, but I want to avoid CLR types because I'm interfacing with a third-party library.

郁金香雨 2024-12-29 04:41:21

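An unsigned char is too small to hold the value 610 (assuming a char is 8 bits wide, it can only hold values from 0 to 255), so the value wraps around.*

Use char16_t to store a 16-bit code unit, or char32_t, which is wide enough to hold any Unicode code point. (UTF-8 itself is built from 8-bit code units; it is the code point value that needs the wider type.)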

char32_t temp = (char32_t)610;
Debug::WriteLine(temp); // outputs 610 (!!)

If you want to handle UTF-8 strings, use UTF-8 string literals:

u8"I'm a UTF-8 string."
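
As a rough illustration of what such a literal contains, the sketch below checks at compile time that the literal for code point 610 (U+0262) is stored as the two bytes 0xC9 0xA2. The helper name small_capital_g_byte is made up for this example, and a compiler with u8 literal support (C++11 or later) is assumed.

```cpp
// In C++11/14/17 a u8 literal has type const char[N]; in C++20 it is
// const char8_t[N]. Casting an element to unsigned char works in both.
static_assert(sizeof(u8"\u0262") == 3,
              "two UTF-8 bytes plus the terminating NUL");
static_assert(static_cast<unsigned char>(u8"\u0262"[0]) == 0xC9,
              "lead byte of U+0262");
static_assert(static_cast<unsigned char>(u8"\u0262"[1]) == 0xA2,
              "continuation byte of U+0262");

// Hypothetical helper: byte i (0 or 1) of the literal as an unsigned char.
constexpr unsigned char small_capital_g_byte(int i) {
    return static_cast<unsigned char>(u8"\u0262"[i]);
}
```

This only helps when the character is known at compile time; for a code point that arrives at run time the bytes have to be computed.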

*In your example it actually wraps around twice:

610 - 256 - 256 = 98
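
The wrap-around can be checked at compile time. This is a minimal sketch that assumes an 8-bit char; narrow_to_byte and as_code_point are names invented for the illustration, not part of any library.

```cpp
#include <climits>
#include <cstdint>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit chars");

// Converting to unsigned char keeps only the low 8 bits,
// i.e. the value modulo 256.
constexpr unsigned char narrow_to_byte(std::uint32_t value) {
    return static_cast<unsigned char>(value);
}

// char32_t is at least 32 bits wide, so the code point survives intact.
constexpr char32_t as_code_point(std::uint32_t value) {
    return static_cast<char32_t>(value);
}

static_assert(narrow_to_byte(610) == 98, "610 modulo 256 is 98");
static_assert(610 - 256 - 256 == 98, "two wrap-arounds");
static_assert(as_code_point(610) == U'\u0262', "610 is U+0262");
```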

雨的味道风的声音 2024-12-29 04:41:21

Unicode code points may need a 32-bit representation. For most Western languages 16 bits are enough, but code points go up to U+10FFFF, so to handle all of them you really do need a 32-bit type.

uint32_t codePoint = someString.CodePointAt(x);

You can read more about it here: http://en.wikipedia.org/wiki/Code_point.
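
To make the 16-bit versus 32-bit point concrete, here is a small sketch. The names kMaxCodePoint, fits_in_16_bits and utf16_units_needed are invented for this example; the only facts relied on are that the largest code point is U+10FFFF and that UTF-16 uses a surrogate pair for anything above U+FFFF.

```cpp
#include <cstdint>

// Largest valid Unicode code point.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// True if the code point fits in a single 16-bit unit.
constexpr bool fits_in_16_bits(std::uint32_t cp) {
    return cp <= 0xFFFF;
}

// Number of char16_t units UTF-16 needs for this code point:
// one for U+0000..U+FFFF, two (a surrogate pair) above that.
constexpr int utf16_units_needed(std::uint32_t cp) {
    return cp <= 0xFFFF ? 1 : 2;
}

static_assert(fits_in_16_bits(610), "U+0262 fits in 16 bits");
static_assert(!fits_in_16_bits(0x1F600), "U+1F600 does not");
static_assert(kMaxCodePoint < (1u << 21), "21 bits cover every code point");
```

So char16_t is enough for code point 610, but a char32_t (or uint32_t) is the safe choice when any code point may show up.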

压抑⊿情绪 2024-12-29 04:41:21

If you mean you want an unsigned char array holding the UTF-8 representation of the Unicode code point 610, you could do:

char unsigned temp[] = { 0xc9, 0xa2 };
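
If the code point is only known at run time, the bytes can be computed instead of hard-coded. The following is a sketch of a standard UTF-8 encoder using only the standard library (no CLR types); encode_utf8 and check_code_point_610 are hypothetical names chosen for this example, and returning an empty vector for invalid input is just one possible convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-8 bytes. Returns an empty vector
// for values that are not valid (surrogates or anything above U+10FFFF).
std::vector<unsigned char> encode_utf8(std::uint32_t cp) {
    std::vector<unsigned char> out;
    if (cp <= 0x7F) {                        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<unsigned char>(cp));
    } else if (cp <= 0x7FF) {                // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0xFFFF) {               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;                      // surrogates are not encodable
        }
        out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0x10FFFF) {             // 4 bytes: 11110xxx then three 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Self-check for the code point from the question: 610 encodes to C9 A2.
void check_code_point_610() {
    const std::vector<unsigned char> bytes = encode_utf8(610);
    assert(bytes.size() == 2);
    assert(bytes[0] == 0xC9);
    assert(bytes[1] == 0xA2);
}
```

The result's data() pointer is an unsigned char* that can be handed to a library expecting raw UTF-8 bytes; append a 0 byte first if the library expects a NUL-terminated string.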
