C++: How to create an unsigned char from a UTF-8 code point

Posted 2024-12-22 04:41:21

I'm working with a C++ library and need to create an unsigned char from a UTF-8 code point. For example, if the code point is decimal 610 (U+0262, 'LATIN LETTER SMALL CAPITAL G'), how would I create this in C++?

In JavaScript, I can do the following:

var temp = String.fromCharCode(610);
console.log(temp); // Outputs a small 'G' (correct)
var codePoint = temp.charCodeAt(0);
console.log(codePoint); // Outputs 610 (correct)

In C++, I have tried:

unsigned char temp = (unsigned char)610;
// compiles, but
Debug::WriteLine((int)temp); // outputs 98 (??)

Please provide a code example in C++ that does the same as the JavaScript example above.

The environment is in managed C++, but I want to avoid using CLR types as I'm interfacing with a 3rd party library.

Comments (3)

郁金香雨 2024-12-29 04:41:21

An unsigned char is too small to hold the value 610 (assuming a char is 8 bits wide, it can only hold values from 0 to 255), so the value wraps around.*

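Use char16_t to store a 16-bit character, or char32_t to store a 32-bit one. 610 fits in either, but only char32_t can hold every Unicode code point (code points go up to U+10FFFF, which does not fit in 16 bits).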

char32_t temp = (char32_t)610;
Debug::WriteLine((int)temp); // outputs 610
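As a rough native C++ analogue of the JavaScript `String.fromCharCode` / `charCodeAt` round trip (no CLR types), the sketch below stores the code point in a `std::u32string`, where each element is one code point. The helper names `from_code_point` and `code_point_at` are hypothetical, chosen only for this illustration. JavaScript strings are UTF-16, so `charCodeAt` returns a UTF-16 code unit; the two agree only for code points up to U+FFFF, such as 610.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: build a one-character UTF-32 string from a code point,
// roughly what String.fromCharCode(610) does in JavaScript.
std::u32string from_code_point(std::uint32_t cp) {
    return std::u32string(1, static_cast<char32_t>(cp));
}

// Hypothetical helper: read the code point at index i of a UTF-32 string,
// roughly what charCodeAt(i) does for characters up to U+FFFF.
// std::u32string::at throws std::out_of_range for a bad index.
std::uint32_t code_point_at(const std::u32string& s, std::size_t i) {
    return static_cast<std::uint32_t>(s.at(i));
}
```

For example, `code_point_at(from_code_point(610), 0)` gives back 610.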

If you want to handle UTF-8 strings, use UTF-8 string literals:

u8"I'm a UTF-8 string."
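To see what a UTF-8 literal actually contains, the sketch below (assuming a C++11 or later compiler) writes code point 610 as the universal character name `\u0262`; the compiler stores it as two bytes plus the terminating NUL. From C++20 on, the elements of a `u8` literal have type `char8_t` instead of `char`, so the code casts each element to `unsigned char` to compile under both. The names `kSmallCapitalG` and `utf8_byte` are illustrative only.

```cpp
#include <cstddef>

// U+0262 (decimal 610) written as a universal character name in a UTF-8 literal.
// The array type is const char[3] before C++20 and const char8_t[3] from C++20 on.
const auto& kSmallCapitalG = u8"\u0262";

// Number of UTF-8 code units, excluding the terminating NUL.
constexpr std::size_t kUtf8Length = sizeof(kSmallCapitalG) - 1;

// Hypothetical helper: read code unit i as an unsigned byte value.
unsigned int utf8_byte(std::size_t i) {
    return static_cast<unsigned char>(kSmallCapitalG[i]);
}
```

The two bytes are 0xC9 and 0xA2, the same UTF-8 sequence a manual encoding of U+0262 produces.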

*In your example it even wraps around twice:

610 - 256 - 256 = 98
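The footnote's arithmetic can be checked directly. Converting an out-of-range integer to an unsigned type is well defined in C++: the result is the value modulo 2^N, where N is the width of the type. The sketch assumes 8-bit chars (enforced with a `static_assert`), and the helper names `narrow` and `widen` are illustrative only.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit chars");

// Converting to unsigned char keeps only the value modulo 2^CHAR_BIT,
// so 610 becomes 610 mod 256 = 98 (that is, 610 - 256 - 256).
unsigned char narrow(int value) {
    return static_cast<unsigned char>(value);
}

// char32_t is at least 32 bits wide, so 610 is stored unchanged.
char32_t widen(int value) {
    return static_cast<char32_t>(value);
}
```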

雨的味道风的声音 2024-12-29 04:41:21

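Unicode code points may need a 32-bit representation. For most Western languages 16 bits are enough, because their characters lie in the Basic Multilingual Plane (U+0000 to U+FFFF), but code points go up to U+10FFFF, so to handle every possible Unicode code point you really do need a 32-bit type.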

uint32_t codePoint = someString.CodePointAt(x); // pseudocode: CodePointAt is not a standard C++ string member

You can read more about it here: http://en.wikipedia.org/wiki/Code_point.
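Standard C++ string types have no `CodePointAt` member, so the line above is best read as pseudocode. Below is a minimal sketch of such a helper for a UTF-8 encoded `std::string`, returning the result in a 32-bit integer as the answer recommends. The name `decode_utf8_at` is hypothetical, and the sketch assumes mostly well-formed input: it checks the lead and continuation bit patterns but does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode the UTF-8 sequence starting at byte offset pos,
// return its code point, and store the sequence length (1 to 4) in `length`.
std::uint32_t decode_utf8_at(const std::string& s, std::size_t pos, std::size_t& length) {
    // Read byte i as an unsigned value, failing on a truncated sequence.
    const auto byte = [&](std::size_t i) {
        if (i >= s.size()) throw std::out_of_range("truncated UTF-8 sequence");
        return static_cast<unsigned char>(s[i]);
    };
    const unsigned char lead = byte(pos);
    std::uint32_t cp = 0;
    if (lead < 0x80) {                  // 0xxxxxxx: 1 byte (ASCII)
        cp = lead;        length = 1;
    } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: 2 bytes
        cp = lead & 0x1F; length = 2;
    } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: 3 bytes
        cp = lead & 0x0F; length = 3;
    } else if ((lead & 0xF8) == 0xF0) { // 11110xxx: 4 bytes
        cp = lead & 0x07; length = 4;
    } else {
        throw std::invalid_argument("invalid UTF-8 lead byte");
    }
    // Each continuation byte (10xxxxxx) contributes 6 more bits.
    for (std::size_t i = 1; i < length; ++i) {
        const unsigned char cont = byte(pos + i);
        if ((cont & 0xC0) != 0x80) throw std::invalid_argument("invalid UTF-8 continuation byte");
        cp = (cp << 6) | (cont & 0x3F);
    }
    return cp;
}
```

Decoding the bytes `"\xC9\xA2"` yields 610, while an emoji such as U+1F600 (`"\xF0\x9F\x98\x80"`) yields 0x1F600, a value that does not fit in 16 bits.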

压抑⊿情绪 2024-12-29 04:41:21

If you mean you want an array of unsigned char holding the UTF-8 representation of the Unicode code point 610 (U+0262), you could write:

unsigned char temp[] = { 0xC9, 0xA2 }; // UTF-8 encoding of U+0262
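Those two bytes follow from the UTF-8 rule for code points U+0080 to U+07FF: 610 is 0x262, or 010 0110 0010 in 11 bits. The top five bits (01001) go into a 110xxxxx lead byte, giving 0xC9, and the low six bits (100010) go into a 10xxxxxx continuation byte, giving 0xA2. A general encoder covering all four sequence lengths might look like the sketch below; the name `encode_utf8` is hypothetical, and it throws for surrogates and for values above U+10FFFF.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::vector<unsigned char> encode_utf8(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    std::vector<unsigned char> out;
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out.push_back(static_cast<unsigned char>(cp));
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

Calling `encode_utf8(610)` produces the same two bytes, 0xC9 and 0xA2, as the hand-written array.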