C++: How to create an unsigned char from a UTF-8 code point
I'm working with a C++ library, and need to create an unsigned char from a UTF-8 code point. For example, if the code point is decimal 610 (a 'latin letter small capital G'), how would I create this in C++?
In JavaScript, I can do the following:
var temp = String.fromCharCode(610);
console.log(temp); // Outputs a small 'G' (correct)
var codePoint = temp.charCodeAt(0);
console.log(codePoint); // Outputs 610 (correct)
In C++ I have tried:
unsigned char temp = (unsigned char)610;
// compiles, but
Debug::WriteLine((int)temp); // outputs 98 (??)
Please provide a code example in C++ that does the same as the JavaScript example above.
The environment is in managed C++, but I want to avoid using CLR types as I'm interfacing with a 3rd party library.
Comments (3)
An unsigned char is too small to hold the value 610 (assuming a char is 8 bits wide, it can only hold values from 0 to 255), so the value wraps around*.
Use char16_t to store a 16-bit character (or char32_t for a 32-bit character, which is what you need to hold any Unicode code point; UTF-8 itself is a sequence of 8-bit code units).
If you want to handle UTF-8 strings, use UTF-8 string literals, for example u8"\u0262".
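A minimal sketch of the UTF-8 string literal approach, assuming a compiler that supports C++11 `u8` literals. The function names are made up for illustration and are not part of any library:

```cpp
#include <cstddef>
#include <cstring>

// U+0262 (decimal 610) written as a UTF-8 string literal. The compiler
// stores the two-byte UTF-8 sequence 0xC9 0xA2 plus a terminating zero.
// The cast keeps this compiling both before C++20 (u8"" is const char[])
// and from C++20 on (u8"" is const char8_t[]).
// Hypothetical helper name, for illustration only.
inline const unsigned char* small_capital_g_utf8() {
    return reinterpret_cast<const unsigned char*>(u8"\u0262");
}

// Number of bytes (not characters) before the terminating zero.
// Hypothetical helper name, for illustration only.
inline std::size_t utf8_byte_length(const unsigned char* s) {
    return std::strlen(reinterpret_cast<const char*>(s));
}
```

Calling `utf8_byte_length(small_capital_g_utf8())` yields 2, because the single character occupies two bytes in UTF-8. No single unsigned char can hold it.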
*In your example it actually wraps around twice: 610 - 256 = 354, and 354 - 256 = 98, which is the value you saw.
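The wrap-around can be reproduced with a small sketch, assuming 8-bit chars as on mainstream platforms. The function names are hypothetical and chosen only for this illustration:

```cpp
// Converting to an 8-bit unsigned char keeps only the value modulo 256.
// Assumes CHAR_BIT == 8.
inline unsigned char narrow_to_byte(unsigned int value) {
    return static_cast<unsigned char>(value);
}

// The same result spelled out: subtract 256 until the value fits in a byte.
inline unsigned int wrap_by_subtraction(unsigned int value) {
    while (value > 255u) {
        value -= 256u;  // for 610: 610 -> 354 -> 98, two subtractions
    }
    return value;
}
```

Both `narrow_to_byte(610)` and `wrap_by_subtraction(610)` give 98, which matches the output in the question.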
Unicode code points may need a 32-bit representation. For most Western languages 16 bits are enough, because their characters lie at or below U+FFFF, but code points run up to U+10FFFF, so to handle all of them you really do need a 32-bit type.
You can read more about it here: http://en.wikipedia.org/wiki/Code_point.
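To make the 16-bit versus 32-bit distinction concrete, here is a rough sketch using the standard `char16_t` and `char32_t` types. The helper names are assumptions for this example only:

```cpp
// True when the code point is at or below U+FFFF and is not a surrogate,
// so a single char16_t can hold it. Hypothetical helper name.
constexpr bool fits_in_one_char16(char32_t cp) {
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Store a code point in a 16-bit character; only meaningful when
// fits_in_one_char16(cp) is true. Hypothetical helper name.
constexpr char16_t to_char16(char32_t cp) {
    return static_cast<char16_t>(cp);
}

// Read the numeric code back, comparable to charCodeAt(0) in JavaScript.
constexpr unsigned int code_of(char16_t c) {
    return c;
}
```

For code point 610, `code_of(to_char16(610))` returns 610 again, which mirrors the JavaScript round trip in the question. A code point such as U+1F600 does not pass `fits_in_one_char16` and needs `char32_t` (or a surrogate pair in UTF-16).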
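If you mean you want an unsigned char pointer to the UTF-8 representation of Unicode code point 610, note that this representation is the two-byte sequence 0xC9 0xA2, so you need a buffer of bytes rather than a single unsigned char.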
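One possible way to build those bytes at run time is a small hand-written encoder. This is a sketch rather than the answerer's original code; `encode_utf8` is a hypothetical name, and returning an empty vector for invalid input is a design choice made here:

```cpp
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-8 bytes.
// Returns an empty vector for surrogates and values above U+10FFFF.
inline std::vector<unsigned char> encode_utf8(std::uint32_t cp) {
    std::vector<unsigned char> out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return out;  // not a valid Unicode scalar value
    }
    if (cp < 0x80) {            // 1 byte: 0xxxxxxx
        out.push_back(static_cast<unsigned char>(cp));
    } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

With this sketch, `encode_utf8(610)` holds the bytes 0xC9 0xA2, and calling `.data()` on the result gives an `unsigned char*` to them. The buffer is not zero-terminated, so append a 0 byte if the third-party library expects a C string.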