NSString Unicode encoding problem

Posted on 2024-10-26 08:13:01

I'm having problems converting a string into something readable. I'm using

NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];

but I can't convert \U7ab6\U51b1 into an apostrophe (').

It shows as 窶冱, which is not what I want; it should show as an apostrophe ('). Can anyone help me?

Comments (1)

ゃ人海孤独症 2024-11-02 08:13:01

it is shown as a ’

That's character U+2019 RIGHT SINGLE QUOTATION MARK.

What has happened is you've had the character sequence ’s submitted to you, in the UTF-8 encoding, which comes out as bytes:

’          s
E2 80 99   73

That byte sequence has then, incorrectly, been interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):

E2 80    99 73
窶        冱
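To see the misreading for yourself, here is a minimal sketch that decodes the same four bytes twice. It assumes ARC and uses NSShiftJISStringEncoding as Foundation's closest stand-in for code page 932; DecodeAs is a hypothetical helper name, not something from the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decode raw bytes with the given encoding.
// Returns nil if the bytes are not valid in that encoding.
static NSString *DecodeAs(NSData *data, NSStringEncoding encoding) {
    return [[NSString alloc] initWithData:data encoding:encoding];
}

int main(void) {
    @autoreleasepool {
        // UTF-8 bytes for U+2019 followed by "s", as in the table above.
        const unsigned char utf8Bytes[] = {0xE2, 0x80, 0x99, 0x73};
        NSData *data = [NSData dataWithBytes:utf8Bytes length:sizeof utf8Bytes];

        // Intended reading: the bytes are UTF-8.
        NSLog(@"as UTF-8:     %@", DecodeAs(data, NSUTF8StringEncoding));
        // Mistaken reading: the same bytes treated as Shift-JIS.
        NSLog(@"as Shift-JIS: %@", DecodeAs(data, NSShiftJISStringEncoding));
    }
    return 0;
}
```

If the platform's Shift-JIS table matches the one above, the second line should log the two kanji from the question.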

So in this one particular case, you could recover the ’s string by first encoding the characters into cp932 bytes, and then decoding those bytes back to characters using UTF-8.
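That round trip might look roughly like the sketch below. It assumes ARC and again uses NSShiftJISStringEncoding as an approximation of cp932; RepairMojibake is a hypothetical name. Note that the line in the question encodes to UTF-8 and immediately decodes from UTF-8, which hands back the same string unchanged, so the two steps here have to use different encodings.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: undo a "UTF-8 bytes decoded as Shift-JIS" mistake.
// Returns nil when the text cannot be round-tripped.
static NSString *RepairMojibake(NSString *mangled) {
    // Step 1: turn the wrong characters back into the bytes they came from.
    NSData *bytes = [mangled dataUsingEncoding:NSShiftJISStringEncoding];
    if (bytes == nil) {
        return nil; // some character has no Shift-JIS representation
    }
    // Step 2: decode those bytes as UTF-8, which is what they were all along.
    // initWithData:encoding: returns nil if the bytes are not valid UTF-8.
    return [[NSString alloc] initWithData:bytes encoding:NSUTF8StringEncoding];
}

int main(void) {
    @autoreleasepool {
        NSString *mangled = @"\u7ab6\u51b1"; // the two kanji from the question
        NSString *repaired = RepairMojibake(mangled);
        NSLog(@"%@", repaired ?: @"(not recoverable)");
    }
    return 0;
}
```

Returning nil rather than a best-effort string is deliberate: it keeps the cases that cannot be repaired visible instead of hiding them.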

However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence resulting from encoding ’s happened also to be a valid Shift-JIS byte sequence. But that won't be the case for all possible UTF-8 byte sequences you might get. Many other characters will be unrecoverably mangled.

You need to find where bytes are being read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
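Where that fix belongs depends on how the data reaches your code, which the question does not show. As a rough sketch, assuming ARC and assuming you can get at the raw bytes as an NSData (the helper name StringFromUTF8Bytes is hypothetical), decoding explicitly as UTF-8 would look like this:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decode incoming bytes strictly as UTF-8.
// Returns nil for invalid UTF-8 instead of guessing another encoding.
static NSString *StringFromUTF8Bytes(NSData *rawBytes) {
    return [[NSString alloc] initWithData:rawBytes encoding:NSUTF8StringEncoding];
}

int main(void) {
    @autoreleasepool {
        // Stand-in for bytes arriving from the real source (file, network, scanner).
        const unsigned char incoming[] = {0xE2, 0x80, 0x99, 0x73};
        NSData *rawBytes = [NSData dataWithBytes:incoming length:sizeof incoming];

        NSString *text = StringFromUTF8Bytes(rawBytes);
        if (text == nil) {
            NSLog(@"input was not valid UTF-8");
        } else {
            NSLog(@"%@", text);
        }
    }
    return 0;
}
```

If a library hands you an already-decoded NSString, look for an option on that library to set or disable its encoding guess; the sketch only applies when you control the decoding step.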
