NSString Unicode encoding problem

Posted on 2024-10-26 08:13:01

I'm having problems converting a string into something readable. I'm using

NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];

but I can't convert \U7ab6\U51b1 into '.

It shows as 窶冱, which is not what I want; it should show as '. Can anyone help me?

ゃ人海孤独症 2024-11-02 08:13:01

It should show as a ’, which is character U+2019 RIGHT SINGLE QUOTATION MARK, not the ASCII apostrophe '.

What has happened is that the character sequence ’s was submitted to you in the UTF-8 encoding, which produces these bytes:

’          s
E2 80 99   73
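You can check the byte table yourself. Below is a minimal Objective-C sketch that uses only Foundation. `HexBytes` is a hypothetical helper name, not part of any API. It encodes a string and prints each byte in hex.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the bytes of `text` in `encoding` as
// space-separated uppercase hex (e.g. "E2 80 99 73"), or nil if the
// text cannot be represented in that encoding.
static NSString *HexBytes(NSString *text, NSStringEncoding encoding) {
    NSData *data = [text dataUsingEncoding:encoding];
    if (data == nil) {
        return nil;
    }
    const unsigned char *bytes = data.bytes;
    NSMutableArray<NSString *> *parts = [NSMutableArray arrayWithCapacity:data.length];
    for (NSUInteger i = 0; i < data.length; i++) {
        [parts addObject:[NSString stringWithFormat:@"%02X", bytes[i]]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

Calling `HexBytes(@"\u2019s", NSUTF8StringEncoding)` (that is, ’s) should give `@"E2 80 99 73"`.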

That byte sequence was then incorrectly interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):

E2 80    99 73
窶        冱
U+7AB6   U+51B1

Those two characters are exactly the \U7ab6\U51b1 you are seeing.
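The bug can be reproduced in a few lines. This is a sketch that assumes Foundation's `NSShiftJISStringEncoding` as a stand-in for cp932. Both characters involved here are in the standard JIS X 0208 set, so the two encodings agree on these bytes. `MisdecodeUTF8AsShiftJIS` is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper that reproduces the bug: encode the text as UTF-8,
// then decode those bytes as if they were Shift-JIS.
// Returns nil if the bytes are not valid Shift-JIS.
static NSString *MisdecodeUTF8AsShiftJIS(NSString *text) {
    NSData *utf8 = [text dataUsingEncoding:NSUTF8StringEncoding];
    return [[NSString alloc] initWithData:utf8 encoding:NSShiftJISStringEncoding];
}
```

Feeding it `@"\u2019s"` (’s) yields `@"\u7AB6\u51B1"` (窶冱), the string from the question.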

So in this one particular case, you could recover the ’s string by first encoding the characters into cp932 bytes, and then decoding those bytes back to characters using UTF-8.
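Here is a minimal sketch of that two-step repair in Objective-C. It again assumes `NSShiftJISStringEncoding` stands in for cp932, and `RepairShiftJISMojibake` is a hypothetical name. Both steps can fail, and a nil result means the original text cannot be recovered this way.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: undo a UTF-8 -> Shift-JIS mis-decoding.
// Step 1: re-encode the mangled characters as Shift-JIS to get back the
//         original bytes (nil if they cannot be encoded losslessly).
// Step 2: decode those bytes as UTF-8 (nil if they are not valid UTF-8).
static NSString *RepairShiftJISMojibake(NSString *mangled) {
    NSData *originalBytes = [mangled dataUsingEncoding:NSShiftJISStringEncoding];
    if (originalBytes == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:originalBytes encoding:NSUTF8StringEncoding];
}
```

For example, `RepairShiftJISMojibake(@"窶冱")` should return `@"’s"`. If `symbol.data` from the question is an `NSString` holding the mangled text (an assumption), you could try `RepairShiftJISMojibake(symbol.data)`. Any nil result should be treated as unrepairable. This is a workaround only.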

However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence for ’s happens also to be a valid Shift-JIS byte sequence. That won't be true of every UTF-8 byte sequence you might receive. A lone ’ (E2 80 99), for example, leaves 99 as a Shift-JIS lead byte with no trail byte after it. Many other characters will be mangled beyond recovery.

You need to find where the bytes are read into your system and decoded as Shift-JIS, and change that code to decode them as UTF-8 instead.
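At the point where raw bytes arrive, the correct decode can be as simple as this sketch. It assumes you have access to the bytes as `NSData`. The helper name `DecodeIncomingBytes` and the logging choice are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for the place where raw bytes enter your program
// (network payload, file contents, scanner output, and so on).
// It decodes them as UTF-8 and never falls back to Shift-JIS.
static NSString *DecodeIncomingBytes(NSData *rawBytes) {
    NSString *text = [[NSString alloc] initWithData:rawBytes
                                           encoding:NSUTF8StringEncoding];
    if (text == nil) {
        // Not valid UTF-8: report it rather than silently guessing
        // another encoding.
        NSLog(@"Incoming data is not valid UTF-8 (%lu bytes)",
              (unsigned long)rawBytes.length);
    }
    return text;
}
```

Decoding at the source avoids the lossy round trip entirely, so no repair step is needed afterwards.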
