NSString Unicode encoding problem
I'm having problems converting a string to something readable. I'm using
NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];
but I can't convert \U7ab6\U51b1 into '.
It shows as 窶冱, which is not what I want; it should show as an '. Can anyone help me?
That's character U+2019 RIGHT SINGLE QUOTATION MARK.
What has happened is that the character sequence ’s was submitted to you in the UTF-8 encoding, which comes out as the bytes:

E2 80 99 73

That byte sequence has then been incorrectly interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):

E2 80 → 窶 (U+7AB6)
99 73 → 冱 (U+51B1)
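The misreading described above can be reproduced in a few lines. This is a minimal sketch in Python, chosen only because its codecs make the bytes easy to inspect (the question itself concerns NSString). It assumes Python's `cp932` codec behaves like whatever decoder mangled the original data.

```python
# Reproduce the mojibake: encode as UTF-8, then decode with the wrong code page.
original = "\u2019s"  # RIGHT SINGLE QUOTATION MARK followed by "s"

utf8_bytes = original.encode("utf-8")  # the bytes that were actually submitted
garbled = utf8_bytes.decode("cp932")   # the same bytes misread as code page 932

# Print the bytes and the resulting code points in ASCII-safe form.
print(" ".join(f"{b:02X}" for b in utf8_bytes))  # E2 80 99 73
print([f"U+{ord(ch):04X}" for ch in garbled])    # ['U+7AB6', 'U+51B1']
```

The two code points printed at the end are the \U7ab6\U51b1 pair from the question.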
So in this one particular case, you could recover the ’s string by firstly encoding the characters into cp932 bytes, and then decoding those bytes back to characters using UTF-8.

However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence resulting from encoding ’s happened also to be a valid Shift-JIS byte sequence. But that won't be the case for all possible UTF-8 byte sequences you might get. Many other characters will be unrecoverably mangled.

You need to find where bytes are being read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
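To illustrate why the round-trip is only a stopgap, here is a hedged Python sketch (again not Objective-C, and again assuming Python's `cp932` codec stands in for the real decoder). `undo_cp932_misread` is a hypothetical helper name, and the "dogs’ bowl" string is an invented example. A real system may fail differently, for instance by returning nil instead of raising.

```python
def undo_cp932_misread(garbled: str) -> str:
    """Hypothetical helper: reverse a 'UTF-8 read as cp932' mix-up when possible."""
    return garbled.encode("cp932").decode("utf-8")

# The string from the question happens to survive the round-trip.
recovered = undo_cp932_misread("\u7ab6\u51b1")

# A quote followed by a space gives the bytes E2 80 99 20. Here 0x99 is a
# cp932 lead byte, but 0x20 is not a valid trail byte, so decoding breaks.
unlucky = "dogs\u2019 bowl".encode("utf-8")
try:
    unlucky.decode("cp932")
    decoded_cleanly = True
except UnicodeDecodeError:
    decoded_cleanly = False

# A lenient decoder substitutes U+FFFD, and the original byte is lost for good.
lossy = unlucky.decode("cp932", errors="replace")
has_replacement_char = "\ufffd" in lossy

# The actual fix: decode with the correct encoding where the bytes are read.
fixed = unlucky.decode("utf-8")
```

Once a byte has been replaced by U+FFFD there is nothing left to re-encode, which is why the decoding has to be corrected at the point where the bytes enter the system.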