Objective-C: Reading file contents into an NSString object does not convert Unicode escapes
I have a file, which I'm reading into an NSString object using stringWithContentsOfFile. It contains Unicode for Japanese characters such as:
\u305b\u3044\u3075\u304f
which I believe is
せいふく
I would like my NSString object to store the string as the latter, but it is storing it as the former.
The thing I don't quite understand is that when I do this:
NSString *myString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
It stores it as: \u305b\u3044\u3075\u304f.
But when I hardcode in the string:
NSString *myString = @"\u305b\u3044\u3075\u304f";
It correctly converts it and stores it as: せいふく
Does stringWithContentsOfFile escape the Unicode in some way? Any help will be appreciated.
Thanks.
Comments (3)
In the file,
\u305b\u3044\u3075\u304f
is just a run of ordinary characters (a backslash, the letter u, and four hex digits, repeated), so that is exactly what you get in the string. You need to save the actual Japanese characters in the file. That is, store せいふく in the file, and that is what will be loaded into the string.
You can try this; I don't know how feasible it is.
Something like \u305b in an Objective-C string literal is in fact an instruction to the compiler to replace it with the actual UTF-8 byte sequence for that character. The method reading the file is not a compiler, and only reads the bytes it finds. So to get that character (officially called a "code point"), your file must contain the actual UTF-8 byte sequence for that character, and not the symbolic representation \u305b.
It's a bit like \x43. In your source code this is four characters, but it is replaced by one byte with value 0x43. So if you write @"\x43" to a file, the file will not contain the four characters '\', 'x', '4', '3'; it will contain the single character 'C' (which has ASCII value 0x43).
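If the file cannot be changed and really does contain the literal text \u305b, the conversion the compiler does for string literals has to be done by hand at run time. Below is a minimal sketch of that idea in plain C (which also compiles as Objective-C). The helper name decode_u_escapes is hypothetical, not a Foundation API, and it assumes 4-digit \uXXXX escapes only, with no handling of surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: copies `in` to `out`, replacing each literal
 * backslash-u-XXXX sequence with the UTF-8 bytes of code point U+XXXX.
 * Assumptions: 4-digit escapes only; surrogate pairs are not combined.
 * Returns 0 on success, -1 if `out` is too small.
 */
int decode_u_escapes(const char *in, char *out, size_t out_size) {
    size_t o = 0;
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        unsigned cp = 0;
        int is_escape = (in[0] == '\\' && in[1] == 'u');
        if (is_escape) {
            /* Parse exactly four hex digits; stop at the first non-hex char. */
            for (int i = 2; i < 6; i++) {
                int h = hex_value(in[i]);
                if (h < 0) { is_escape = 0; break; }
                cp = cp * 16 + (unsigned)h;
            }
        }
        if (!is_escape) {
            /* Ordinary byte: copy it through unchanged. */
            if (o + 1 >= out_size) return -1;
            out[o++] = *in++;
            continue;
        }
        /* Encode cp (at most 0xFFFF) as 1, 2, or 3 UTF-8 bytes. */
        if (cp < 0x80) {
            if (o + 1 >= out_size) return -1;
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            if (o + 2 >= out_size) return -1;
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            if (o + 3 >= out_size) return -1;
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
        in += 6;  /* skip the backslash, the 'u', and four hex digits */
    }
    if (out_size == 0) return -1;
    out[o] = '\0';
    return 0;
}
```

Note that the input here would be written in source as "\\u305b" (an escaped backslash), which is the six-character text a file reader sees. From Objective-C, the decoded buffer could then be wrapped with [NSString stringWithUTF8String:out]. Storing the real characters in the file, as suggested above, remains the simpler fix.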