Objective-C: Reading file contents into an NSString object does not convert Unicode escapes

Posted on 2024-11-30 12:23:12

I have a file that I'm reading into an NSString object using stringWithContentsOfFile:encoding:error:. It contains Unicode escape sequences for Japanese characters, such as:

\u305b\u3044\u3075\u304f

which I believe is

せいふく

I would like my NSString object to store the string as the latter, but it is storing it as the former.

The thing I don't quite understand is that when I do this:

NSString *myString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];

It stores it as: \u305b\u3044\u3075\u304f.

But when I hardcode in the string:

NSString *myString = @"\u305b\u3044\u3075\u304f";

It correctly converts it and stores it as: せいふく

Does stringWithContentsOfFile escape the Unicode in some way? Any help would be appreciated.

Thanks.

Comments (3)

风和你 2024-12-07 12:23:12

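In the file, \u305b\u3044\u3075\u304f is just a run of ordinary characters (a backslash, the letter u, and four hex digits, repeated four times), so that is exactly what ends up in the string. You need to save the actual Japanese characters in the file. That is, store せいふく in the file, encoded as UTF-8, and that is what will be loaded into the string.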
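To make the point concrete, here is a minimal sketch that writes the real characters to a file and reads them back with the same call as in the question. The helper name WriteThenRead and the temporary file name are made up for this illustration, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for this sketch: writes `text` to `path` as UTF-8,
// then reads it back the same way the question does. Returns nil on failure.
static NSString *WriteThenRead(NSString *text, NSString *path) {
    NSError *error = nil;
    if (![text writeToFile:path atomically:YES encoding:NSUTF8StringEncoding error:&error]) {
        NSLog(@"write failed: %@", error);
        return nil;
    }
    NSString *loaded = [NSString stringWithContentsOfFile:path
                                                 encoding:NSUTF8StringEncoding
                                                    error:&error];
    if (loaded == nil) {
        NSLog(@"read failed: %@", error);
    }
    return loaded;
}

int main(void) {
    @autoreleasepool {
        // Scratch location chosen only for this demonstration.
        NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"seifuku-demo.txt"];
        // The compiler resolves the escapes in this literal, so the file
        // receives the UTF-8 bytes of four hiragana, not any backslashes.
        NSString *loaded = WriteThenRead(@"\u305b\u3044\u3075\u304f", path);
        NSLog(@"%@ (length %lu)", loaded, (unsigned long)loaded.length);
    }
    return 0;
}
```

The loaded string has length 4, one unichar per hiragana, because the file never contained escape sequences in the first place.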

柒夜笙歌凉 2024-12-07 12:23:12

You can try this; I'm not sure how robust it is.

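// stringFromFile holds the literal text \u305b\u3044\u3075\u304f read from the file.
// Assumes the string consists only of \uXXXX escapes; any other text is not preserved.
NSArray *unicodeArray = [stringFromFile componentsSeparatedByString:@"\\u"];
NSMutableString *finalString = [NSMutableString string];
for (NSString *unicodeString in unicodeArray) {
    if (![unicodeString isEqualToString:@""]) {
        // scanHexInt: writes a full unsigned int, so scan into one and narrow afterwards.
        unsigned int scanned = 0;
        if ([[NSScanner scannerWithString:unicodeString] scanHexInt:&scanned]) {
            unichar codeValue = (unichar)scanned;
            [finalString appendString:[NSString stringWithCharacters:&codeValue length:1]];
        }
    }
}
// finalString should now contain せいふく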
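If the file has to keep the escaped form, another option that is often used is to let Foundation decode the escapes through NSNonLossyASCIIStringEncoding, which understands \uXXXX sequences. The sketch below is not from the thread: the helper name DecodeUnicodeEscapes is hypothetical, the code assumes ARC, and it assumes the input is pure ASCII (it returns nil otherwise).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes literal \uXXXX sequences in an ASCII-only string.
// Returns nil if the input contains non-ASCII characters or cannot be decoded.
static NSString *DecodeUnicodeEscapes(NSString *escaped) {
    // Without lossy conversion, this returns nil for non-ASCII input.
    NSData *ascii = [escaped dataUsingEncoding:NSASCIIStringEncoding];
    if (ascii == nil) {
        return nil;
    }
    // Reinterpret the same bytes using the encoding that understands \uXXXX.
    return [[NSString alloc] initWithData:ascii encoding:NSNonLossyASCIIStringEncoding];
}

int main(void) {
    @autoreleasepool {
        // The doubled backslashes make this the same 24 characters the file holds.
        NSString *fromFile = @"\\u305b\\u3044\\u3075\\u304f";
        NSString *decoded = DecodeUnicodeEscapes(fromFile);
        NSLog(@"%@", decoded);
    }
    return 0;
}
```

Unlike splitting on \u by hand, this keeps any plain ASCII text that sits between the escapes, but a file that mixes escapes with real non-ASCII characters would need a different approach.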
绻影浮沉 2024-12-07 12:23:12

Something like \u305b in an Objective-C string literal is in fact an instruction to the compiler to replace it with the actual character. The method reading the file is not a compiler; it only reads the bytes it finds. So to get that character (officially called a "code point"), your file must contain the actual UTF-8 byte sequence for that character, not the symbolic representation \u305b.

It's a bit like \x43. In your source code this is four characters, but the compiler replaces it with a single byte of value 0x43. So if you write @"\x43" to a file, the file will not contain the four characters '\', 'x', '4', '3'; it will contain the single character 'C' (which has ASCII value 0x43).
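A small sketch of the difference, comparing a literal the compiler resolves with one whose backslash is itself escaped (which is the text the file actually holds). The variable names are only illustrative.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *compiled = @"\u305b";   // escape resolved by the compiler: one character
        NSString *asInFile = @"\\u305b";  // escaped backslash: six ordinary characters

        NSLog(@"compiled length: %lu", (unsigned long)compiled.length);  // 1
        NSLog(@"asInFile length: %lu", (unsigned long)asInFile.length);  // 6

        // Encoded as UTF-8, the single character U+305B occupies three bytes.
        NSData *utf8 = [compiled dataUsingEncoding:NSUTF8StringEncoding];
        NSLog(@"UTF-8 byte count: %lu", (unsigned long)utf8.length);     // 3
    }
    return 0;
}
```

Reading a file at run time only ever produces the second kind of string unless the file itself holds the encoded character.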
