Objective-C: Reading file contents into an NSString object does not convert Unicode escapes

Posted on 2024-11-30 12:23:12

I have a file, which I'm reading into an NSString object using stringWithContentsOfFile. It contains Unicode for Japanese characters such as:

\u305b\u3044\u3075\u304f

which I believe is

せいふく

I would like my NSString object to store the string as the latter, but it is storing it as the former.

The thing I don't quite understand is that when I do this:

NSString *myString = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];

It stores it as: \u305b\u3044\u3075\u304f.

But when I hardcode in the string:

NSString *myString = @"\u305b\u3044\u3075\u304f";

It correctly converts it and stores it as: せいふく

Does stringWithContentsOfFile escape the Unicode in some way? Any help will be appreciated.

Thanks.


风和你 2024-12-07 12:23:12

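The \u305b\u3044\u3075\u304f in the file is just a run of ordinary characters, so that is exactly what ends up in the string. You need to save the actual Japanese characters in the file. In other words, store せいふく in the file and load that into the string.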
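As a minimal sketch of that suggestion: the helper below writes the real characters to a file as UTF-8 and reads them back with the same call used in the question. The function name `RoundTripThroughFile` and the temporary file name are made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: writes `text` to a temporary file as UTF-8,
// reads it back, and returns the loaded string (nil on failure).
static NSString *RoundTripThroughFile(NSString *text) {
    // Assumed file name, used only for this demonstration.
    NSString *path = [NSTemporaryDirectory()
        stringByAppendingPathComponent:@"unicode-demo.txt"];
    NSError *error = nil;

    // The file now holds the UTF-8 bytes of the characters themselves,
    // not the six-character text "\u305b".
    if (![text writeToFile:path
                atomically:YES
                  encoding:NSUTF8StringEncoding
                     error:&error]) {
        NSLog(@"Write failed: %@", error);
        return nil;
    }

    NSString *loaded = [NSString stringWithContentsOfFile:path
                                                 encoding:NSUTF8StringEncoding
                                                    error:&error];
    if (loaded == nil) {
        NSLog(@"Read failed: %@", error);
    }
    return loaded;
}
```

Calling `RoundTripThroughFile(@"\u305b\u3044\u3075\u304f")` should hand back a four-character string, because the compiler has already turned the escapes into characters before anything is written to disk.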


柒夜笙歌凉 2024-12-07 12:23:12

You can try this; I don't know how feasible it is.

// Split on the literal two-character sequence \u; each non-empty piece should start with hex digits.
NSArray *unicodeArray = [stringFromFile componentsSeparatedByString:@"\\u"];
NSMutableString *finalString = [NSMutableString string];
for (NSString *unicodeString in unicodeArray) {
    if (![unicodeString isEqualToString:@""]) {
        // scanHexInt: writes a full unsigned int, so scan into one and narrow it afterwards.
        unsigned int scannedValue = 0;
        if ([[NSScanner scannerWithString:unicodeString] scanHexInt:&scannedValue]) {
            unichar codeValue = (unichar)scannedValue;
            [finalString appendString:[NSString stringWithCharacters:&codeValue length:1]];
        }
    }
}
// For the input in the question, finalString should contain せいふく
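Splitting on \u only works when the file contains nothing but escapes. If ordinary text may be mixed in, one possible variation is to match each escape with a regular expression and copy everything else through unchanged. This is a sketch under the assumption that every escape is exactly four hex digits; the function name `DecodeUnicodeEscapes` is made up for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces each literal \uXXXX (exactly four hex
// digits) with the UTF-16 code unit it names and leaves all other text
// untouched.
static NSString *DecodeUnicodeEscapes(NSString *input) {
    // The regex \\u([0-9a-fA-F]{4}) matches a backslash, a 'u', and four
    // hex digits; the backslashes are doubled again for the string literal.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\\\u([0-9a-fA-F]{4})"
                                                  options:0
                                                    error:NULL];
    NSArray *matches = [regex matchesInString:input
                                      options:0
                                        range:NSMakeRange(0, input.length)];

    NSMutableString *result = [NSMutableString stringWithCapacity:input.length];
    NSUInteger cursor = 0;
    for (NSTextCheckingResult *match in matches) {
        // Copy the plain text between the previous escape and this one.
        NSRange gap = NSMakeRange(cursor, match.range.location - cursor);
        [result appendString:[input substringWithRange:gap]];

        // Convert the four hex digits into a single UTF-16 code unit.
        NSString *hex = [input substringWithRange:[match rangeAtIndex:1]];
        unsigned int value = 0;
        [[NSScanner scannerWithString:hex] scanHexInt:&value];
        unichar unit = (unichar)value;
        [result appendString:[NSString stringWithCharacters:&unit length:1]];

        cursor = NSMaxRange(match.range);
    }
    // Copy whatever follows the last escape.
    [result appendString:[input substringFromIndex:cursor]];
    return result;
}
```

Here `DecodeUnicodeEscapes(stringFromFile)` would take the place of the loop above. Note that the argument must hold a real backslash followed by `u`, which is what the file contains; in source code that is written `@"\\u305b"`.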


绻影浮沉 2024-12-07 12:23:12

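Something like \u305b in an Objective-C string literal is in fact an instruction to the compiler to replace it with the actual UTF-8 byte sequence for that character. The method that reads the file is not a compiler; it only reads the bytes it finds. So to get that character (officially called a "code point"), your file must contain the actual UTF-8 byte sequence for that character, not the symbolic representation \u305b.

It's a bit like \x43. In your source code this is four characters, but it is replaced by one byte with the value 0x43. So if you write @"\x43" to a file, the file will not contain the four characters '\', 'x', '4', '3'; it will contain the single character 'C' (which has the ASCII value 0x43).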
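The difference between what the compiler resolves and what a file holds can be checked by comparing lengths. The sketch below uses a made-up helper, `UTF8ByteCount`, to count the bytes a string would occupy in a UTF-8 file.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: number of bytes `s` occupies when encoded as UTF-8.
static NSUInteger UTF8ByteCount(NSString *s) {
    return [s lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
}

// Logs the sizes of an escape the compiler resolves and of the same
// six characters kept as plain text.
static void LogEscapeSizes(void) {
    // The compiler resolves the escape: one character, three UTF-8 bytes.
    NSString *compiled = @"\u305b";
    // The backslash itself is escaped, so this is the six characters
    // \ u 3 0 5 b, which is what the file in the question contains.
    NSString *literalText = @"\\u305b";

    NSLog(@"compiled: %lu character(s), %lu byte(s)",
          (unsigned long)compiled.length,
          (unsigned long)UTF8ByteCount(compiled));
    NSLog(@"literal text: %lu character(s), %lu byte(s)",
          (unsigned long)literalText.length,
          (unsigned long)UTF8ByteCount(literalText));
}
```

The first string should report one character and three bytes, the second six characters and six bytes, which matches the behaviour described in the question.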
