"NSString stringWithUTF8String:" is too sensitive
I'm in the middle of doing some string manipulation using high-level Cocoa features like NSString and NSData, as opposed to digging down to C-level things like working on arrays of chars.
For the love of it, +[NSString stringWithUTF8String:] sometimes returns nil on a perfectly good string that was created with -[NSString UTF8String] in the first place. One would assume that this happens when the input is malformed. Here is an example of the input that fails, in hex:
55 6B 66 51 35 59 4A 5C 6A 60 40 33 5F 45 58 60 9D 47 3F 6E 5E
60 59 34 58 68 41 4B 61 4E 3F 41 46 00
and ASCII:
UkfQ5YJ\j`@3_EX`G?n^`Y4XhAKaN?AF
This is a randomly generated string, to test my subroutine.
const char * buffer = [randomNSString UTF8String]; // UTF8String returns const char *
// .... doing things .... in the end, buffer is the same as before
NSString * result = [NSString stringWithUTF8String:buffer];
// yields nil
Edit: Just in case somebody didn't grasp the implicit question, here it is in -v mode:
Why does [NSString stringWithUTF8String:] sometimes return nil on a perfectly formed UTF8-String?
Comments (2)
walkytalky is right. 9d is not legal in UTF-8 in this position. Bytes whose top two bits are 10 (0x80 through 0xBF) are reserved as continuation bytes; they never appear without a preceding lead byte that has more than one leading 1 bit.
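To make the rule above concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) that scans a NUL-terminated buffer and reports where the UTF-8 structure first breaks. The function name first_bad_utf8_offset is hypothetical, and the check is deliberately simplified: it only validates the lead-byte/continuation-byte pattern, not overlong encodings or surrogate ranges, so it is not a replacement for what NSString does internally.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte that breaks
 * the UTF-8 lead/continuation structure, or -1 if none is found.
 * Simplified on purpose: overlong forms and surrogates are not rejected. */
static long first_bad_utf8_offset(const unsigned char *s)
{
    size_t i = 0;
    while (s[i] != 0) {
        unsigned char b = s[i];
        size_t extra; /* number of continuation bytes expected after b */

        if (b < 0x80) {
            extra = 0;                      /* 0xxxxxxx: plain ASCII */
        } else if ((b & 0xE0) == 0xC0) {
            extra = 1;                      /* 110xxxxx: 2-byte sequence */
        } else if ((b & 0xF0) == 0xE0) {
            extra = 2;                      /* 1110xxxx: 3-byte sequence */
        } else if ((b & 0xF8) == 0xF0) {
            extra = 3;                      /* 11110xxx: 4-byte sequence */
        } else {
            return (long)i;                 /* 10xxxxxx with no lead byte, or invalid lead */
        }

        /* Every expected continuation byte must look like 10xxxxxx.
         * A NUL terminator fails this test, so we never read past it. */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return (long)(i + k);
            }
        }
        i += extra + 1;
    }
    return -1;
}
```

Run against the bytes from the question, this should flag offset 16, which is the 9D byte sitting between 60 and 47 with no lead byte in front of it.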
This is a bit of a stab in the dark because we don't have enough information to properly diagnose the problem.
If randomNSString no longer exists at the point where you allocate the memory for result (for instance, if it has been released in a reference-counted environment or collected in a GC environment), it is possible that buffer points to memory that has been freed but not yet reused, which would explain why it is still the same. However, creating a new NSString requires allocating memory, and the allocator might hand out the block pointed to by buffer, which would mean your UTF-8 string gets overwritten by the internals of the new NSString. You can test this theory by logging the contents of buffer after failing to create result. Don't use the %s specifier though; print the hex bytes.
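The hex logging suggested here could look roughly like the following sketch, again in plain C so it can be dropped into an Objective-C file. The helper name format_hex_bytes and its output-buffer interface are assumptions made for illustration; the point is only that each byte is formatted individually as hex instead of being interpreted as text.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the bytes of a NUL-terminated buffer into
 * out as two-digit uppercase hex values separated by single spaces.
 * Returns how many bytes were formatted; stops early if out is too small. */
static size_t format_hex_bytes(const char *buffer, char *out, size_t out_size)
{
    size_t count = 0; /* bytes formatted so far */
    size_t used = 0;  /* characters written to out so far */

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';

    /* Read through unsigned char so bytes >= 0x80 are not sign-extended. */
    for (const unsigned char *p = (const unsigned char *)buffer; *p != 0; p++) {
        if (used + 4 > out_size) {
            break; /* need room for " XX" plus the terminating NUL */
        }
        used += (size_t)snprintf(out + used, out_size - used,
                                 count ? " %02X" : "%02X", (unsigned)*p);
        count++;
    }
    return count;
}
```

Calling it on buffer right after stringWithUTF8String: returns nil, and logging the resulting hex string, makes it possible to compare the bytes against the original ones. The formatted output is pure ASCII, so printing that string with %s is safe even when the underlying buffer is not valid UTF-8.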