"NSString stringWithUTF8String:" being overly sensitive

Posted on 2024-11-14 09:09:40

I'm in the middle of doing some string manipulation using high-level Cocoa features like NSString and NSData as opposed to digging down to C-level things like working on arrays of chars.

For the love of it, +[NSString stringWithUTF8String:] sometimes returns nil on a perfectly good string that was created with -[NSString UTF8String] in the first place. One would assume that this only happens when the input is malformed. Here is an example of the input that fails, in hex:

55 6B 66 51 35 59 4A 5C 6A 60 40 33 5F 45 58 60 9D 47 3F 6E 5E 
60 59 34 58 68 41 4B 61 4E 3F 41 46 00

and ASCII:

UkfQ5YJ\j`@3_EX`G?n^`Y4XhAKaN?AF

This is a randomly generated string, to test my subroutine.

const char * buffer = [randomNSString UTF8String];
// .... doing things .... in the end, buffer is the same as before
NSString * result = [NSString stringWithUTF8String:buffer];
// yields nil

Edit: Just in case somebody didn't grasp the implicit question, here it is in -v mode:

Why does [NSString stringWithUTF8String:] sometimes return nil on a perfectly formed UTF8-String?

Comments (2)

凉风有信 2024-11-21 09:09:40

walkytalky is right. The byte 9D is not legal UTF-8 in this position. UTF-8 bytes whose top two bits are 10 are reserved as continuation bytes; they never appear without a preceding lead byte whose top bits are 11 (that is, one with more than one leading 1 bit).
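
To make the rule concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) that scans a NUL-terminated buffer and reports the first byte that breaks the lead/continuation structure. The names `first_invalid_utf8_offset` and `question_bytes` are made up for this illustration, and the check is deliberately simplified: it does not reject overlong encodings or surrogate code points, so it is not a full validator and not necessarily what NSString does internally.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte that breaks
 * UTF-8 lead/continuation structure, or -1 if the whole NUL-terminated
 * buffer is structurally well formed.
 * Simplification: overlong encodings and surrogates are NOT rejected. */
long first_invalid_utf8_offset(const unsigned char *s)
{
    size_t i = 0;
    while (s[i] != 0x00) {
        unsigned char b = s[i];
        size_t extra;                              /* continuation bytes expected */

        if (b < 0x80)                extra = 0;    /* 0xxxxxxx: ASCII             */
        else if ((b & 0xE0) == 0xC0) extra = 1;    /* 110xxxxx: 2-byte lead       */
        else if ((b & 0xF0) == 0xE0) extra = 2;    /* 1110xxxx: 3-byte lead       */
        else if ((b & 0xF8) == 0xF0) extra = 3;    /* 11110xxx: 4-byte lead       */
        else return (long)i;   /* 10xxxxxx without a lead byte, or 11111xxx */

        /* Every expected continuation byte must look like 10xxxxxx.
         * A NUL terminator fails this test, so we never read past it. */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)(i + k);
        }
        i += extra + 1;
    }
    return -1;
}

/* The failing input from the question, byte for byte. */
const unsigned char question_bytes[] = {
    0x55, 0x6B, 0x66, 0x51, 0x35, 0x59, 0x4A, 0x5C, 0x6A, 0x60, 0x40,
    0x33, 0x5F, 0x45, 0x58, 0x60, 0x9D, 0x47, 0x3F, 0x6E, 0x5E,
    0x60, 0x59, 0x34, 0x58, 0x68, 0x41, 0x4B, 0x61, 0x4E, 0x3F, 0x41,
    0x46, 0x00
};
```

Calling `first_invalid_utf8_offset(question_bytes)` returns 16, the position of the 9D byte: 0x9D is 10011101 in binary, so it is a continuation byte with no lead byte in front of it.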

国粹 2024-11-21 09:09:40

This is a bit of a stab in the dark because we don't have enough information to properly diagnose the problem.

If randomNSString no longer exists at the point where you allocate the memory for result, for instance, if it has been released in a reference counted environment or collected in a GC environment, it is possible that buffer points to memory that has been freed but not yet reused (which would explain why it is still the same).

However, creating a new NSString requires allocating memory, and the allocator might reuse the block pointed to by buffer, which would mean your UTF-8 string gets overwritten by the internals of the new NSString. You can test this theory by logging the contents of buffer after failing to create result. Don't use the %s specifier though; print the hex bytes.
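
One possible way to print the hex bytes is sketched below in plain C, which can be dropped into an Objective-C file. The function name `hex_dump_c_string` and its signature are invented for this example. It formats every byte up to and including the NUL terminator as two hex digits, so a stray byte such as 9D shows up explicitly instead of being interpreted by `%s`.

```c
#include <stddef.h>

/* Hypothetical helper: writes the bytes of the NUL-terminated string `buf`
 * (terminator included) into `out` as space-separated uppercase hex pairs.
 * Returns the number of characters written, not counting the final NUL,
 * or -1 if `out_size` is too small to hold the whole dump. */
int hex_dump_c_string(const char *buf, char *out, size_t out_size)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t pos = 0;

    for (size_t i = 0; ; i++) {
        unsigned char b = (unsigned char)buf[i];
        size_t needed = (i == 0) ? 2 : 3;      /* optional space + two digits */

        /* Always keep one byte in reserve for the output's own terminator. */
        if (pos + needed + 1 > out_size) return -1;

        if (i != 0) out[pos++] = ' ';
        out[pos++] = digits[b >> 4];           /* high nibble */
        out[pos++] = digits[b & 0x0F];         /* low nibble  */

        if (b == 0x00) break;                  /* terminator has been dumped */
    }
    out[pos] = '\0';
    return (int)pos;
}
```

The output consists only of hex digits and spaces, so it can be logged with `%s` safely. Dumping buffer once right after the call to UTF8String and again after stringWithUTF8String: returns nil would show whether the bytes changed in between, or whether an invalid byte was there from the start.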
