NSString to NSData encoding considerations

Posted on 2024-12-27 20:51:43

I understand why you need to specify an encoding when going from NSData to NSString.
However, I find it frustrating that the reverse (NSString to NSData) also requires one.

In this related question, the answers suggested using NSUTF8StringEncoding or defaultCStringEncoding, but the latter was not fully explained.

So I just wanted to ask whether the following is correct when converting NSString to NSData:

  • If you want to be 100% sure that the binary representation of the NSString object is UTF-8, use NSUTF8StringEncoding (or whatever encoding is needed).

  • If the encoding of the NSString object is known or expected to already be of a certain type and no conversion is required, it is safe (and perhaps internally faster) to use defaultCStringEncoding. (From what I have read, Objective-C uses UTF-16 internally; I am not sure whether LE or BE, but I'd assume LE because the platform is LE.)

TIA

Comments (1)

奶茶白久 2025-01-03 20:51:43

The encoding needs to be specified for converting NSString to NSData for the same reason it needs to be specified going from NSData to NSString.

An NSData object is a wrapper around a buffer of raw bytes. Unless you tell NSString which encoding to use, it has no way of knowing what to write, because at the level of ones and zeroes the UTF-16 encoding of a letter looks different from the UTF-8 encoding of the same letter. And of course, if you write UTF-16 as big-endian and read it back as little-endian, you will get gibberish.

In other words, don't think of it as converting or escaping a string; it's generating a byte buffer, and the encoding tells it which ones and zeroes to write when the next character is "a" and which ones to write when it means "妈".
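To make the "which ones and zeroes" point concrete, here is a minimal sketch that dumps the bytes NSString produces for the same character under different encodings. HexString and EncodedHex are hypothetical helper names introduced only for this illustration; the encoding constants and dataUsingEncoding: are standard Foundation API.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: renders the bytes of an NSData as lowercase hex.
static NSString *HexString(NSData *data) {
    NSMutableString *hex = [NSMutableString stringWithCapacity:data.length * 2];
    const unsigned char *bytes = (const unsigned char *)data.bytes;
    for (NSUInteger i = 0; i < data.length; i++) {
        [hex appendFormat:@"%02x", bytes[i]];
    }
    return hex;
}

// Hypothetical helper: encodes a string with the given encoding and
// returns the resulting bytes as hex (empty if the conversion fails).
static NSString *EncodedHex(NSString *string, NSStringEncoding encoding) {
    return HexString([string dataUsingEncoding:encoding]);
}
```

With these helpers, EncodedHex(@"a", NSUTF8StringEncoding) gives 61, while NSUTF16BigEndianStringEncoding gives 0061 and NSUTF16LittleEndianStringEncoding gives 6100. For "妈" (U+5988) the three results are e5a688, 5988, and 8859. Same character, three different byte buffers, which is why the encoding has to be named in both directions.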

As for your question...here's my two cents.

1) If you are converting an NSString to an NSData so that your same program can convert it back later, and no other software will need to deal with that NSData until after you've read it back into an NSString, then none of this matters. All that matters is that your string-to-data encoding and your data-to-string encoding match.

2) If you are dealing only with ASCII characters, you can probably get away with a lot, just because many kinds of encoding use the same representation for characters under 128. But this breaks easily, even with little things like smart quotes.

3) Despite the name, defaultCStringEncoding is not something you should use as a default. It's designed for special circumstances where you need to deal with system strings and don't otherwise know how the system deals with its internal strings. It refers to the way strings are handled in the default C implementation, NOT in the NSString internals, so there's not necessarily a performance benefit.

4) If you write a string with an unknown string encoding, and you try to read it back with a different string encoding, your code will fail; in many cases, you will just end up with a nil or empty string.
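Points 1, 2, and 4 can be checked with a small experiment. The sketch below assumes ARC; Transcode is a hypothetical helper written for this illustration, not part of Foundation. It encodes a string with one encoding and decodes the bytes with another.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: write with one encoding, read back with another.
// Returns nil if the string cannot be represented in writeEncoding without
// loss, or if the resulting bytes are not valid in readEncoding.
static NSString *Transcode(NSString *string,
                           NSStringEncoding writeEncoding,
                           NSStringEncoding readEncoding) {
    // dataUsingEncoding: does not allow lossy conversion, so it returns nil
    // when a character has no representation in the target encoding.
    NSData *data = [string dataUsingEncoding:writeEncoding];
    if (data == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:data encoding:readEncoding];
}
```

When both encodings match, as in Transcode(@"妈", NSUTF8StringEncoding, NSUTF8StringEncoding), the original string comes back. Writing "妈" as NSUTF16LittleEndianStringEncoding and reading it as NSUTF8StringEncoding should return nil, because the bytes 88 59 are not valid UTF-8. A string containing smart quotes cannot be written as NSASCIIStringEncoding at all, so the helper returns nil at the first step. Decoding UTF-8 bytes with a single-byte encoding such as NSISOLatin1StringEncoding tends to be the nastier case: it usually succeeds and hands back a garbled string rather than failing.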

Bottom line: who will be trying to interpret your NSData objects? If it's your own app, pick an encoding that makes sense for you (I use UTF-8 for everything) and use it for both conversions. Otherwise, figure out what your ecosystem needs to read or write and make that your standard.
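One way to follow that advice in an app that owns both ends is to name the encoding once and route every conversion through it. This is only a sketch under that assumption, again assuming ARC; kAppStringEncoding, DataFromString, and StringFromData are hypothetical names, not existing API.

```objective-c
#import <Foundation/Foundation.h>

// Assumption: the app controls both directions, so a single app-wide
// encoding is enough. UTF-8 is the choice mentioned in the answer.
static const NSStringEncoding kAppStringEncoding = NSUTF8StringEncoding;

// Hypothetical helper: string -> bytes, always with the app-wide encoding.
static NSData *DataFromString(NSString *string) {
    return [string dataUsingEncoding:kAppStringEncoding];
}

// Hypothetical helper: bytes -> string, with the same encoding.
// Returns nil if the bytes are not valid in that encoding.
static NSString *StringFromData(NSData *data) {
    return [[NSString alloc] initWithData:data encoding:kAppStringEncoding];
}
```

Because both helpers read the same constant, the two directions cannot drift apart. If another system dictates the format, changing the constant to whatever that system expects is the only edit needed.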
