Difference between UTF-8 and UTF-16 strings as bytes in Objective-C

Posted on 2024-12-14 10:48:32

I am trying to convert NSStrings to byte arrays and then back to NSStrings. I have tried both NSUnicodeEncoding and NSUTF8StringEncoding. My problem is that as I iterate over the byte arrays, I see different data.

The only change to this code between the two runs is that I replace NSUTF8StringEncoding with NSUnicodeEncoding and add dataLength += 2 to account for the BOM.

NSString *message = @"testing";
NSUInteger dataLength = [message lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger actualLength = 0;   // bytes actually written by getBytes:
NSRange remain;                // part of the range that did not fit, if any
void *byteData = malloc( dataLength );
NSRange range = NSMakeRange(0, [message length]);
BOOL result = [message getBytes:byteData maxLength:dataLength usedLength:&actualLength encoding:NSUTF8StringEncoding options:0 range:range remainingRange:&remain];
for( NSUInteger x = 0; x < dataLength; x++ )
{
    NSLog( @"byte data: %s", (char *)byteData);
    int t = (int)*(char *)byteData;
    byteData++;
}

The difference is in the NSLog output.
With NSUTF8StringEncoding I see

  • testing
  • esting
  • sting
  • ting
  • ...

With NSUnicodeEncoding I see

  • null
  • t
  • null
  • e
  • ...

The int t value is correct for the given character, but I don't understand why the byteData is so different. I would expect them both to act like the NSUnicodeEncoding.

Comments (2)

雄赳赳气昂昂 2024-12-21 10:48:32

In UTF-8, the letter F is represented by a single byte, the ASCII code for F. The string "FU" is represented by an ASCII F byte followed by an ASCII U byte. In Unicode as used here (UTF-16), each character in your string occupies two bytes. Standard ASCII characters are preceded by a zero byte.

It's not clear why the behavior you see isn't exactly what you'd expect. In UTF-8, standard ASCII characters occupy one byte. In your Unicode encoding, they occupy two. So the output certainly won't be the same.
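To make the byte layouts concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The helper names `ascii_to_utf16be` and `printable_run` are hypothetical and not part of Foundation. The sketch assumes the zero byte comes first (big-endian), matching the output shown in the question; the actual byte order may differ by platform. `printable_run` mimics what a `%s` format does: it stops at the first zero byte.

```c
#include <stddef.h>

/* Hypothetical helper: widen an ASCII string to UTF-16 code units,
   high byte first (assumed big-endian). Returns the bytes written. */
size_t ascii_to_utf16be(const char *ascii, unsigned char *out, size_t cap)
{
    size_t n = 0;
    for (; *ascii != '\0' && n + 2 <= cap; ascii++) {
        out[n++] = 0x00;                   /* high byte is zero for ASCII */
        out[n++] = (unsigned char)*ascii;  /* low byte is the ASCII code  */
    }
    return n;
}

/* Hypothetical helper: number of bytes a "%s" format would print when
   started at buf[offset], i.e. the run up to the first zero byte,
   bounded by len so it never reads past the buffer. */
size_t printable_run(const unsigned char *buf, size_t len, size_t offset)
{
    size_t n = 0;
    while (offset + n < len && buf[offset + n] != 0) {
        n++;
    }
    return n;
}
```

Under these assumptions, the UTF-8 bytes of "testing" give runs of 7, 6, 5, ... as the offset advances, while the UTF-16 bytes give runs of 0, 1, 0, 1, ..., which would explain the alternating empty and single-character lines. Logging each byte with a numeric format such as `%02x` instead of `%s` avoids the dependence on zero bytes.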

岁月流歌 2024-12-21 10:48:32

According to this answer, NSUnicodeStringEncoding "is little-endian UTF-16 preceded with a byte order mark", so it should be expected that the result is totally different from UTF-8.
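Since the byte order is what decides whether the zero byte comes before or after each ASCII character, checking for a byte order mark is one way to find out what a buffer actually contains. The following is a rough sketch in plain C; `bom_kind` and `detect_utf16_bom` are hypothetical names, not Foundation API. It relies only on the fact that U+FEFF is serialized as FF FE in little-endian UTF-16 and FE FF in big-endian UTF-16, and it does not try to distinguish UTF-32.

```c
#include <stddef.h>

/* Byte orders this sketch can report. */
typedef enum { BOM_NONE, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Hypothetical helper: inspect the first two bytes of a buffer for a
   UTF-16 byte order mark (U+FEFF). */
bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return BOM_UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF) return BOM_UTF16_BE;
    }
    return BOM_NONE;  /* too short, or no mark present */
}
```

If no mark is found, the byte order has to come from elsewhere, for example from an explicitly chosen encoding such as NSUTF16LittleEndianStringEncoding or NSUTF16BigEndianStringEncoding.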
