Difference between UTF-8 and UTF-16 strings as bytes in Objective-C

Posted on 2024-12-14 10:48:32

I am trying to convert NSStrings to byte arrays and then back to NSStrings. I have tried both NSUnicodeEncoding and NSUTF8StringEncoding. My problem is that when I iterate over the resulting byte arrays, the two encodings show different data.

The only change between the two runs is that I replace NSUTF8StringEncoding with NSUnicodeEncoding and add dataLength += 2 so that it accounts for the BOM.

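NSString *message = @"testing";
NSUInteger dataLength = [message lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
void *byteData = malloc( dataLength );
// These two were used but not declared in the snippet as posted.
NSUInteger actualLength = 0;
NSRange remain;
NSRange range = NSMakeRange(0, [message length]);
BOOL result = [message getBytes:byteData maxLength:dataLength usedLength:&actualLength encoding:NSUTF8StringEncoding options:0 range:range remainingRange:&remain];
// Walk the buffer with a char * so byteData still points at the start for free().
char *cursor = (char *)byteData;
for( NSUInteger x = 0; x < dataLength; x++ )
{
    // %s treats cursor as a C string and prints until it meets a zero byte.
    NSLog( @"byte data: %s", cursor);
    int t = (int)*cursor;
    cursor++;
}
free( byteData );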

The difference shows up in the NSLog output. With NSUTF8StringEncoding I see:

testing`
esting`
sting`
ting`
...

With NSUnicodeEncoding I see:

null
t
null
e
...

The int t value is correct for each character, but I don't understand why the logged byteData looks so different between the two encodings. I would expect both of them to behave like NSUnicodeEncoding.
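
One way to compare the two buffers is to look at the raw bytes instead of formatting them with %s. The sketch below is plain C, which also compiles inside an Objective-C file. The helper names printable_run and dump_hex are hypothetical, and the UTF-16 byte order (zero byte first, no BOM) is an assumption based on the output quoted above, not something the original code confirms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts how many bytes "%s" would print when started
   at bytes[offset], stopping at the first 0x00 byte or at len. Unlike "%s"
   on an unterminated buffer, it never reads past len. */
size_t printable_run(const unsigned char *bytes, size_t len, size_t offset)
{
    assert(bytes != NULL && offset <= len);
    size_t n = 0;
    while (offset + n < len && bytes[offset + n] != 0x00) {
        n++;
    }
    return n;
}

/* Hypothetical helper: prints every byte as two hex digits, which stays
   readable for any encoding, including ones that contain 0x00 bytes. */
void dump_hex(const unsigned char *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        printf("%02x ", bytes[i]);
    }
    printf("\n");
}

/* "testing" as UTF-8: one byte per ASCII character, no 0x00 bytes. */
const unsigned char utf8_bytes[] = { 't', 'e', 's', 't', 'i', 'n', 'g' };

/* "testing" as UTF-16: two bytes per character. Assumed here: the zero
   byte comes first (big-endian order) and no BOM is present. */
const unsigned char utf16_bytes[] = {
    0, 't', 0, 'e', 0, 's', 0, 't', 0, 'i', 0, 'n', 0, 'g'
};
```

Under these assumptions, printable_run(utf8_bytes, sizeof utf8_bytes, 0) is 7 and shrinks by one per step, which matches the testing, esting, sting pattern. For utf16_bytes it alternates between 0 and 1, which matches the null, t, null, e pattern. Calling dump_hex(utf16_bytes, sizeof utf16_bytes) prints a line starting with 00 74 00 65, so the zero bytes are visible directly.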
