I'm working with a database that includes hex codes for UTF32 characters. I would like to take these characters and store them in an NSString. I need routines to convert in both directions.
To convert the first character of an NSString to a unicode value, this routine seems to work:
const unsigned char *cs = (const unsigned char *)
    [s cStringUsingEncoding:NSUTF32StringEncoding];
uint32_t code = 0;
for ( int i = 3 ; i >= 0 ; i-- ) {
    code <<= 8;
    code += cs[i];
}
return code;
However, I am unable to do the reverse (i.e. take a single code and convert it into an NSString). I thought I could just do the reverse of what I do above: create a C string containing the UTF32 character with its bytes in the correct order, and then create an NSString from it using the correct encoding.
However, converting to/from C strings does not seem to be reversible for me.
For example, I've tried this code, and the "tmp" string is not equal to the original string "s".
const char *cs = [s cStringUsingEncoding:NSUTF32StringEncoding];
NSString *tmp = [NSString stringWithCString:cs encoding:NSUTF32StringEncoding];
What am I doing wrong? Should I be using "wchar_t" for the cstring instead of char *?
Comments (2)
You have a couple of reasonable options.
1. Conversion
The first is to convert your UTF32 to UTF16 and use those with NSString, as UTF16 is the "native" encoding of NSString. It's not actually all that hard. If the UTF32 character is in the BMP (i.e. its high two bytes are 0), you can just cast it to unichar directly. If it's in any other plane, you can convert it to a surrogate pair of UTF16 characters. You can find the rules on the Wikipedia page. A quick (untested) conversion would look like the sketch below.
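Here is one way to write it (a sketch; the variable names are illustrative, and codepoint is assumed to hold a valid Unicode scalar value):

uint32_t codepoint = 0x1D11E;   // example UTF-32 value; in practice this comes from your database
unichar units[2];
NSUInteger unitCount;
if (codepoint < 0x10000) {
    // BMP character: fits in a single UTF-16 code unit
    units[0] = (unichar)codepoint;
    unitCount = 1;
} else {
    // any other plane: encode as a surrogate pair
    uint32_t v = codepoint - 0x10000;
    units[0] = (unichar)(0xD800 + (v >> 10));    // high surrogate
    units[1] = (unichar)(0xDC00 + (v & 0x3FF));  // low surrogate
    unitCount = 2;
}

Now you can create an NSString using both characters at the same time:

NSString *s = [NSString stringWithCharacters:units length:unitCount];   // one way to build it (sketch)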
To go backwards, you can use [NSString getCharacters:range:] to get the unichars back and then reverse the surrogate pair algorithm to recover your UTF32 character (any character which isn't in the range 0xD800-0xDFFF should just be cast to UTF32 directly).
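A sketch of that reverse step, assuming s holds exactly one character (so at most two UTF-16 code units):

NSUInteger len = MIN([s length], (NSUInteger)2);   // a single character is at most two UTF-16 units
unichar units[2] = {0, 0};
[s getCharacters:units range:NSMakeRange(0, len)];

uint32_t codepoint;
if (units[0] >= 0xD800 && units[0] <= 0xDBFF) {
    // high surrogate: combine it with the low surrogate that follows
    codepoint = 0x10000
              + (((uint32_t)(units[0] - 0xD800)) << 10)
              + (uint32_t)(units[1] - 0xDC00);
} else {
    // not a surrogate: the unichar already is the code point
    codepoint = units[0];
}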
2. Byte buffers
Your other option is to let NSString do the conversion directly without using cStrings. To convert a UTF32 value into an NSString you can use something like the following:
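A sketch of that approach, assuming the uint32_t value is in host (little-endian) byte order; the encoding constant makes the byte order explicit, so no BOM is needed:

uint32_t codepoint = 0x1D11E;   // hypothetical UTF-32 value from the database
NSString *s = [[NSString alloc] initWithBytes:&codepoint
                                       length:sizeof(codepoint)
                                     encoding:NSUTF32LittleEndianStringEncoding];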
To get it back out again, you can use a byte-oriented method such as -getBytes:maxLength:usedLength:encoding:options:range:remainingRange:.
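For example (a sketch; it assumes s holds a single character and the host is little-endian):

uint32_t codepoint = 0;
NSUInteger usedLength = 0;
[s getBytes:&codepoint
  maxLength:sizeof(codepoint)
 usedLength:&usedLength
   encoding:NSUTF32LittleEndianStringEncoding
    options:0
      range:NSMakeRange(0, [s length])
remainingRange:NULL];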
There are two problems here:
1:
The first one is that both [NSString cStringUsingEncoding:] and [NSString getCString:maxLength:encoding:] return the C-string in native endianness (little) without adding a BOM to it when using NSUTF32StringEncoding and NSUTF16StringEncoding.
The Unicode standard states (see "How I should deal with BOMs"):
"If there is no BOM, the text should be interpreted as big-endian."
This is also stated in NSString's documentation (see "Interpreting UTF-16-Encoded Data"):
"... if the byte order is not otherwise specified, NSString assumes that the UTF-16 characters are big-endian, unless there is a BOM (byte-order mark), in which case the BOM dictates the byte order."
Although they're referring to UTF-16, the same applies to UTF-32.
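To see this for yourself, here is a quick (illustrative) way to dump the bytes that cStringUsingEncoding: produces:

NSString *s = @"A";
const char *cs = [s cStringUsingEncoding:NSUTF32StringEncoding];
NSUInteger len = [s lengthOfBytesUsingEncoding:NSUTF32StringEncoding];
for (NSUInteger i = 0; i < len; i++) {
    // per the point above, expect native-endian bytes with no BOM,
    // e.g. 41 00 00 00 for "A" on little-endian hardware
    printf("%02x ", (unsigned char)cs[i]);
}
printf("\n");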
2:
The second one is that [NSString stringWithCString:encoding:] internally uses CFStringCreateWithCString to create the string from the C-string. The problem with this is that CFStringCreateWithCString only accepts strings using 8-bit encodings. From the documentation (see the "Parameters" section):
"The string must use an 8-bit encoding."
To solve this issue:
1: Use an encoding variant with an explicit byte order (or add a BOM yourself) when converting in either direction (NSString -> C-string and C-string -> NSString).
2: Use [NSString initWithBytes:length:encoding:] when trying to create an NSString from a C-string encoded in UTF-32 or UTF-16.
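For example, a round trip along those lines might look like this (a sketch; the string literal and variable names are only for illustration):

NSString *original = @"\U0001D11E";   // hypothetical test character outside the BMP

// NSString -> bytes, with the byte order explicit in the encoding constant
NSData *bytes = [original dataUsingEncoding:NSUTF32LittleEndianStringEncoding];

// bytes -> NSString, avoiding stringWithCString:encoding:, which only
// handles 8-bit encodings
NSString *roundTripped = [[NSString alloc] initWithBytes:[bytes bytes]
                                                  length:[bytes length]
                                                encoding:NSUTF32LittleEndianStringEncoding];

NSLog(@"round trip ok? %d", [roundTripped isEqualToString:original]);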