Converting an NSString with accented characters to a C string

Posted 2024-12-03 19:22:26, 299 characters, 0 views, 0 comments

I have an NSString with a value of José (with an accent on the e). I try to convert it to a C string as follows:

char str [[myAccentStr length] + 1];
[myAccentStr getCString:str maxLength:[myAccentStr length] + 1 encoding:NSUTF32StringEncoding];

but str ends up being an empty string. What gives? I tried UTF8 and UTF16 too. It gets passed to another function later on, and when that function calls lstrlen on it, the size comes out as zero.

Comments (2)

梦中的蝴蝶 2024-12-10 19:22:26

The documentation for NSString's getCString:maxLength:encoding: says:

You can use canBeConvertedToEncoding: to check whether a string can be
losslessly converted to encoding. If it can’t, you can use
dataUsingEncoding:allowLossyConversion: to get a C-string
representation using encoding, allowing some loss of information (note
that the data returned by dataUsingEncoding:allowLossyConversion: is
not a strict C-string since it does not have a NULL terminator).

Using the NSString method dataUsingEncoding:allowLossyConversion: does the trick. Here's a code example:

NSString *myAccentStr = @"José";

// NSString * to C string (char *)
NSData *strData = [myAccentStr dataUsingEncoding:NSMacOSRomanStringEncoding
                            allowLossyConversion:YES];
// Size the buffer from the encoded byte count, not the character count,
// and leave one extra byte for the terminator that NSData does not supply.
char str[[strData length] + 1];
memcpy(str, [strData bytes], [strData length]);
str[[strData length]] = '\0';
NSLog(@"str (from NSString* to c string): %s", str);

// C string (char *) to NSString *
NSString *newAccentStr = [NSString stringWithCString:str
                                            encoding:NSMacOSRomanStringEncoding];
NSLog(@"newAccentStr (from c string to NSString*):  %@", newAccentStr);

The output from those NSLog calls is:

str (from NSString* to c string): José

newAccentStr (from c string to NSString*): José

So far I've only seen this work properly when using the NSMacOSRomanStringEncoding.
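
The copy-and-terminate step is the part that is easy to get wrong, so here is a minimal sketch of it in plain C (which also compiles as Objective-C). The helper name copy_as_c_string is made up for illustration and is not part of Foundation; it only assumes you have a pointer to the raw bytes and their count, as [strData bytes] and [strData length] would give you.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` raw bytes (for example an NSData payload,
 * which carries no terminator) into a newly allocated, NUL-terminated buffer.
 * Returns NULL if allocation fails; the caller frees the result. */
char *copy_as_c_string(const void *bytes, size_t len)
{
    char *out = malloc(len + 1);      /* one extra byte for the terminator */
    if (out == NULL) {
        return NULL;
    }
    if (len > 0) {
        memcpy(out, bytes, len);      /* copy exactly len bytes, never len + 1 */
    }
    out[len] = '\0';
    return out;
}
```

Called as copy_as_c_string([strData bytes], [strData length]), this would avoid both sizing the buffer from the character count and reading one byte past the end of the data.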


Edit

Changing this to a community wiki. Please feel free to edit.

hooleyhoop had some great points, so I thought I would try to make code that is as verbose as possible. If I'm missing anything, someone please chime in.

Also - Not sure why [NSString canBeConvertedToEncoding:] is returning YES even though the [NSString getCString:maxLength:encoding:] function definitely isn't working right (as seen by the output).

Here's some code to help in analyzing what works / what doesn't:

// Define a block variable to try out different encodings
void (^tryGetCStringUsingEncoding)(NSString*, NSStringEncoding) = ^(NSString* originalNSString, NSStringEncoding encoding) {
    NSLog(@"Trying to convert \"%@\" using encoding: 0x%lX", originalNSString, (unsigned long)encoding);
    BOOL canEncode = [originalNSString canBeConvertedToEncoding:encoding];
    if (!canEncode)
    {
        NSLog(@"    Can not encode \"%@\" using encoding %lX", originalNSString, (unsigned long)encoding);
    }
    else
    {
        // Try encoding using NSString getCString:maxLength:encoding:
        // Note: maxLength has to include room for the NUL terminator. This
        // buffer has none, which is consistent with Converted(1) failing below.
        NSUInteger cStrLength = [originalNSString lengthOfBytesUsingEncoding:encoding];
        char cstr[cStrLength];
        [originalNSString getCString:cstr maxLength:cStrLength encoding:encoding];
        NSLog(@"    Converted(1): \"%s\"  (expected length: %lu)",
              cstr, (unsigned long)cStrLength);

        // Try encoding using NSString dataUsingEncoding:allowLossyConversion:
        NSData *strData = [originalNSString dataUsingEncoding:encoding allowLossyConversion:YES];
        char cstr2[[strData length] + 1];
        // Copy only the bytes NSData owns, then terminate the buffer ourselves.
        memcpy(cstr2, [strData bytes], [strData length]);
        cstr2[[strData length]] = '\0';
        NSLog(@"    Converted(2): \"%s\"  (expected length: %lu)",
              cstr2, (unsigned long)[strData length]);
    }
};

NSString *myAccentStr = @"José";

// Try out whatever encoding you want
tryGetCStringUsingEncoding(myAccentStr, NSUTF8StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF16StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSUTF32StringEncoding);
tryGetCStringUsingEncoding(myAccentStr, NSMacOSRomanStringEncoding);

Results:

> Trying to convert "José" using encoding: 0x4
>     Converted(1): ""  (expected length: 5)
>     Converted(2): "José"  (expected length: 5)
> Trying to convert "José" using encoding: 0xA
>     Converted(1): ""  (expected length: 8)
>     Converted(2): "ˇ˛J"  (expected length: 10)
> Trying to convert "José" using encoding: 0x8C000100
>     Converted(1): ""  (expected length: 16)
>     Converted(2): "ˇ˛"  (expected length: 20)
> Trying to convert "José" using encoding: 0x1E
>     Converted(1): "-"  (expected length: 4)
>     Converted(2): "José"  (expected length: 4)
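
One plausible reading of the Converted(2) rows for UTF-16 and UTF-32: the data starts with a byte order mark and contains zero bytes, so anything that treats it as a C string stops early. The sketch below is plain C and assumes little-endian byte order with a BOM, which is consistent with the 10- and 20-byte lengths reported above (0xFF and 0xFE display as "ˇ" and "˛" in Mac OS Roman). The array and function names are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* "José" as little-endian UTF-16 with a byte order mark: 10 bytes of data,
 * followed by the single terminator byte the copy appends. */
const unsigned char jose_utf16_le[] = {
    0xFF, 0xFE,                 /* byte order mark */
    0x4A, 0x00,                 /* J */
    0x6F, 0x00,                 /* o */
    0x73, 0x00,                 /* s */
    0xE9, 0x00,                 /* é */
    0x00                        /* appended terminator */
};

/* The same text as little-endian UTF-32 with a byte order mark: 20 bytes. */
const unsigned char jose_utf32_le[] = {
    0xFF, 0xFE, 0x00, 0x00,     /* byte order mark */
    0x4A, 0x00, 0x00, 0x00,     /* J */
    0x6F, 0x00, 0x00, 0x00,     /* o */
    0x73, 0x00, 0x00, 0x00,     /* s */
    0xE9, 0x00, 0x00, 0x00,     /* é */
    0x00                        /* appended terminator */
};

/* What strlen (or a %s format) sees: every byte up to the first zero byte. */
size_t bytes_before_first_nul(const unsigned char *bytes)
{
    return strlen((const char *)bytes);
}
```

Under these assumptions the UTF-16 buffer looks like a 3-byte string (the two BOM bytes plus "J") and the UTF-32 buffer like a 2-byte string (the BOM alone), which matches "ˇ˛J" and "ˇ˛" in the results. For a function that expects a char * C string, UTF-8 or a single-byte encoding is the safer choice.
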

泛泛之交 2024-12-10 19:22:26

[aString length] returns the number of characters (UTF-16 code units), not the number of bytes. In your case this is 4.

You can convert your string to a C string accurately using, for example, NSUTF8StringEncoding, NSUTF16StringEncoding or NSUTF32StringEncoding. The length in bytes would be 5, 8 and 16 respectively, not counting the NUL terminator.

NSString *myAccentStr = @"José";
NSUInteger l1 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
NSUInteger l2 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF16StringEncoding];
NSUInteger l3 = [myAccentStr lengthOfBytesUsingEncoding:NSUTF32StringEncoding];
NSLog(@"%ld %ld %ld", (long)l1, (long)l2, (long)l3);

> 5 8 16
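
To see where 5, 8 and 16 come from, here is a rough sketch in plain C that counts the bytes each encoding needs for a list of Unicode code points. The function names are invented for this example and are not Foundation API; the counts exclude any byte order mark and the NUL terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Total bytes needed to store the code points as UTF-8. */
size_t total_utf8_bytes(const uint32_t *cps, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (cps[i] < 0x80) {
            total += 1;             /* ASCII */
        } else if (cps[i] < 0x800) {
            total += 2;             /* e.g. é (U+00E9) */
        } else if (cps[i] < 0x10000) {
            total += 3;
        } else {
            total += 4;
        }
    }
    return total;
}

/* Total bytes as UTF-16: 2 per code point, 4 if a surrogate pair is needed. */
size_t total_utf16_bytes(const uint32_t *cps, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (cps[i] < 0x10000) ? 2 : 4;
    }
    return total;
}

/* Total bytes as UTF-32: always 4 per code point. */
size_t total_utf32_bytes(const uint32_t *cps, size_t count)
{
    (void)cps;                      /* the values do not affect the size */
    return count * 4;
}
```

For "José" (U+004A, U+006F, U+0073, U+00E9) these give 5, 8 and 16. A buffer of [myAccentStr length] + 1 = 5 bytes is therefore too small for every one of these encodings once the terminator is included.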

For conversion purposes you should size the buffer with -maximumLengthOfBytesUsingEncoding: rather than -lengthOfBytesUsingEncoding:, plus one byte for the NUL terminator.

Always check that the conversion is valid with -canBeConvertedToEncoding:.

There are good reasons to use NSString.
