wcslen() works differently in Xcode and VC++

Posted on 2025-01-04 21:23:32

I found that wcslen() in VC++ 2010 returns the correct count of letters, whereas Xcode does not.
For example, the code below returns the correct value, 11, in VC++ 2010, but returns 17 in Xcode 4.2.

const wchar_t *p = L"123abc가1나1다";
size_t plen = wcslen(p);

My guess is that an Xcode-built app stores the wchar_t string in memory as UTF-8, which seems strange in itself.

How can I get 11 in Xcode, just as I do in VC++?


乜一 2025-01-11 21:23:32

I ran this program on a Mac Mini running Mac OS X 10.7.2 (Xcode 4.2):

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    const wchar_t p[] = L"123abc가1나1다";
    size_t plen = wcslen(p);
    if (fwide(stdout, 1) <= 0)
    {
        fprintf(stderr, "Failed to make stdout wide-oriented\n");
        return -1;
    }
    wprintf(L"String <<%ls>>\n", p);
    putwc(L'\n', stdout);
    wprintf(L"Length = %zu\n", plen);
    for (size_t i = 0; i < sizeof(p)/sizeof(*p); i++)
        wprintf(L"Character %zu = 0x%X\n", i, p[i]);
    return 0;
}

When I do a hex dump of the source file, I see:

0x0000: 23 69 6E 63 6C 75 64 65 20 3C 73 74 64 69 6F 2E   #include <stdio.
0x0010: 68 3E 0A 23 69 6E 63 6C 75 64 65 20 3C 77 63 68   h>.#include <wch
0x0020: 61 72 2E 68 3E 0A 0A 69 6E 74 20 6D 61 69 6E 28   ar.h>..int main(
0x0030: 76 6F 69 64 29 0A 7B 0A 20 20 20 20 63 6F 6E 73   void).{.    cons
0x0040: 74 20 77 63 68 61 72 5F 74 20 70 5B 5D 20 3D 20   t wchar_t p[] = 
0x0050: 4C 22 31 32 33 61 62 63 EA B0 80 31 EB 82 98 31   L"123abc...1...1
0x0060: EB 8B A4 22 3B 0A 20 20 20 20 73 69 7A 65 5F 74   ...";.    size_t
0x0070: 20 70 6C 65 6E 20 3D 20 77 63 73 6C 65 6E 28 70    plen = wcslen(p
0x0080: 29 3B 0A 20 20 20 20 69 66 20 28 66 77 69 64 65   );.    if (fwide
0x0090: 28 73 74 64 6F 75 74 2C 20 31 29 20 3C 3D 20 30   (stdout, 1) <= 0
0x00A0: 29 0A 20 20 20 20 7B 0A 20 20 20 20 20 20 20 20   ).    {.        
0x00B0: 66 70 72 69 6E 74 66 28 73 74 64 65 72 72 2C 20   fprintf(stderr, 
0x00C0: 22 46 61 69 6C 65 64 20 74 6F 20 6D 61 6B 65 20   "Failed to make 
0x00D0: 73 74 64 6F 75 74 20 77 69 64 65 2D 6F 72 69 65   stdout wide-orie
0x00E0: 6E 74 65 64 5C 6E 22 29 3B 0A 20 20 20 20 20 20   nted\n");.      
0x00F0: 20 20 72 65 74 75 72 6E 20 2D 31 3B 0A 20 20 20     return -1;.   
0x0100: 20 7D 0A 20 20 20 20 77 70 72 69 6E 74 66 28 4C    }.    wprintf(L
0x0110: 22 53 74 72 69 6E 67 20 3C 3C 25 6C 73 3E 3E 5C   "String <<%ls>>\
0x0120: 6E 22 2C 20 70 29 3B 0A 20 20 20 20 70 75 74 77   n", p);.    putw
0x0130: 63 28 4C 27 5C 6E 27 2C 20 73 74 64 6F 75 74 29   c(L'\n', stdout)
0x0140: 3B 0A 20 20 20 20 77 70 72 69 6E 74 66 28 4C 22   ;.    wprintf(L"
0x0150: 4C 65 6E 67 74 68 20 3D 20 25 7A 75 5C 6E 22 2C   Length = %zu\n",
0x0160: 20 70 6C 65 6E 29 3B 0A 20 20 20 20 66 6F 72 20    plen);.    for 
0x0170: 28 73 69 7A 65 5F 74 20 69 20 3D 20 30 3B 20 69   (size_t i = 0; i
0x0180: 20 3C 20 73 69 7A 65 6F 66 28 70 29 2F 73 69 7A    < sizeof(p)/siz
0x0190: 65 6F 66 28 2A 70 29 3B 20 69 2B 2B 29 0A 20 20   eof(*p); i++).  
0x01A0: 20 20 20 20 20 20 77 70 72 69 6E 74 66 28 4C 22         wprintf(L"
0x01B0: 43 68 61 72 61 63 74 65 72 20 25 7A 75 20 3D 20   Character %zu = 
0x01C0: 30 78 25 58 5C 6E 22 2C 20 69 2C 20 70 5B 69 5D   0x%X\n", i, p[i]
0x01D0: 29 3B 0A 20 20 20 20 72 65 74 75 72 6E 20 30 3B   );.    return 0;
0x01E0: 0A 7D 0A                                          .}.
0x01E3:

The output when compiled with GCC is:

String <<123abc
Length = 11
Character 0 = 0x31
Character 1 = 0x32
Character 2 = 0x33
Character 3 = 0x61
Character 4 = 0x62
Character 5 = 0x63
Character 6 = 0xAC00
Character 7 = 0x31
Character 8 = 0xB098
Character 9 = 0x31
Character 10 = 0xB2E4
Character 11 = 0x0

Note that the string is truncated at the first Korean character (0xAC00, whose low-order byte is zero). I think that is probably a bug in the system, but it seems unlikely that I'd manage to find one on my first attempt at using wprintf(), so it is more likely that I'm doing something wrong.

You're right that in the multi-byte UTF-8 source code, the string occupies 17 bytes (8 one-byte ASCII characters, plus 3 characters each encoded using 3 bytes). So a raw strlen() on the source string would return 17.

GCC version is:

i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Just for giggles, I tried clang, and got a different result. Compiled using:

clang -o row row.c -Wall -std=c99

using:

Apple clang version 2.1 (tags/Apple/clang-163.7.1) (based on LLVM 3.0svn)
Target: x86_64-apple-darwin11.3.0
Thread model: posix

The output when compiled with clang is:

String <<123abc가1나1다>>

Length = 17
Character 0 = 0x31
Character 1 = 0x32
Character 2 = 0x33
Character 3 = 0x61
Character 4 = 0x62
Character 5 = 0x63
Character 6 = 0xEA
Character 7 = 0xB0
Character 8 = 0x80
Character 9 = 0x31
Character 10 = 0xEB
Character 11 = 0x82
Character 12 = 0x98
Character 13 = 0x31
Character 14 = 0xEB
Character 15 = 0x8B
Character 16 = 0xA4
Character 17 = 0x0

So, now the string appears correctly, but the length is given as 17 instead of 11. Superficially, you can take your choice of bugs: either the string looks OK (in a terminal, /Applications/Utilities/Terminal, set up for UTF-8) but the length is wrong, or the length is right but the string does not appear correctly.

I note that sizeof(wchar_t) in both gcc and clang is 4.

The left hand does not understand what the right hand is doing. I think there's a case for claiming both are broken, in different ways.
