Problem converting char to wchar_t (length wrong)
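I am trying to create a simple data structure that will make it easy to convert back and forth between ASCII strings and Unicode strings. My issue is that the length returned by the function mbstowcs is correct, but the length returned by the function wcslen on the newly created wchar_t string is not. Am I missing something here?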

    typedef struct{
        wchar_t *string;
        long length; // I have also tried int, and size_t
    } String;

    void setCString(String *obj, char *str){
        obj->length = strlen(str);
        free(obj->string); // Free original string
        obj->string = (wchar_t *)malloc((obj->length + 1) * sizeof(wchar_t)); // Allocate space for new string to be copied to
        //memset(obj->string,'\0',(obj->length + 1)); NOTE: I tried this but it doesn't make any difference
        size_t length = 0;
        length = mbstowcs(obj->string, (const char *)str, obj->length);
        printf("Length = %d\n",(int)length); // Prints correct length
        printf("!C string %s converted to wchar string %ls\n",str,obj->string); // obj->string is of a wcslen size larger than Length above...
        if(length != wcslen(obj->string))
            printf("Length failure!\n");
        if(length == -1)
        {
            // Conversion failed, set string to NULL terminated character
            free(obj->string);
            obj->string = (wchar_t *)malloc(sizeof(wchar_t));
            obj->string = L'\0';
        }
        else
        {
            // Conversion worked! but wcslen (and printf("%ls)) show the string is actually larger than length
            // do stuff
        }
    }

Comments (3)

The code seems to work fine for me. Can you provide more context, such as the content of the strings you're passing to it and what locale you're using?

A few other bugs/style issues I noticed:

- `obj->length` is left as the allocated length, rather than updated to match the length in (wide) characters. Is that your intention?
- The cast to `const char *` is useless and bad style.

Edit: Upon discussion, it looks like you may be using a nonconformant Windows version of the `mbstowcs` function. If so, your question should be updated to reflect that.

Edit 2: The code only happened to work for me because `malloc` returned a fresh, zero-filled buffer. Since you are passing `obj->length` to `mbstowcs` as the maximum number of `wchar_t` values to write to the destination, it will run out of space and not be able to write the null terminator unless there's a proper multibyte character (one which requires more than a single byte) in the source string. Change this to `obj->length+1` and it should work fine.
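
The effect described in Edit 2 can be reproduced with a small experiment. This is only a sketch, not code from the thread: `terminator_written` is a hypothetical helper, and the `L'#'` fill is an assumed stand-in for whatever garbage `malloc` might return. It reports whether `mbstowcs` stored a terminator at index `strlen(src)` when given a limit of `strlen(src) + extra`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread).
 * Returns 1 if mbstowcs wrote L'\0' at index strlen(src),
 * 0 if it did not, and -1 if allocation failed. */
int terminator_written(const char *src, size_t extra)
{
    assert(src != NULL);

    size_t len = strlen(src);
    wchar_t *buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* Pre-fill with a non-zero sentinel to imitate an uninitialised buffer. */
    for (size_t i = 0; i <= len; i++)
        buf[i] = L'#';

    /* The third argument is the maximum number of wchar_t values to store,
     * and the terminator counts against that limit. */
    size_t n = mbstowcs(buf, src, len + extra);

    int ok = (n != (size_t)-1) && (buf[len] == L'\0');
    free(buf);
    return ok;
}
```

For a plain ASCII input such as "Hello!", calling this with `extra` set to 0 leaves the sentinel in the last slot, so a later `wcslen` would read past the end of the buffer. With `extra` set to 1 the terminator is written.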

The length you need to pass to `mbstowcs()` includes the `L'\0'` terminator character, but your calculated length in `obj->length` does not include it - you need to add 1 to the value passed to `mbstowcs()`.

In addition, instead of using `strlen(str)` to determine the length of the converted string, you should be using `mbstowcs(0, src, 0) + 1`. You should also change the type of `str` to `const char *`, and elide the cast. `realloc()` can be used in place of a `free()` / `malloc()` pair.

Mark Benningfield points out that `mbstowcs(0, src, 0)` is a POSIX / XSI extension to the C standard, so code restricted to standard C has to obtain the required length another way.
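
The code that accompanied this comment is not present in this copy of the thread. As a rough sketch of what the advice adds up to (not the commenter's code), the version below measures the input with `mbsrtowcs()` and a null destination, which standard C defines as a length query. That choice, the helper name `wide_length`, the `int` return value, and leaving `obj` untouched on failure are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

typedef struct {
    wchar_t *string;  /* owned, null-terminated wide string, or NULL */
    size_t length;    /* length in wide characters, terminator excluded */
} String;

/* Hypothetical helper: number of wide characters str converts to,
 * or (size_t)-1 on an invalid multibyte sequence.
 * On POSIX systems mbstowcs(NULL, str, 0) gives the same answer. */
size_t wide_length(const char *str)
{
    mbstate_t state;
    const char *p = str;  /* mbsrtowcs takes a pointer to the source pointer */

    memset(&state, 0, sizeof state);  /* start in the initial shift state */
    return mbsrtowcs(NULL, &p, 0, &state);
}

/* Returns 0 on success, -1 on failure (obj is left unchanged on failure). */
int setCString(String *obj, const char *str)
{
    assert(obj != NULL && str != NULL);

    size_t len = wide_length(str);
    if (len == (size_t)-1)
        return -1;

    /* realloc(NULL, n) behaves like malloc(n), so an empty String works too. */
    wchar_t *buf = realloc(obj->string, (len + 1) * sizeof *buf);
    if (buf == NULL)
        return -1;

    obj->string = buf;
    /* len + 1 leaves room for mbstowcs to store the terminator. */
    obj->length = mbstowcs(buf, str, len + 1);
    return 0;
}
```

With a zero-initialised `String`, `setCString(&s, "Hello!")` should leave `s.length` and `wcslen(s.string)` both at 6. For non-ASCII input the program would also need to call `setlocale(LC_ALL, "")` first, since a C program starts in the "C" locale.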

I am running this on Ubuntu Linux with UTF-8 as the locale.

Here is the additional info as requested:

I am calling this function with a fully allocated structure and passing in a hard-coded "string" (not an L"string"), so I call the function with what is essentially setCString(*obj, "Hello!"). The output is:

    Length = 6
    !C string Hello! converted to wchar string Hello!xxxxxxxxxxxxxxxxxxxx
    Length failure!

(where x = random data)

For reference, printf("wcslen = %d\n",(int)wcslen(obj->string)); prints out as

    wcslen = 11