Problem converting char to wchar_t (wrong length)

Posted 2024-09-26 17:50:01

I am trying to create a simple data structure that makes it easy to convert back and forth between ASCII strings and Unicode strings. My issue is that the length returned by mbstowcs is correct, but the length returned by wcslen on the newly created wchar_t string is not. Am I missing something here?

typedef struct{

    wchar_t *string;
    long length; // I have also tried int, and size_t
} String;

void setCString(String *obj, char *str){

    obj->length = strlen(str);

    free(obj->string); // Free original string
    obj->string = (wchar_t *)malloc((obj->length + 1) * sizeof(wchar_t)); //Allocate space for new string to be copied to

    //memset(obj->string,'\0',(obj->length + 1)); NOTE: I tried this but it doesn't make any difference

    size_t length = 0;

    length = mbstowcs(obj->string, (const char *)str, obj->length);

    printf("Length = %d\n",(int)length); // Prints correct length
    printf("!C string %s converted to wchar string %ls\n",str,obj->string); //obj->string is of a wcslen size larger than Length above...

    if(length != wcslen(obj->string))
            printf("Length failure!\n");

    if(length == -1)
    {
        //Conversion failed, set string to NULL terminated character
        free(obj->string);
        obj->string = (wchar_t *)malloc(sizeof(wchar_t));
        obj->string = L'\0';
    }
    else
    {
        //Conversion worked! but wcslen (and printf("%ls)) show the string is actually larger than length
        //do stuff
    }
}

Comments (3)

肩上的翅膀 2024-10-03 17:50:01

The code seems to work fine for me. Can you provide more context, such as the content of strings you're passing to it, and what locale you're using?

A few other bugs/style issues I noticed:

  • obj->length is left as the allocated length, rather than updated to match the length in (wide) characters. Is that your intention?
  • The cast to const char * is useless and bad style.
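A minimal sketch of the first point, assuming the intent is for the stored length to count wide characters rather than bytes. The helper name `to_wide` is hypothetical. It keeps the question's `strlen(str) + 1` allocation, which is always large enough because every multibyte character occupies at least one byte, but it takes the length from the return value of `mbstowcs` and passes a limit that leaves room for the terminator. It also takes `const char *`, so no cast is needed.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts str into a newly allocated wide string.
   On success, returns the buffer and stores the number of wide characters
   (not bytes) in *out_len. Returns NULL on an invalid multibyte sequence
   or if the allocation fails. */
static wchar_t *to_wide(const char *str, size_t *out_len)
{
    size_t bytes = strlen(str);

    /* bytes + 1 elements always suffice: each multibyte character uses at
       least one byte, so the wide string is never longer than str. */
    wchar_t *ws = malloc((bytes + 1) * sizeof *ws);
    if (ws == NULL)
        return NULL;

    /* The limit counts the terminator too, so L'\0' is always stored. */
    size_t count = mbstowcs(ws, str, bytes + 1);
    if (count == (size_t)-1) {
        free(ws);
        return NULL;
    }

    *out_len = count; /* wide-character length, equal to wcslen(ws) */
    return ws;
}
```

With a UTF-8 locale selected (for example via `setlocale(LC_CTYPE, "")` at program start), the UTF-8 string `"h\xc3\xa9llo"` is 6 bytes long but should produce a wide length of 5; in the default "C" locale, ASCII input gives equal byte and wide-character counts.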

Edit: Upon discussion, it looks like you may be using a nonconformant Windows version of the mbstowcs function. If so, your question should be updated to reflect that.

Edit 2: The code only happened to work for me because malloc returned a fresh, zero-filled buffer. Since you are passing obj->length to mbstowcs as the maximum number of wchar_t values to write to the destination, it will run out of space and not be able to write the null terminator unless there's a proper multibyte character (one which requires more than a single byte) in the source string. Change this to obj->length+1 and it should work fine.
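The effect described in Edit 2 can be made deterministic by pre-filling the destination with a non-zero sentinel instead of relying on `malloc` happening to return zeroed memory. This is a hypothetical test helper (`writes_terminator` is not from the question): with the limit set to `strlen(src)`, no terminator is written, while `strlen(src) + 1` stores one.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: fills buf with the non-zero sentinel L'x' (standing in
   for leftover heap garbage), converts src with limit n, and reports whether
   a terminating L'\0' follows the converted characters.
   Returns 1 if terminated, 0 if not, -1 on misuse or conversion failure. */
static int writes_terminator(const char *src, size_t n, wchar_t *buf, size_t cap)
{
    if (n > cap)
        return -1; /* mbstowcs could write past the end of buf */

    for (size_t i = 0; i < cap; i++)
        buf[i] = L'x';

    /* At most n elements are written; L'\0' counts toward that limit. */
    size_t count = mbstowcs(buf, src, n);
    if (count == (size_t)-1)
        return -1;
    if (count >= cap)
        return 0; /* buffer filled completely, no room for L'\0' */

    return buf[count] == L'\0';
}
```

For `"Hello!"`, a limit of 6 leaves `buf[6]` as the sentinel, which is exactly why `wcslen` and `%ls` in the question ran past the converted text whenever the heap memory was not already zero.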

路还长,别太狂 2024-10-03 17:50:01

The limit you pass to mbstowcs() must include the L'\0' terminator character, but the length you calculated in obj->length does not include it; you need to add 1 to the value passed to mbstowcs().

In addition, instead of using strlen(str) to determine the length of the converted string, you should be using mbstowcs(NULL, str, 0) + 1. You should also change the type of str to const char * and drop the cast. realloc() can be used in place of a free() / malloc() pair, as long as its result is checked before it overwrites the old pointer. Overall, it should look like:

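#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

typedef struct {
    wchar_t *string;
    size_t length;
} String;

void setCString(String *obj, const char *str)
{
    // Null destination: returns the number of wide characters str converts
    // to (POSIX/XSI behaviour), or (size_t)-1 on an invalid sequence
    size_t length = mbstowcs(NULL, str, 0);

    // Room for the converted text plus L'\0', or just L'\0' on failure
    size_t needed = (length == (size_t)-1) ? 1 : length + 1;
    wchar_t *tmp = realloc(obj->string, needed * sizeof(wchar_t));
    if (tmp == NULL)
        return; // realloc failed; obj->string is still the old buffer
    obj->string = tmp;

    if (length == (size_t)-1)
    {
        //Conversion failed, set string to an empty NUL-terminated string
        obj->string[0] = L'\0';
        obj->length = 0;
        return;
    }

    // length + 1 lets mbstowcs store the terminating L'\0' as well
    obj->length = mbstowcs(obj->string, str, length + 1);

    printf("Length = %zu\n", obj->length);
    printf("!C string %s converted to wchar string %ls\n", str, obj->string);

    if (obj->length != wcslen(obj->string))
        printf("Length failure!\n");

    //Conversion worked!
    //do stuff
}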

Mark Benningfield points out that mbstowcs(NULL, str, 0) is a POSIX / XSI extension to the C standard. To obtain the required length using only standard C, you must instead use mbsrtowcs():

    const char *src_copy = str;
    obj->length = mbsrtowcs(NULL, &src_copy, 0, NULL);
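For completeness, here is a sketch of how that standard-C length query might fit into a full conversion. The helper `to_wide_std` is hypothetical. It passes an explicitly zeroed `mbstate_t` rather than `NULL`, so the two passes do not share the function's internal state, and it resets the source pointer before the second pass.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts str to an exactly sized wide string using
   only ISO C functions (mbsrtowcs instead of the POSIX null-destination
   form of mbstowcs). Returns NULL on an invalid multibyte sequence or
   allocation failure; on success stores the wide length in *out_len. */
static wchar_t *to_wide_std(const char *str, size_t *out_len)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    const char *src = str;             /* mbsrtowcs takes a pointer to this */
    /* dst == NULL: nothing is stored and the len argument is ignored;
       the result is the number of wide characters, excluding L'\0'. */
    size_t len = mbsrtowcs(NULL, &src, 0, &state);
    if (len == (size_t)-1)
        return NULL;

    wchar_t *ws = malloc((len + 1) * sizeof *ws);
    if (ws == NULL)
        return NULL;

    /* Second pass: reset the state and source pointer, and allow len + 1
       elements so the terminating L'\0' is stored as well. */
    memset(&state, 0, sizeof state);
    src = str;
    mbsrtowcs(ws, &src, len + 1, &state);

    *out_len = len;
    return ws;
}
```

Compared with sizing the buffer from strlen(str), this allocates exactly as many wchar_t elements as the converted text needs, at the cost of scanning the input twice.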

灰色世界里的红玫瑰 2024-10-03 17:50:01

I am running this on Ubuntu Linux with a UTF-8 locale.

Here is the additional info as requested:

I am calling this function with a fully allocated structure and passing in a hard-coded "string" (not an L"string"), so I call the function with what is essentially setCString(*obj, "Hello!").

Length = 6

!C string Hello! converted to wchar string Hello!xxxxxxxxxxxxxxxxxxxx

(where x = random data)

Length failure!

For reference,
printf("wcslen = %d\n",(int)wcslen(obj->string)); prints out
wcslen = 11
