wchar_t* to char* conversion problems

Posted 2024-12-22 22:02:26

I have a problem with wchar_t* to char* conversion.

I'm getting a wchar_t* string from the FILE_NOTIFY_INFORMATION structure, returned by the ReadDirectoryChangesW WinAPI function, so I assume that string is correct.

Assume that the wchar string is "New Text File.txt".
In the Visual Studio debugger, hovering over the variable shows "N" followed by some unknown Chinese characters, although the Watch window displays the string correctly.

When I try to convert wchar to char with wcstombs

wcstombs(pfileName, pwfileName, fileInfo.FileNameLength);

it converts just two letters to char* ("Ne") and then generates an error.

Some internal error in wcstombs.c at function _wcstombs_l_helper() at this block:

if (*pwcs > 255)  /* validate high byte */
{
    errno = EILSEQ;
    return (size_t)-1;  /* error */
}

It isn't thrown as an exception.

What can be the problem?

Comments (3)

说好的呢 2024-12-29 22:02:26

In order to do what you're trying to do The Right Way, there are several nontrivial things that you need to take into account. I'll do my best to break them down for you here.

Let's start with the definition of the count parameter from the wcstombs() function's documentation on MSDN:

The maximum number of bytes that can be stored in the multibyte output string.

Note that this does NOT say anything about the number of wide characters in the wide character input string. Even though all of the wide characters in your example input string ("New Text File.txt") can be represented as single-byte ASCII characters, we cannot assume that each wide character in the input string will generate exactly one byte in the output string for every possible input string (if this statement confuses you, you should check out Joel's article on Unicode and character sets). So, if you pass wcstombs() the size of the output buffer, how does it know how long the input string is? The documentation states that the input string is expected to be null-terminated, as per the standard C language convention:

If wcstombs encounters the wide-character null character (L'\0') either before or when count occurs, it converts it to an 8-bit 0 and stops.

Though this isn't explicitly stated in the documentation, we can infer that if the input string isn't null-terminated, wcstombs() will keep reading wide characters until it has written count bytes to the output string. So if you're dealing with a wide character string that isn't null-terminated, it isn't enough to just know how long the input string is; you would have to somehow know exactly how many bytes the output string would need to be (which is impossible to determine without doing the conversion) and pass that as the count parameter to make wcstombs() do what you want it to do.
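To make the role of count concrete, here is a minimal sketch, assuming the default "C" locale and plain ASCII input. The helper name convertWithLimit is made up for illustration; the only library call is the standard wcstombs. It shows that count limits the bytes written, while the input is read until L'\0' or until that limit is reached.

```cpp
#include <cstddef>
#include <cstdlib>   // std::wcstombs
#include <string>

// Hypothetical helper: converts a null-terminated wide string, allowing
// wcstombs to write at most `count` bytes. Returns the bytes actually
// written (no terminator is stored when the limit is reached first).
std::string convertWithLimit(const wchar_t* wide, std::size_t count) {
    std::string out(count, '\0');                        // room for exactly `count` bytes
    const std::size_t written = std::wcstombs(&out[0], wide, count);
    if (written == static_cast<std::size_t>(-1)) {       // unconvertible character
        return std::string();
    }
    out.resize(written);                                 // keep only what was written
    return out;
}
```

With a limit of 3, only the first three characters of the input come back, even though the input string is longer. With a generous limit, the conversion stops at the wide null terminator instead.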

Why am I focusing so much on this null-termination issue? Because the FILE_NOTIFY_INFORMATION structure's documentation on MSDN has this to say about its FileName field:

A variable-length field that contains the file name relative to the directory handle. The file name is in the Unicode character format and is not null-terminated.

The fact that the FileName field isn't null-terminated explains why it has a bunch of "unknown Chinese letters" at the end of it when you look at it in the debugger. The FILE_NOTIFY_INFORMATION structure's documentation also contains another nugget of wisdom regarding the FileNameLength field:

The size of the file name portion of the record, in bytes.

Note that this says bytes, not characters. Therefore, even if you wanted to assume that each wide character in the input string will generate exactly one byte in the output string, you shouldn't be passing fileInfo.FileNameLength for count; you should be passing fileInfo.FileNameLength / sizeof(WCHAR) (or use a null-terminated input string, of course). Putting all of this information together, we can finally understand why your original call to wcstombs() was failing: it was reading past the end of the string and choking on invalid data (thereby triggering the EILSEQ error).
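The byte-versus-character distinction can be sketched as follows. This is an illustration rather than Windows-specific code: it uses wchar_t and sizeof(wchar_t) so it compiles anywhere, whereas on Windows the element type is WCHAR (2 bytes). The function name fromCountedBytes is an assumption made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a properly terminated std::wstring from a
// counted, unterminated buffer. `lengthInBytes` plays the role of
// FileNameLength, which is a byte count, not a character count.
std::wstring fromCountedBytes(const wchar_t* data, std::size_t lengthInBytes) {
    const std::size_t lengthInWChars = lengthInBytes / sizeof(wchar_t);
    return std::wstring(data, lengthInWChars);   // copies exactly that many characters
}
```

Whatever follows the counted characters in the source buffer is never read, so trailing garbage cannot leak into the result.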

Now that we've elucidated the problem, it's time to talk about a possible solution. In order to do this The Right Way, the first thing you need to know is how big your output buffer needs to be. Luckily, there is one final tidbit in the documentation for wcstombs() that will help us out here:

If the mbstr argument is NULL, wcstombs returns the required size in bytes of the destination string.

So the idiomatic way to use the wcstombs() function is to call it twice: the first time to determine how big your output buffer needs to be, and the second time to actually do the conversion. The final thing to note is that as we stated previously, the wide character input string needs to be null-terminated for at least the first call to wcstombs().

Putting this all together, here is a snippet of code that does what you are trying to do:

size_t fileNameLengthInWChars = fileInfo.FileNameLength / sizeof(WCHAR); //get the length of the filename in characters
WCHAR *pwNullTerminatedFileName = new WCHAR[fileNameLengthInWChars + 1]; //allocate an intermediate buffer to hold a null-terminated version of fileInfo.FileName; +1 for null terminator
wcsncpy(pwNullTerminatedFileName, fileInfo.FileName, fileNameLengthInWChars); //copy the filename into the intermediate buffer
pwNullTerminatedFileName[fileNameLengthInWChars] = L'\0'; //null terminate the new buffer
size_t fileNameLengthInChars = wcstombs(NULL, pwNullTerminatedFileName, 0); //first call to wcstombs() determines how long the output buffer needs to be
char *pFileName = new char[fileNameLengthInChars + 1]; //allocate the final output buffer; +1 to leave room for null terminator
wcstombs(pFileName, pwNullTerminatedFileName, fileNameLengthInChars + 1); //finally do the conversion!

Of course, don't forget to call delete[] pwNullTerminatedFileName and delete[] pFileName when you're done with them to clean up.
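If you would rather not manage the buffers by hand, the same two-call approach can be sketched with standard containers. This is one possible variant, not the only way to do it: the name wideNameToNarrow and its signature are invented for this example, it uses wchar_t instead of WCHAR so it is not tied to the Windows headers, and it also checks the (size_t)-1 error return that the snippet above skips.

```cpp
#include <cstddef>
#include <cstdlib>   // std::wcstombs
#include <string>
#include <vector>

// Hypothetical helper: converts a counted, unterminated wide file name
// (lengthInBytes is a byte count, like FileNameLength) to a narrow string
// in the current C locale. Returns false if a character cannot be converted.
bool wideNameToNarrow(const wchar_t* name, std::size_t lengthInBytes, std::string& out) {
    // Null-terminated copy of the name; std::wstring owns the memory.
    const std::wstring wide(name, lengthInBytes / sizeof(wchar_t));

    // First call: ask how many bytes the output needs (terminator excluded).
    const std::size_t needed = std::wcstombs(nullptr, wide.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) {
        return false;   // errno is set by wcstombs
    }

    // Second call: convert into a buffer that has room for the terminator.
    std::vector<char> buffer(needed + 1);
    std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    out.assign(buffer.data(), needed);
    return true;
}
```

Because the containers release their memory automatically, no delete[] calls are needed, and the early return on failure cannot leak.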

ONE LAST THING

After writing this answer, I reread your question a bit more closely and thought of another mistake you may be making. You say that wcstombs() fails after just converting the first two letters ("Ne"), which means that it's hitting uninitialized data in the input string after the first two wide characters. Did you happen to use the assignment operator to copy one FILE_NOTIFY_INFORMATION variable to another? For example,

FILE_NOTIFY_INFORMATION fileInfo = someOtherFileInfo;

If you did this, it would only copy the first two wide characters of someOtherFileInfo.FileName to fileInfo.FileName. In order to understand why this is the case, consider the declaration of the FILE_NOTIFY_INFORMATION structure:

typedef struct _FILE_NOTIFY_INFORMATION {
  DWORD NextEntryOffset;
  DWORD Action;
  DWORD FileNameLength;
  WCHAR FileName[1];
} FILE_NOTIFY_INFORMATION, *PFILE_NOTIFY_INFORMATION;

When the compiler generates code for the assignment operation, it doesn't understand the trickery that is being pulled with FileName being a variable-length field, so it just copies sizeof(FILE_NOTIFY_INFORMATION) bytes from someOtherFileInfo to fileInfo. Since FileName is declared as an array of one WCHAR, you would think that only one character would be copied, but the compiler pads the struct to be an extra two bytes long (so that its length is an integer multiple of the size of an int), which is why a second WCHAR is copied as well.
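The arithmetic can be checked with a stand-in struct. This is only a sketch: MockNotifyInfo is a made-up type that mirrors the field layout with fixed-width types (on Windows, DWORD is 32 bits and WCHAR is 16 bits), and the numbers assume a common ABI where a 32-bit integer is 4-byte aligned.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>

// Stand-in with the same field layout as FILE_NOTIFY_INFORMATION.
// The type name is hypothetical; it exists only for this illustration.
struct MockNotifyInfo {
    std::uint32_t NextEntryOffset;
    std::uint32_t Action;
    std::uint32_t FileNameLength;
    char16_t      FileName[1];
};

// Number of 16-bit characters covered by a plain struct copy: everything
// from the start of FileName to the end of the padded struct.
constexpr std::size_t copiedWideChars() {
    return (sizeof(MockNotifyInfo) - offsetof(MockNotifyInfo, FileName)) / sizeof(char16_t);
}
```

Under those assumptions the struct occupies 16 bytes with FileName starting at offset 12, which leaves room for exactly two 16-bit characters. To keep the whole name, copy the raw buffer using the lengths stored in the record rather than assigning the struct.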

爱本泡沫多脆弱 2024-12-29 22:02:26

My guess is that the wide string that you are passing is invalid or incorrectly defined.

How is pwfileName defined? It seems you have a FILE_NOTIFY_INFORMATION structure defined as fileInfo, so why are you not using fileInfo.FileName, as shown below?

wcstombs(pfileName, fileInfo.FileName, fileInfo.FileNameLength);
稀香 2024-12-29 22:02:26

The error you get says it all: it found a character that it cannot convert to a multibyte character (because it has no multibyte representation in the current locale). Source:

If wcstombs encounters a wide character it cannot convert to a
multibyte character, it returns –1 cast to type size_t and sets errno
to EILSEQ

In cases like this you should avoid 'assumed' input, and give an actual test case that fails.
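Since the failure is reported through the return value and errno rather than an exception, it has to be checked explicitly. A minimal sketch, assuming the program runs in the default "C" locale; the function name conversionStatus is invented for this example:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>   // std::wcstombs
#include <string>

// Hypothetical helper: reports whether a null-terminated wide string can be
// converted in the current locale. Returns "ok", "EILSEQ", or "error".
std::string conversionStatus(const wchar_t* wide) {
    errno = 0;
    // With a null destination, wcstombs only measures the output.
    const std::size_t needed = std::wcstombs(nullptr, wide, 0);
    if (needed != static_cast<std::size_t>(-1)) {
        return "ok";
    }
    return errno == EILSEQ ? "EILSEQ" : "error";
}
```

A plain ASCII name passes, while a wide character with no representation in the locale's character set (for example a CJK character under the "C" locale) is reported as a failure instead of being silently dropped.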
