wchar_t* to char* conversion problem
I have a problem with `wchar_t*` to `char*` conversion.

I'm getting a `wchar_t*` string from the `FILE_NOTIFY_INFORMATION` structure, returned by the `ReadDirectoryChangesW` WinAPI function, so I assume that string is correct.
Assume that the wchar string is "New Text File.txt". In the Visual Studio debugger, hovering over the variable shows "N" followed by some unknown Chinese characters, though in the Watch window the string is displayed correctly.
When I try to convert wchar to char with `wcstombs`:

```cpp
wcstombs(pfileName, pwfileName, fileInfo.FileNameLength);
```

it converts just two letters to `char*` ("Ne") and then generates an error.
Some internal error occurs in wcstombs.c, in the function `_wcstombs_l_helper()`, at this block:

```cpp
if (*pwcs > 255) /* validate high byte */
{
    errno = EILSEQ;
    return (size_t)-1; /* error */
}
```
It isn't thrown as an exception.

What could the problem be?
Comments (3)
In order to do what you're trying to do The Right Way, there are several nontrivial things that you need to take into account. I'll do my best to break them down for you here.
Let's start with the definition of the `count` parameter in the MSDN documentation for the `wcstombs()` function: it is the maximum number of bytes that can be stored in the multibyte output string. Note that this does NOT say anything about the number of wide characters in the wide character input string. All of the wide characters in your example input string ("New Text File.txt") can be represented as single-byte ASCII characters, but we cannot assume that every possible input string will produce exactly one output byte per wide character (if this statement confuses you, you should check out Joel's article on Unicode and character sets). So, if you pass `wcstombs()` the size of the output buffer, how does it know how long the input string is? The documentation states that the input string is expected to be null-terminated, as per the standard C language convention: the conversion stops when `wcstombs()` reaches the wide null character (`L'\0'`).

Though this isn't explicitly stated in the documentation, we can infer that if the input string isn't null-terminated, `wcstombs()` will keep reading wide characters until it has written `count` bytes to the output string. So if you're dealing with a wide character string that isn't null-terminated, it isn't enough to just know how long the input string is. You would also have to know exactly how many bytes the output string will need (which is impossible to determine without doing the conversion) and pass that as the `count` parameter to make `wcstombs()` do what you want it to do.

Why am I focusing so much on this null-termination issue? Because the
`FILE_NOTIFY_INFORMATION` structure's documentation on MSDN has this to say about its `FileName` field: it is a variable-length field containing the file name relative to the directory handle, in Unicode, and it is not null-terminated.

The fact that the `FileName` field isn't null-terminated explains why it has a bunch of "unknown Chinese letters" at the end of it when you look at it in the debugger. The `FILE_NOTIFY_INFORMATION` structure's documentation also contains another nugget of wisdom regarding the `FileNameLength` field: it is the size of the file name portion of the record, in bytes.

Note that this says bytes, not characters. Therefore, even if you wanted to assume that each wide character in the input string will generate exactly one byte in the output string, you shouldn't be passing
`fileInfo.FileNameLength` for `count`; you should be passing `fileInfo.FileNameLength / sizeof(WCHAR)` (or use a null-terminated input string, of course). Putting all of this information together, we can finally understand why your original call to `wcstombs()` was failing: it was reading past the end of the string and choking on invalid data (thereby triggering the `EILSEQ` error).

Now that we've elucidated the problem, it's time to talk about a possible solution. In order to do this The Right Way, the first thing you need to know is how big your output buffer needs to be. Luckily, there is one final tidbit in the documentation for
`wcstombs()` that will help us out here: if the `mbstr` (destination) argument is `NULL`, `wcstombs()` returns the required size, in bytes, of the destination string (not counting the terminating null, so allocate one extra byte). So the idiomatic way to use the `wcstombs()` function is to call it twice: the first time to determine how big your output buffer needs to be, and the second time to actually do the conversion. The final thing to note is that, as we stated previously, the wide character input string needs to be null-terminated for at least the first call to `wcstombs()`.

Putting this all together, here is a snippet of code that does what you are trying to do:
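Below is a minimal sketch of that two-call approach. It is not the answerer's original snippet. It assumes the name comes from `fileInfo.FileName` and `fileInfo.FileNameLength`, and that `WCHAR` is `wchar_t`, as on Windows. `ConvertFileName` is a hypothetical helper name. The sketch uses `std::wstring` and `std::string` instead of the raw `new[]` buffers (`pwNullTerminatedFileName`, `pFileName`) that the cleanup advice below refers to, so nothing has to be deleted by hand:

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Windows API). On Windows, WCHAR is wchar_t, so it
// can be called as ConvertFileName(fileInfo.FileName, fileInfo.FileNameLength).
std::string ConvertFileName(const wchar_t* fileName, std::size_t fileNameLengthInBytes)
{
    // FileNameLength is a size in BYTES, so divide by the size of one wide character.
    const std::size_t charCount = fileNameLengthInBytes / sizeof(wchar_t);

    // FileName is not null-terminated; std::wstring makes a copy whose
    // c_str() ends with L'\0', which is what wcstombs() relies on.
    const std::wstring nullTerminated(fileName, charCount);

    // First call: with a null destination, wcstombs() only measures. The result
    // excludes the terminating null, or is (size_t)-1 if conversion is impossible.
    const std::size_t byteCount = std::wcstombs(nullptr, nullTerminated.c_str(), 0);
    if (byteCount == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("file name has no multibyte representation in this locale");
    }

    // Second call: convert into a buffer that also has room for the terminator.
    std::string result(byteCount + 1, '\0');
    std::wcstombs(&result[0], nullTerminated.c_str(), result.size());
    result.resize(byteCount);  // keep the characters, drop the extra '\0'
    return result;
}
```

Throwing on `(size_t)-1` is only one design choice; returning an error code works equally well. MSVC may also flag `wcstombs` with warning C4996 and suggest `wcstombs_s` instead.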
Of course, don't forget to call `delete[] pwNullTerminatedFileName` and `delete[] pFileName` when you're done with them to clean up.

ONE LAST THING
After writing this answer, I reread your question a bit more closely and thought of another mistake you may be making. You say that `wcstombs()` fails after just converting the first two letters ("Ne"), which means that it's hitting uninitialized data in the input string after the first two wide characters. Did you happen to use the assignment operator to copy one `FILE_NOTIFY_INFORMATION` variable to another, for example with `fileInfo = someOtherFileInfo;`? If you did this, it would only copy the first two wide characters of `someOtherFileInfo.FileName` to `fileInfo.FileName`.

In order to understand why this is the case, consider the declaration of the `FILE_NOTIFY_INFORMATION` structure: three `DWORD` fields (`NextEntryOffset`, `Action` and `FileNameLength`) followed by `WCHAR FileName[1];`. When the compiler generates code for the assignment operation, it doesn't understand the trickery that is being pulled with `FileName` being a variable-length field, so it just copies `sizeof(FILE_NOTIFY_INFORMATION)` bytes from `someOtherFileInfo` to `fileInfo`. Since `FileName` is declared as an array of one `WCHAR`, you would think that only one character would be copied. However, the compiler pads the struct with an extra two bytes so that its size is a multiple of 4 bytes, the alignment of its `DWORD` members. That padding is why a second `WCHAR` is copied as well.
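The size arithmetic in this explanation can be checked at compile time. The sketch below assumes a hypothetical stand-in struct `NotifyRecord` with the same layout as `FILE_NOTIFY_INFORMATION`: three 32-bit fields followed by a one-element array of 16-bit characters. The stand-in lets it build without `windows.h`:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the same layout as FILE_NOTIFY_INFORMATION on
// Windows (DWORD = 32-bit unsigned, WCHAR = 16-bit), so this compiles anywhere.
struct NotifyRecord {
    std::uint32_t NextEntryOffset;
    std::uint32_t Action;
    std::uint32_t FileNameLength;  // size of FileName in bytes
    char16_t      FileName[1];     // actually variable length, not null-terminated
};

// A plain assignment copies sizeof(NotifyRecord) bytes, so the number of
// FileName characters it carries over is whatever fits after FileName's offset.
constexpr std::size_t kCharsCopiedByAssignment =
    (sizeof(NotifyRecord) - offsetof(NotifyRecord, FileName)) / sizeof(char16_t);

// 12 bytes of DWORDs + 2 bytes of FileName, padded to 16 for 4-byte alignment.
static_assert(sizeof(NotifyRecord) == 16, "expected 2 bytes of tail padding");
static_assert(kCharsCopiedByAssignment == 2, "only \"Ne\" would survive the copy");
```

The practical fix is to avoid copying the record by value. Keep a pointer to the record inside the buffer that `ReadDirectoryChangesW` filled in, and copy the name out using `FileNameLength`.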
My guess is that the wide string that you are passing is invalid or incorrectly defined.

How is `pwFileName` defined? It seems you have a `FILE_NOTIFY_INFORMATION` structure defined as `fileInfo`, so why are you not using `fileInfo.FileName` directly?
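Here is a minimal sketch of reading the name straight from each record instead of through a separately defined pointer. It assumes a hypothetical stand-in `NotifyRecord` with the same layout as `FILE_NOTIFY_INFORMATION` so that it builds without `windows.h`. On Windows you would use `FILE_NOTIFY_INFORMATION`, `WCHAR` and `std::wstring` instead. `CollectFileNames` is a made-up name. The loop relies on the documented meaning of `NextEntryOffset`: the number of bytes to skip to reach the next record, with 0 marking the last one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for FILE_NOTIFY_INFORMATION (DWORD = 32 bits, WCHAR = 16 bits).
struct NotifyRecord {
    std::uint32_t NextEntryOffset;  // bytes to skip to the next record; 0 = last one
    std::uint32_t Action;
    std::uint32_t FileNameLength;   // size of FileName in bytes
    char16_t      FileName[1];      // variable length, not null-terminated
};

// Reads every record in a buffer filled by ReadDirectoryChangesW and copies each
// FileName out using FileNameLength, instead of keeping a separate pointer to it.
std::vector<std::u16string> CollectFileNames(const unsigned char* buffer)
{
    const std::size_t nameOffset = offsetof(NotifyRecord, FileName);
    std::vector<std::u16string> names;
    std::size_t offset = 0;
    for (;;) {
        // Copy the fixed-size header out of the raw bytes.
        NotifyRecord header{};
        std::memcpy(&header, buffer + offset, nameOffset);

        // Copy exactly FileNameLength bytes of the name; there is no terminator.
        std::u16string name(header.FileNameLength / sizeof(char16_t), u'\0');
        std::memcpy(&name[0], buffer + offset + nameOffset, name.size() * sizeof(char16_t));
        names.push_back(name);

        if (header.NextEntryOffset == 0) {
            break;  // last record in the buffer
        }
        offset += header.NextEntryOffset;
    }
    return names;
}
```

Copying the header with `memcpy` sidesteps alignment and aliasing questions in this portable sketch. In Win32 code, casting the buffer pointer to `FILE_NOTIFY_INFORMATION*` is the more common idiom.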
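The error you get says it all: it found a character that it cannot convert to a multibyte character (because that character has no multibyte representation in the current locale). When that happens, `wcstombs()` returns `(size_t)-1` and sets `errno` to `EILSEQ`, which is exactly the block you quoted.

In cases like this you should avoid 'assumed' input and give an actual test case that fails.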
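Here is a sketch of such a test case. It assumes the program runs in the default "C" locale (it never calls `setlocale`). In that locale plain ASCII converts, but a character like U+4E2D has no multibyte representation. `IsConvertible` is a hypothetical helper:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical check: true if every character of the null-terminated wide
// string has a multibyte representation in the current C locale.
// wcstombs() never throws; it signals failure by returning (size_t)-1.
bool IsConvertible(const wchar_t* wide)
{
    errno = 0;  // reset so the caller can inspect errno (typically EILSEQ) on failure
    return std::wcstombs(nullptr, wide, 0) != static_cast<std::size_t>(-1);
}
```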