C++ Convert string (or char*) to wstring (or wchar_t*)
string s = "おはよう";
wstring ws = FUNCTION(s, ws);
How would I assign the contents of s to ws?
I searched Google and tried some techniques, but they can't assign the exact content. The content is distorted.
NOTE! See Note (2023-10-05) at the bottom!
Assuming that the input string in your example (おはよう) is a UTF-8 encoded representation of the Unicode string you are interested in (which it isn't, by the looks of it, but let's assume it is for the sake of this explanation :-)), your problem can be fully solved with the standard library (C++11 and newer) alone.
The TL;DR version:
Longer online compilable and runnable example:
(They all show the same example. There are just many for redundancy...)
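For reference, here is a minimal sketch of that standard-library route. It assumes the input is UTF-8 and uses std::wstring_convert with the std::codecvt_utf8_utf16 facet. Both are deprecated since C++17 (see the notes below). The function names are illustrative and not taken from the original answer:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Converts a UTF-8 encoded std::string to std::wstring.
// codecvt_utf8_utf16 produces UTF-16 code units, which matches wchar_t on Windows.
// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
std::wstring utf8_to_wstring(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8); // throws std::range_error on invalid UTF-8
}

// The reverse direction: UTF-16 code units stored in std::wstring back to UTF-8.
std::string wstring_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

On platforms where wchar_t is 32 bits (Linux, macOS), this facet still stores characters outside the BMP as UTF-16 surrogate pairs, one code unit per wchar_t, rather than as single code points.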
Note (old):
As pointed out in the comments and explained in https://stackoverflow.com/a/17106065/6345, there are cases where using the standard library to convert between UTF-8 and UTF-16 can give unexpectedly different results on different platforms. For a better conversion, consider std::codecvt_utf8 as described on http://en.cppreference.com/w/cpp/locale/codecvt_utf8.

Note (new):
Since the codecvt header is deprecated in C++17, some concerns about the solution presented in this answer were raised. However, the C++ standards committee added an important statement in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html saying:

So in the foreseeable future, the codecvt solution in this answer is safe and portable.

Note (2023-10-05):
Proposal to remove the deprecated codecvt and wstring_convert in C++26:
Your question is underspecified. Strictly, that example is a syntax error. However, std::mbstowcs is probably what you're looking for. It is a C-library function and operates on buffers, but here's an easy-to-use idiom, courtesy of Mooing Duck:
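Here is a sketch of a std::mbstowcs-based helper. The name multibyte_to_wide is hypothetical, and this is not necessarily Mooing Duck's exact code. It assumes the input is encoded in the current C locale's multibyte encoding. To use the user's locale (for example a UTF-8 one), call std::setlocale(LC_ALL, "") first:

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Converts a multibyte string (in the current C locale's encoding) to a wide string.
std::wstring multibyte_to_wide(const std::string& mb)
{
    // A multibyte string never yields more wide characters than it has bytes.
    std::wstring wide(mb.size(), L'\0');
    std::size_t n = std::mbstowcs(&wide[0], mb.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    wide.resize(n); // drop the unused tail
    return wide;
}
```

In the default "C" locale, only ASCII is guaranteed to convert. std::mbstowcs returns static_cast<std::size_t>(-1) on an invalid sequence, and the sketch turns that into an exception.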
If you are using Windows/Visual Studio and need to convert a string to a wstring, you could use:
The same procedure works for converting a wstring to a string (sometimes you will need to specify a code page):
You could specify a code page, even UTF-8 (that's pretty nice when working with JNI/Java). A standard way of converting a std::wstring to a UTF-8 std::string is shown in this answer.
If you want to know more about code pages, there is an interesting article on Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.
These CA2W (Convert ANSI to Wide = Unicode) macros are part of the ATL and MFC String Conversion Macros, samples included.
Sometimes you will need to disable security warning #4995. I don't know of another workaround (for me it happened when I compiled for Windows XP in VS2012).
Edit:
Well, according to this article, Joel's article appears to be: "while entertaining, it is pretty light on actual technical details". Article: What Every Programmer Absolutely, Positively Needs To Know About Encoding And Character Sets To Work With Text.
Windows API only, pre-C++11 implementation, in case someone needs it:
Here's a way to combine string, wstring and mixed string constants into a wstring: use the wstringstream class.

This does NOT work for multi-byte character encodings. This is just a dumb way of throwing away type safety and expanding 7-bit characters from std::string into the lower 7 bits of each character of std::wstring. This is only useful if you have 7-bit ASCII strings and need to call an API that requires wide strings.
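Here is a minimal sketch of the wstringstream approach (the function name combine is hypothetical). A wide stream has an operator<< overload for const char*, and it widens each char individually. That is where the ASCII-only restriction described above comes from:

```cpp
#include <sstream>
#include <string>

// Combines a narrow string, a wide string and wide literals into one std::wstring.
// narrow.c_str() goes through the wide stream's const char* overload, which widens
// each char on its own, so this is only correct for 7-bit ASCII input.
std::wstring combine(const std::string& narrow, const std::wstring& wide)
{
    std::wstringstream out;
    out << L"narrow: " << narrow.c_str() << L", wide: " << wide;
    return out.str();
}
```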
From char* to wstring:

From string to wstring:

Note this only works well if the string being converted contains only ASCII characters.
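A common way to do both conversions, assuming ASCII-only input as noted above, is the iterator-range constructor of std::wstring. It widens each char to a wchar_t (the function names are hypothetical):

```cpp
#include <cstring>
#include <string>

// string -> wstring: each char becomes one wchar_t (ASCII only;
// the bytes of a UTF-8 character would become separate, wrong wide characters).
std::wstring ascii_to_wstring(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}

// char* -> wstring: same idea on a null-terminated buffer.
std::wstring ascii_to_wstring(const char* s)
{
    return std::wstring(s, s + std::strlen(s));
}
```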
This variant of it is my favourite in real life. It converts the input, if it is valid UTF-8, to the respective wstring. If the input is corrupted, the wstring is constructed out of the single bytes. This is extremely helpful if you cannot really be sure about the quality of your input data.
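Here is a sketch of that idea. It assumes the deprecated std::wstring_convert with the std::codecvt_utf8<wchar_t> facet, and it treats the fallback bytes as Latin-1, so each byte becomes the wchar_t with the same value. The function name is hypothetical:

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Decodes UTF-8 if the input is valid; otherwise maps every byte to the
// wchar_t with the same value (0..255), i.e. interprets it as Latin-1.
std::wstring utf8_or_bytes_to_wstring(const std::string& s)
{
    try {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
        return converter.from_bytes(s); // throws std::range_error on invalid UTF-8
    } catch (const std::range_error&) {
        std::wstring out;
        out.reserve(s.size());
        for (unsigned char c : s) // unsigned char avoids sign extension of bytes >= 0x80
            out.push_back(static_cast<wchar_t>(c));
        return out;
    }
}
```

Where wchar_t is 16 bits (Windows), codecvt_utf8<wchar_t> converts to UCS-2. Characters outside the BMP therefore cannot be represented by the UTF-8 path there.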
Using Boost.Locale:
You can use boost path or std path, which is a lot easier.
Boost path is easier for cross-platform applications.
If you like to use std:
For older C++ versions:
The code within still implements a converter, so you don't have to unravel the details.
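Here is a minimal sketch of the std route. It assumes C++17 std::filesystem; boost::filesystem::path offers similar string()/wstring() accessors. How the narrow string is interpreted is platform-dependent. path uses the native narrow encoding, which on Windows is typically the active ANSI code page rather than UTF-8. The function names are hypothetical:

```cpp
#include <filesystem>
#include <string>

// Lets std::filesystem::path perform the narrow <-> wide conversion,
// using the platform's native narrow encoding.
std::wstring to_wstring_via_path(const std::string& s)
{
    return std::filesystem::path(s).wstring();
}

std::string to_string_via_path(const std::wstring& ws)
{
    return std::filesystem::path(ws).string();
}
```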
For me the most uncomplicated option without big overhead is:
Include:
Convert:
If needed:
String to wstring
wstring to String
If you have Qt and are too lazy to implement a function yourself, you can use
Here is my super basic solution. It might not work for everyone, but it would work for a lot of people.
It requires the Guideline Support Library,
which is a pretty official C++ library designed by many C++ committee authors:
All my function does is allow the conversion if possible; otherwise it throws an exception.
It does this via gsl::narrow (https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es49-if-you-must-use-a-cast-use-a-named-cast).
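GSL is not assumed to be available here. This sketch instead uses a small hypothetical narrow_char helper that mimics the check gsl::narrow performs: the value must survive the round trip and keep its sign. gsl::narrow throws gsl::narrowing_error, while this helper throws std::range_error. It also assumes the direction is std::wstring to std::string, where narrowing is what can fail:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for gsl::narrow<char>(wchar_t): converts only if the
// value survives the round trip and keeps its sign, otherwise throws.
char narrow_char(wchar_t wc)
{
    const char c = static_cast<char>(wc);
    if (static_cast<wchar_t>(c) != wc || (c < 0) != (wc < 0))
        throw std::range_error("wchar_t does not fit in char");
    return c;
}

// Converts std::wstring to std::string character by character,
// throwing on the first character that does not fit.
std::string wstring_to_string_checked(const std::wstring& ws)
{
    std::string out;
    out.reserve(ws.size());
    for (wchar_t wc : ws)
        out.push_back(narrow_char(wc));
    return out;
}
```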
The s2ws method works well. Hope it helps.
Based on my own testing (on Windows 8, VS2010), mbstowcs can actually damage the original string; it works only with the ANSI code page. MultiByteToWideChar/WideCharToMultiByte can also corrupt strings. However, they tend to replace characters they don't know with '?' question marks, whereas mbstowcs tends to stop when it encounters an unknown character and cut the string at that very point. (I tested Vietnamese characters on a Finnish Windows.)
So prefer the Multi* Windows API functions over the analogous ANSI C functions.
Also, I've noticed that the shortest way to encode a string from one code page to another is not to call the MultiByteToWideChar/WideCharToMultiByte API functions directly, but to use their analogous ATL macros: W2A / A2W.
So the analogous function mentioned above would look like:
_acp is declared in the USES_CONVERSION macro.
Or also a function which I often miss when converting old data to a new format:
But please note that those macros use the stack heavily. Don't use them in for loops or recursive loops within the same function. After using the W2A or A2W macro, it's better to return ASAP so the stack is freed from the temporary conversion.
UTF-8 implementation
Assuming that your std::string is UTF-8 encoded, this is a platform-independent implementation of wstring-string conversion functions:
The currently most upvoted answer looks similar, but produces incorrect results for non-BMP characters (e.g. emojis) on non-Windows platforms.
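Here is a sketch of what such functions can look like. It assumes UTF-8 input and picks the facet by sizeof(wchar_t) with std::conditional, as the next paragraph describes. The function and alias names are illustrative, not the answer's own:

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// wchar_t holds UTF-16 code units on Windows (2 bytes) and UTF-32 code points
// elsewhere (4 bytes); pick the matching <codecvt> facet at compile time.
using utf8_wchar_facet = std::conditional<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,   // UTF-8 <-> UTF-16
    std::codecvt_utf8<wchar_t>>::type;  // UTF-8 <-> UTF-32

std::wstring utf8_to_wide(const std::string& utf8)
{
    std::wstring_convert<utf8_wchar_facet> converter;
    return converter.from_bytes(utf8);
}

std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<utf8_wchar_facet> converter;
    return converter.to_bytes(wide);
}
```

With this choice, an emoji such as U+1F600 becomes a surrogate pair on Windows and a single wchar_t elsewhere, and it round-trips back to the same four UTF-8 bytes on both.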
wchar_t is UTF-16 on Windows, but UTF-32 everywhere else. The std::conditional takes care of that distinction.

MSVC Deprecation Warning
On MSVC this might generate some deprecation warnings. You can disable these by wrapping the functions in
Johann Gerell's answer explains why it's OK to disable that warning.
Getting UTF-8 on MSVC
Note that when you write a normal string literal in your source (like std::string s = "おはよう";), it won't be UTF-8 encoded by default on MSVC. I would strongly recommend setting your MSVC character set to UTF-8 to address this: https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170
std::string -> wchar_t[] with the safe mbstowcs_s function:
This is from my sample code
Use this code to convert your string to a wstring:
string s = "おはよう"; is an error. You should use wstring directly:
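For example, a wide string literal (L prefix) is encoded by the compiler, so no runtime conversion is needed. This assumes the source file is read as UTF-8; MSVC needs /utf-8, while GCC and Clang typically assume UTF-8 already:

```cpp
#include <string>

// The compiler encodes the L-prefixed literal in the wide execution character set
// (UTF-16 on Windows, UTF-32 elsewhere); all four characters are in the BMP.
const std::wstring greeting = L"おはよう";
```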