How to convert wstring into string?
The question is how to convert a wstring to a string.
I have the following example:
#include <string>
#include <iostream>
int main()
{
std::wstring ws = L"Hello";
std::string s( ws.begin(), ws.end() );
//std::cout <<"std::string = "<<s<<std::endl;
std::wcout<<"std::wstring = "<<ws<<std::endl;
std::cout <<"std::string = "<<s<<std::endl;
}
With the commented-out line enabled, the output is:
std::string = Hello
std::wstring = Hello
std::string = Hello
But without it, the output is only:
std::wstring = Hello
Is anything wrong in the example? Can I do the conversion like above?
EDIT
The new example (taking some of the answers into account) is:
#include <string>
#include <iostream>
#include <sstream>
#include <locale>
int main()
{
setlocale(LC_CTYPE, "");
const std::wstring ws = L"Hello";
const std::string s( ws.begin(), ws.end() );
std::cout<<"std::string = "<<s<<std::endl;
std::wcout<<"std::wstring = "<<ws<<std::endl;
std::stringstream ss;
ss << ws.c_str();
std::cout<<"std::stringstream = "<<ss.str()<<std::endl;
}
Output:
std::string = Hello
std::wstring = Hello
std::stringstream = 0x860283c
Therefore, stringstream cannot be used to convert a wstring into a string.
As Cubbi pointed out in one of the comments, std::wstring_convert (C++11) provides a neat, simple solution (you need to #include <locale> and <codecvt>).
I was using a combination of wcstombs and tedious allocation/deallocation of memory before I came across this.
http://en.cppreference.com/w/cpp/locale/wstring_convert
Update (2013.11.28)
One-liners are possible as well (thank you, Guss, for your comment).
Wrapper functions can also be written around the converter (thank you, ArmanSchwarz, for your comment).
Note: there is some controversy over whether string/wstring should be passed to functions by reference or by value (due to C++11 and compiler updates). I'll leave the decision to the person implementing, but it's worth knowing.
Note: the code for this answer uses std::codecvt_utf8, but if you're not using UTF-8 you'll need to change that to the appropriate encoding: http://en.cppreference.com/w/cpp/header/codecvt
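The answer's own code did not survive on this page, so here is a minimal sketch of what a std::wstring_convert based conversion might look like. The names utf8_converter, ws2s, s2ws and one_liner_example are placeholders chosen for illustration, and the sketch assumes the narrow side is UTF-8. Note that <codecvt> and std::wstring_convert are deprecated since C++17, so some compilers will warn.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <locale>    // std::wstring_convert
#include <string>

// Placeholder alias: converts between wchar_t strings and UTF-8 bytes.
using utf8_converter = std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t>;

// Wide string -> UTF-8 encoded narrow string.
std::string ws2s(const std::wstring& wide)
{
    utf8_converter converter;
    return converter.to_bytes(wide);     // throws std::range_error on failure
}

// UTF-8 encoded narrow string -> wide string.
std::wstring s2ws(const std::string& narrow)
{
    utf8_converter converter;
    return converter.from_bytes(narrow); // throws std::range_error on failure
}

// One-liner form for a single conversion.
std::string one_liner_example()
{
    return std::wstring_convert<std::codecvt_utf8<wchar_t>>().to_bytes(L"Hello");
}

// Small usage check: ASCII text survives a round trip unchanged.
void check_round_trip()
{
    assert(ws2s(L"Hello") == "Hello");
    assert(s2ws(ws2s(L"Hello")) == L"Hello");
}
```

Creating the converter inside each function keeps the wrappers stateless; a long-lived converter object would avoid repeated construction if that cost matters.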
An older solution from: http://forums.devshed.com/c-programming-42/wstring-to-string-444006.html
Update (2021): However, at least on more recent versions of MSVC, this may generate a wchar_t to char truncation warning. The warning can be quieted by using std::transform instead, with an explicit conversion in the transformation function, either appending through back_inserter or, if you prefer, preallocating the destination string.
Beware that there is no character set conversion going on here at all. What this does is simply assign each iterated wchar_t to a char, which is a truncating conversion. It uses the std::string constructor that takes an iterator range.
As stated in the comments, code points in the range 0x80 - 0x9F in Win1252 will not work. This includes €, œ, ž, Ÿ, ...
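The std::transform variant mentioned in the 2021 update can be sketched as follows. The function names are hypothetical, and, as the warning above says, this only truncates each wchar_t; it performs no encoding conversion.

```cpp
#include <algorithm>  // std::transform
#include <cassert>
#include <iterator>   // std::back_inserter
#include <string>

// Hypothetical helper: narrows each wchar_t with an explicit cast.
// Characters outside the ASCII range are mangled, not converted.
std::string truncate_to_narrow(const std::wstring& wide)
{
    std::string narrow;
    narrow.reserve(wide.size());
    std::transform(wide.begin(), wide.end(), std::back_inserter(narrow),
                   [](wchar_t c) { return static_cast<char>(c); });
    return narrow;
}

// Same idea, preallocating the destination instead of using back_inserter.
std::string truncate_to_narrow_prealloc(const std::wstring& wide)
{
    std::string narrow(wide.size(), '\0');
    std::transform(wide.begin(), wide.end(), narrow.begin(),
                   [](wchar_t c) { return static_cast<char>(c); });
    return narrow;
}

// Small usage check with plain ASCII input.
void check_truncation()
{
    assert(truncate_to_narrow(L"Hello") == "Hello");
    assert(truncate_to_narrow_prealloc(L"Hello") == "Hello");
}
```

The explicit static_cast is what tells the compiler the truncation is intentional.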
Here is a worked-out solution based on the other suggestions:
This will usually work on Linux, but will create problems on Windows.
The default encoding depends on the platform.
My solution steps, which include null chars \0 (to avoid truncation), without using functions from the windows.h header:
Check raw string literals and the raw string suffix.
Linux code. Print the std::string directly using std::cout. The default encoding on Linux is UTF-8, so no extra functions are needed.
On Windows, if you need to print Unicode, you can use WriteConsole to print Unicode characters from a std::wstring.
Finally, on Windows you need a console with powerful and complete display support for Unicode characters. I recommend Windows Terminal.
Full example code
You can test this code on https://repl.it/@JomaCorpFX/StringToWideStringToString#main.cpp
**Screenshots** (images not included): Windows Terminal, cmd/powershell, Repl.it capture
Instead of including locale and all that fancy stuff, if you know for a FACT that your string is convertible, just do this:
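The code for this answer is missing here. A minimal sketch of a direct conversion in that spirit, assuming every character is known to fit in a char (for example plain ASCII), might look like this; narrow_ascii is a made-up name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies each wide character into a char.
// Only meaningful when every character is known to fit in a char.
std::string narrow_ascii(const std::wstring& wide)
{
    std::string result;
    result.reserve(wide.size());
    for (wchar_t c : wide) {
        result += static_cast<char>(c);
    }
    return result;
}

// Small usage check with plain ASCII input.
void check_narrow_ascii()
{
    assert(narrow_ascii(L"bla") == "bla");
}
```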
If you are dealing with file paths (as I often am when I find the need for wstring-to-string), you can use filesystem::path (since C++17):
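A minimal sketch of that idea, assuming a C++17 toolchain; path_to_narrow is a hypothetical helper name, and the narrow result is in whatever native narrow encoding the implementation uses.

```cpp
#include <cassert>
#include <filesystem>  // C++17
#include <string>

// Hypothetical helper: lets std::filesystem::path perform the
// wide-to-narrow conversion of a path string.
std::string path_to_narrow(const std::wstring& wide_path)
{
    return std::filesystem::path(wide_path).string();
}

// Small usage check with an ASCII-only file name.
void check_path_to_narrow()
{
    assert(path_to_narrow(L"file.txt") == "file.txt");
}
```

This is convenient for paths specifically; for arbitrary text, one of the explicit conversions in the other answers states the intended encoding more clearly.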
I believe the official way is still to go through codecvt facets (you need some sort of locale-aware translation), or something like that; I don't have working code lying around. But I'm not sure how many people these days use that machinery and how many simply ask for pointers to memory and let ICU or some other library handle the gory details.
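For readers who do want to try the codecvt-facet route, here is a rough sketch under stated assumptions: narrow_with_codecvt is a hypothetical name, and the locale defaults to the current global locale (pass std::locale("") to pick up the user's environment instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts through the locale's codecvt facet, so the
// result is in that locale's narrow encoding (not necessarily UTF-8).
std::string narrow_with_codecvt(const std::wstring& wide,
                                const std::locale& loc = std::locale())
{
    using facet_type = std::codecvt<wchar_t, char, std::mbstate_t>;
    const facet_type& facet = std::use_facet<facet_type>(loc);

    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds how many bytes one wide character can produce.
    std::string narrow(
        wide.size() * static_cast<std::size_t>(facet.max_length()), '\0');

    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    const std::codecvt_base::result result = facet.out(
        state,
        wide.data(), wide.data() + wide.size(), from_next,
        &narrow[0], &narrow[0] + narrow.size(), to_next);

    if (result != std::codecvt_base::ok) {
        throw std::runtime_error("wide-to-narrow conversion failed");
    }
    narrow.resize(static_cast<std::size_t>(to_next - &narrow[0]));
    return narrow;
}

// Small usage check: ASCII converts unchanged in the default locale.
void check_codecvt()
{
    assert(narrow_with_codecvt(L"Hello") == "Hello");
}
```

Characters that the chosen locale cannot represent make out() report an error, which the sketch turns into an exception.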
There are two issues with the code:
First, the conversion in const std::string s( ws.begin(), ws.end() ); is not required to correctly map the wide characters to their narrow counterparts. Most likely, each wide character will just be typecast to char. The resolution to this problem is already given in the answer by kem and involves the narrow function of the locale's ctype facet.
Second, you are writing output to both std::cout and std::wcout in the same program. Both cout and wcout are associated with the same stream (stdout), and the results of using the same stream both as a byte-oriented stream (as cout does) and as a wide-oriented stream (as wcout does) are not defined.
The best option is to avoid mixing narrow and wide output to the same (underlying) stream. For stdout/cout/wcout, you can try switching the orientation of stdout when switching between wide and narrow output (or vice versa).
Besides just converting the types, you should also be conscious of the string's actual format.
When compiling for the Multi-byte Character Set, Visual Studio and the Win API assume UTF8 (actually the Windows encoding, which is Windows-28591).
When compiling for the Unicode Character Set, Visual Studio and the Win API assume UTF16.
So you must convert the string from UTF16 to UTF8 format as well, and not just convert to std::string.
This becomes necessary when working with multi-character formats such as some non-Latin languages.
The idea is to decide that std::wstring always represents UTF16 and std::string always represents UTF8.
This isn't enforced by the compiler; it's more of a good policy to have.
Note the string prefixes I use to define UTF16 (L) and UTF8 (u8).
To convert between the two types, you should use: std::codecvt_utf8_utf16<wchar_t>
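A minimal sketch of that policy, with hypothetical helper names utf16_to_utf8 and utf8_to_utf16. It assumes the std::wstring really holds UTF-16 code units, which is the usual case on Windows; where wchar_t is 32 bits wide, std::codecvt_utf8<wchar_t> may be the closer fit. The facet is deprecated since C++17. The UTF-8 bytes are spelled with escapes rather than a u8 literal so the sketch also compiles as C++20, where u8 literals have type char8_t.

```cpp
#include <cassert>
#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Policy assumed here: std::wstring holds UTF-16, std::string holds UTF-8.
std::string utf16_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}

std::wstring utf8_to_utf16(const std::string& narrow)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(narrow);
}

// Small usage check: U+00E9 is the two bytes C3 A9 in UTF-8.
void check_utf8_utf16()
{
    assert(utf16_to_utf8(L"\u00E9") == "\xC3\xA9");
    assert(utf8_to_utf16("\xC3\xA9") == L"\u00E9");
}
```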
At the time of writing this answer, the number one Google search result for "convert string wstring" would land you on this page. My answer shows how to convert string to wstring, although this is NOT the actual question, and I should probably delete this answer, but that is considered bad form. You may want to jump to this StackOverflow answer, which is now ranked higher than this page.
Here's a way to combine string, wstring and mixed string constants into a wstring. Use the wstringstream class.
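A small sketch of the wstringstream approach; combine is a hypothetical helper name. The wide stream widens narrow characters one at a time, so this is only safe when the narrow pieces are plain ASCII.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds one std::wstring from narrow and wide pieces.
std::wstring combine(const std::string& narrow, const std::wstring& wide)
{
    std::wstringstream stream;
    // Narrow literals and C strings are widened character by character.
    stream << "abc " << narrow.c_str() << L" def " << wide;
    return stream.str();
}

// Small usage check.
void check_combine()
{
    assert(combine("narrow", L"wide") == L"abc narrow def wide");
}
```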
You might as well just use the ctype facet's narrow method directly:
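A minimal sketch of calling the facet directly, assuming the current global locale by default; narrow_with_ctype and the '?' fallback character are illustrative choices, not part of the original answer.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: narrows each character with std::ctype<wchar_t>.
// Characters with no narrow equivalent in the locale become `fallback`.
std::string narrow_with_ctype(const std::wstring& wide,
                              const std::locale& loc = std::locale(),
                              char fallback = '?')
{
    std::string narrow(wide.size(), '\0');
    if (!wide.empty()) {
        std::use_facet<std::ctype<wchar_t>>(loc).narrow(
            wide.data(), wide.data() + wide.size(), fallback, &narrow[0]);
    }
    return narrow;
}

// Small usage check with plain ASCII input.
void check_ctype_narrow()
{
    assert(narrow_with_ctype(L"Hello") == "Hello");
}
```

Unlike the iterator-range constructor, this gives a defined result for characters that cannot be narrowed, at the cost of producing exactly one char per wide character.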
This solution is inspired by dk123's solution, but uses a locale-dependent codecvt facet. The result is a string in the locale's encoding instead of UTF-8 (unless UTF-8 is what the locale uses).
I was searching for it, but I couldn't find it. Finally I found that I can get the right facet from std::locale using the std::use_facet() function with the right typename. Hope this helps.
I spent many sad days trying to come up with a way to do this for C++17, which deprecated the codecvt facets, and this is the best I was able to come up with by combining code from a few different sources.
I tested this code on Windows 10, and at least for my purposes, it seems to work fine. Please don't lynch me if this doesn't consider some crazy edge cases that you might need to handle; I'm sure someone with more experience can improve on this! :-)
Also, credit where it's due:
Adapted for wideToMultiByte()
Copied for multiByteToWide
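The answer's code is not reproduced on this page. As one possible shape for a wideToMultiByte() that avoids <codecvt>, here is a sketch built on std::wcrtomb; this is an assumption about the general approach, not the original code. The output encoding is that of the current C locale, and UTF-16 surrogate pairs on Windows are not handled.

```cpp
#include <cassert>
#include <climits>    // MB_LEN_MAX
#include <cstddef>
#include <cwchar>     // std::wcrtomb, std::mbstate_t
#include <stdexcept>
#include <string>

// Sketch: converts one wide character at a time using the current C locale.
std::string wideToMultiByte(const std::wstring& wideString)
{
    std::string result;
    std::mbstate_t state = std::mbstate_t();
    char buffer[MB_LEN_MAX];
    for (wchar_t wc : wideString) {
        const std::size_t written = std::wcrtomb(buffer, wc, &state);
        if (written == static_cast<std::size_t>(-1)) {
            throw std::runtime_error("character not representable in the current locale");
        }
        result.append(buffer, written);
    }
    return result;
}

// Small usage check with plain ASCII input.
void check_wide_to_multibyte()
{
    assert(wideToMultiByte(L"Hello") == "Hello");
}
```

Calling std::setlocale(LC_ALL, "") beforehand makes the conversion follow the user's environment instead of the default "C" locale.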
In my case, I have to use multibyte characters (MBCS), I want to use std::string and std::wstring, and I can't use C++11. So I use mbstowcs and wcstombs.
I made the same function using new and delete [], but it is slower than this.
This can help: How to: Convert Between Various String Types
EDIT
However, when converting to wstring and the source string is a non-alphabetic, multibyte string, it does not work.
So I changed wcstombs to WideCharToMultiByte.
EDIT: changed to use 'MultiByteToWideChar' instead of 'wcstombs'.
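A portable sketch of the mbstowcs/wcstombs variant described first, using std::vector buffers rather than new/delete []. The names wide_to_mbcs and mbcs_to_wide are hypothetical, the encoding is that of the current C locale, and both functions stop at the first embedded null character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>    // std::wcstombs, std::mbstowcs, MB_CUR_MAX
#include <stdexcept>
#include <string>
#include <vector>

// Wide -> multibyte, in the encoding of the current C locale.
std::string wide_to_mbcs(const std::wstring& wide)
{
    // Each wide character needs at most MB_CUR_MAX bytes, plus a terminator.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(&buffer[0], wide.c_str(), buffer.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("wcstombs failed");
    }
    return std::string(&buffer[0], n);
}

// Multibyte -> wide; the result never has more characters than input bytes.
std::wstring mbcs_to_wide(const std::string& narrow)
{
    std::vector<wchar_t> buffer(narrow.size() + 1);
    const std::size_t n = std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("mbstowcs failed");
    }
    return std::wstring(&buffer[0], n);
}

// Small usage check with plain ASCII input.
void check_mbcs()
{
    assert(wide_to_mbcs(L"Hello") == "Hello");
    assert(mbcs_to_wide("Hello") == L"Hello");
}
```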
In case anyone else is interested: I needed a class that could be used interchangeably wherever either a string or a wstring was expected. The class convertible_string, based on dk123's solution, can be initialized with a string, char const*, wstring or wchar_t const*, and can be assigned to by, or implicitly converted to, either a string or a wstring (so it can be passed into functions that take either).
I am using the approach from the source below to convert wstring to string.
Source: https://msdn.microsoft.com/en-us/library/87zae4a3.aspx
The conversion of char strings to wchar_t strings and vice versa is a typical problem on Windows. I can't think of a use case for this in Linux. The type wchar_t has a length of 2 bytes on Windows and a length of 4 bytes on Linux. Modern C++ has the types char16_t and char32_t (since C++11) and char8_t (since C++20) with the corresponding number of bits. In new projects you should therefore use char8_t for UTF-8, char16_t for UTF-16 and char32_t for UTF-32, see https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170.
On Windows, almost all classic API functions use the type wchar_t, which corresponds on Windows to the modern type char16_t. The conversion from char to wchar_t and vice versa is therefore always necessary if the value of a char string is to be passed to an API function or, conversely, if the wchar_t string created by an API function is to be converted into a char string.
As the conversion between the two string types is a typical Windows problem, Windows functions should also be used for this. The Windows SDK offers the WideCharToMultiByte() function for converting a wchar_t string into a char string using a specific code page. Windows also provides the MultiByteToWideChar() function for the reverse conversion. If you specify CP_UTF8 as the code page, these functions convert between the Unicode formats UTF-16 and UTF-8.
Both functions are very unwieldy. ATL therefore provides two template classes that wrap these functions to simplify the conversions. You only need the headers <atlconv.h> and <atlstr.h>; no library needs to be loaded.
CW2A is a typedef for the class template CW2AEX, which wraps the WideCharToMultiByte() function. Similarly, CA2W is a typedef for the class template CA2WEX, which wraps the function MultiByteToWideChar(). The instances of these classes have the attribute m_psz, which is of type char* or wchar_t*.
In my example, I start with a UTF-8 character string of type const char*, which contains Chinese characters and a smiley. The char string is converted to a wchar_t string with CA2W so that I can use the Windows function MessageBoxW(). The wchar_t string is then converted back to a char string using CW2A. Make sure that you specify CP_UTF8 as the second parameter of the constructor for both classes, otherwise ATL will use the current ANSI code page. The last statement confirms that the new and the original string have the same content.
Although it is suspect, using std::string s( wideString.begin(), wideString.end() ) does work, but it completely truncates the wide characters with a straight cast.
That method generates 2 C++ warnings, one of which has a very large spew in MSVC++:
1>(compiling source file 'Filename.cpp')
1>C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xutility(4537,18):
1>the template instantiation context (the oldest one first) is ...
To suppress that, you don't have to rewrite it using std::transform or anything; you can just use #pragma warning( suppress : 4244 4365 ), which suppresses those warnings only for the line that immediately follows it.
I should note here that this is not the correct way to convert wide strings. It loses information, and that's what those C++ warnings are about. If you want to retain the information of the wide characters, while having any ANSI/English characters represented by 1 byte only, then you can use the UTF8 encoding. On Windows, there's a really simple pair of functions that lets you do that transformation, described here.
On Linux, or when using <codecvt>, the equivalent pair of functions can be written with the standard conversion facets.