How to convert wstring into string?

Posted on 2024-10-14 08:31:23

The question is: how do I convert a std::wstring to a std::string?

I have the following example:

#include <string>
#include <iostream>

int main()
{
    std::wstring ws = L"Hello";
    std::string s( ws.begin(), ws.end() );

  //std::cout <<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;
    std::cout <<"std::string =     "<<s<<std::endl;
}

With the commented-out line enabled (uncommented), the output is:

std::string =     Hello
std::wstring =    Hello
std::string =     Hello

but with that line left commented out, the output is only:

std::wstring =    Hello

Is anything wrong with the example? Can I do the conversion as shown above?

EDIT

New example (taking some of the answers into account):

#include <string>
#include <iostream>
#include <sstream>
#include <locale>

int main()
{
    setlocale(LC_CTYPE, "");

    const std::wstring ws = L"Hello";
    const std::string s( ws.begin(), ws.end() );

    std::cout<<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;

    std::stringstream ss;
    ss << ws.c_str();
    std::cout<<"std::stringstream =     "<<ss.str()<<std::endl;
}

The output is:

std::string =     Hello
std::wstring =    Hello
std::stringstream =     0x860283c

Therefore, std::stringstream cannot be used to convert a wstring into a string.

Comments (21)

枕梦 2024-10-21 08:31:23

As Cubbi pointed out in one of the comments, std::wstring_convert (C++11) provides a neat, simple solution (you need to #include <locale> and <codecvt>):

std::wstring string_to_convert;

//setup converter
using convert_type = std::codecvt_utf8<wchar_t>;
std::wstring_convert<convert_type, wchar_t> converter;

//use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( string_to_convert );

I was using a combination of wcstombs and tedious allocation/deallocation of memory before I came across this.

http://en.cppreference.com/w/cpp/locale/wstring_convert

Update (2013.11.28)

A one-liner can be written like this (thanks to Guss for the comment):

std::wstring str = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes("some string");

Wrapper functions can be written like this (thanks to ArmanSchwarz for the comment):

std::wstring s2ws(const std::string& str)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.from_bytes(str);
}

std::string ws2s(const std::wstring& wstr)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.to_bytes(wstr);
}

Note: there's some controversy over whether string/wstring should be passed to functions by reference or by value (due to C++11 and compiler updates). I'll leave the decision to the person implementing, but it's worth knowing.

Note: I'm using std::codecvt_utf8 in the above code, but if you're not using UTF-8 you'll need to change that to the appropriate encoding you're using:

http://en.cppreference.com/w/cpp/header/codecvt
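
As a rough usage sketch of the wrappers above, assuming UTF-8 is the desired narrow encoding: the names ws2s_utf8 and s2ws_utf8 are illustrative, not from the answer. Note that std::wstring_convert and <codecvt> have been deprecated since C++17, so a compiler may emit deprecation warnings. On malformed input the converter throws std::range_error.

```cpp
#include <cassert>   // only needed for the checks mentioned in the note below
#include <codecvt>
#include <locale>
#include <string>

// Illustrative wrapper: wide string -> UTF-8 encoded narrow string.
std::string ws2s_utf8(const std::wstring& wstr)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    return converter.to_bytes(wstr);
}

// Illustrative wrapper: UTF-8 encoded narrow string -> wide string.
std::wstring s2ws_utf8(const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    return converter.from_bytes(str);
}

// True when converting to UTF-8 and back yields the original text.
bool round_trips(const std::wstring& wstr)
{
    return s2ws_utf8(ws2s_utf8(wstr)) == wstr;
}
```

For example, ws2s_utf8(L"h\u00e9") should yield the three bytes 'h', 0xC3, 0xA9, because é (U+00E9) takes two bytes in UTF-8.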

毅然前行 2024-10-21 08:31:23

An older solution from: http://forums.devshed.com/c-programming-42/wstring-to-string-444006.html

std::wstring wide( L"Wide" ); 
std::string str( wide.begin(), wide.end() );

// Will print no problemo!
std::cout << str << std::endl;

Update (2021): However, at least on more recent versions of MSVC, this may generate a wchar_t to char truncation warning. The warning can be quieted by using std::transform instead with explicit conversion in the transformation function, e.g.:

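// Requires <algorithm> (std::transform), <iterator> (std::back_inserter) and <string>.
std::wstring wide( L"Wide" );

std::string str;
std::transform(wide.begin(), wide.end(), std::back_inserter(str), [] (wchar_t c) {
    return static_cast<char>(c);  // explicit narrowing; no character set conversion
});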

Or if you prefer to preallocate and not use back_inserter:

std::string str(wide.length(), 0);
std::transform(wide.begin(), wide.end(), str.begin(), [] (wchar_t c) {
    return (char)c;
});

See example on various compilers here.


Beware that there is no character set conversion going on here at all. This simply assigns each iterated wchar_t to a char, which is a truncating conversion. It uses the std::string constructor:

template< class InputIt >
basic_string( InputIt first, InputIt last,
              const Allocator& alloc = Allocator() );

As stated in comments:

values 0-127 are identical in virtually every encoding, so truncating
values that are all less than 128 results in the same text. Put in a
Chinese character and you'll see the failure.

the values 128-255 of Windows codepage 1252 (the Windows English
default) and the values 128-255 of Unicode are mostly the same, so if
that's the codepage you're using, most of those characters should be
truncated to the correct values. (I totally expected á and õ to work,
I know our code at work relies on this for é, which I will soon fix)

And note that code points in the range 0x80 - 0x9F in Win1252 will not work. This includes €, œ, ž, Ÿ, ...
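
To make the truncation concrete, here is a minimal sketch. The helper names truncate_to_narrow and is_ascii_only are hypothetical and only mirror what the iterator-range constructor does, keeping the low 8 bits of each wchar_t. It shows that plain ASCII survives, while a Chinese character such as U+4E2D collapses to an unrelated byte.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: narrows each wchar_t to a single byte by keeping only
// its low 8 bits, which is what the truncating conversion described above does.
std::string truncate_to_narrow(const std::wstring& wide)
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (wchar_t c : wide) {
        // Going through unsigned char makes the "keep the low byte" step well defined.
        narrow.push_back(static_cast<char>(static_cast<unsigned char>(c)));
    }
    assert(narrow.size() == wide.size());  // always one output byte per input character
    return narrow;
}

// Hypothetical helper: true when every character is in the range 0-127,
// i.e. when the truncation above loses no information.
bool is_ascii_only(const std::wstring& wide)
{
    for (wchar_t c : wide) {
        if (static_cast<unsigned long>(c) > 127UL) {
            return false;
        }
    }
    return true;
}
```

With this, truncate_to_narrow(L"Wide") gives "Wide", but a string holding U+4E2D comes out as "-" (byte 0x2D), so checking is_ascii_only first is one way to decide whether the shortcut is safe for a given input.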

逆夏时光 2024-10-21 08:31:23

Here is a worked-out solution based on the other suggestions:

#include <string>
#include <iostream>
#include <clocale>
#include <locale>
#include <vector>

int main() {
  std::setlocale(LC_ALL, "");
  const std::wstring ws = L"ħëłlö";
  const std::locale locale("");
  typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
  const converter_type& converter = std::use_facet<converter_type>(locale);
  std::vector<char> to(ws.length() * converter.max_length());
  std::mbstate_t state = std::mbstate_t();  // conversion state must be zero-initialized before use
  const wchar_t* from_next;
  char* to_next;
  const converter_type::result result = converter.out(state, ws.data(), ws.data() + ws.length(), from_next, &to[0], &to[0] + to.size(), to_next);
  if (result == converter_type::ok || result == converter_type::noconv) {
    const std::string s(&to[0], to_next);
    std::cout <<"std::string =     "<<s<<std::endl;
  }
}

This will usually work for Linux, but will create problems on Windows.

一城柳絮吹成雪 2024-10-21 08:31:23

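Default encoding on each platform:

  • Windows: UTF-16.
  • Linux: UTF-8.
  • macOS: UTF-8.

My solution preserves embedded null characters (\0), so the converted text is not truncated at the first null. On Windows the code below calls WideCharToMultiByte and MultiByteToWideChar from <Windows.h>; on other platforms it uses wcstombs and mbstowcs. The steps are:

  1. Add macros to detect the platform (Windows, Linux, and others).
  2. Create functions to convert std::wstring to std::string and std::string back to std::wstring.
  3. Create a function for printing.
  4. Print the std::string / std::wstring.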
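
The null-handling idea behind step 2 can be sketched on its own. C functions such as wcstombs stop at the first null character, so the full code converts the input segment by segment. The helper below, split_at_nulls, is a hypothetical name used only for illustration; it shows just the splitting part, not the encoding conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a wide string at every embedded L'\0' so that
// each segment can be passed safely to a null-terminated C conversion function.
std::vector<std::wstring> split_at_nulls(const std::wstring& text)
{
    std::vector<std::wstring> segments;
    std::size_t begin = 0;
    while (true) {
        const std::size_t pos = text.find(L'\0', begin);
        if (pos == std::wstring::npos) {
            // No more nulls: the rest of the text is the last segment.
            segments.push_back(text.substr(begin));
            break;
        }
        segments.push_back(text.substr(begin, pos - begin));
        begin = pos + 1;  // skip the null itself
    }
    assert(!segments.empty());  // even an empty input yields one (empty) segment
    return segments;
}
```

A caller would convert each segment separately and re-insert a single '\0' between the converted pieces, which is what the conversion functions in the full example do.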

See also raw string literals and the raw string suffix.

Linux code: print the std::string directly with std::cout. The default encoding on Linux is UTF-8, so no extra functions are needed.

On Windows, if you need to print Unicode, you can use WriteConsole to print Unicode characters from a std::wstring.

Finally, on Windows you need a console with robust and complete rendering support for Unicode characters. I recommend Windows Terminal.

QA

  • Tested on Microsoft Visual Studio 2019 with VC++; std=c++17. (Windows project)
  • Tested on repl.it using the Clang compiler; std=c++17.

Q. Why don't you use the <codecvt> header functions and classes?
A. They are deprecated. Removed or deprecated features are impossible to build on VC++, although g++ has no problem with them. I prefer zero warnings and zero headaches.

Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows, wchar_t is 2 bytes and each character is stored as UTF-16 code units; a character above U+FFFF is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux, wchar_t is 4 bytes and each character is stored in a single wchar_t element, so surrogate pairs are not needed. See "Standard data types on UNIX, Linux, and Windows".

Q. Is std::string cross-platform?
A. Yes. std::string uses char elements, and sizeof(char) is 1 byte on every conforming compiler. See "Standard data types on UNIX, Linux, and Windows".
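
To illustrate the surrogate-pair point from the Q&A above, here is a small sketch of how one Unicode code point maps to UTF-16 code units. The function name to_utf16_units is an assumption for illustration; the arithmetic is the standard UTF-16 encoding rule.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Code points up to U+FFFF take one unit; larger ones take a surrogate pair.
std::vector<std::uint16_t> to_utf16_units(std::uint32_t code_point)
{
    assert(code_point <= 0x10FFFF);  // highest valid Unicode code point
    if (code_point < 0x10000) {
        return { static_cast<std::uint16_t>(code_point) };
    }
    const std::uint32_t offset = code_point - 0x10000;  // 20 significant bits
    return {
        static_cast<std::uint16_t>(0xD800 + (offset >> 10)),   // high surrogate: top 10 bits
        static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF))  // low surrogate: bottom 10 bits
    };
}
```

For instance, U+1F600 becomes the pair 0xD83D 0xDE00, which is why a single such character occupies two wchar_t elements on Windows but only one on Linux.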

Full example code:

    
    #include <cstdio>    // getchar
    #include <cstdlib>   // wcstombs, mbstowcs, std::system
    #include <iostream>
    #include <set>
    #include <string>
    #include <locale>
    
    // WINDOWS
    #if defined(_WIN32)
    #define NOMINMAX  // must come before <Windows.h> to suppress its min/max macros
    #include <Windows.h>
    #include <conio.h>
    #define WINDOWS_PLATFORM 1
    #define DLLCALL STDCALL
    #define DLLIMPORT _declspec(dllimport)
    #define DLLEXPORT _declspec(dllexport)
    #define DLLPRIVATE
    
    //EMSCRIPTEN
    #elif defined(__EMSCRIPTEN__)
    #include <emscripten/emscripten.h>
    #include <emscripten/bind.h>
    #include <unistd.h>
    #include <termios.h>
    #define EMSCRIPTEN_PLATFORM 1
    #define DLLCALL
    #define DLLIMPORT
    #define DLLEXPORT __attribute__((visibility("default")))
    #define DLLPRIVATE __attribute__((visibility("hidden")))
    
    // LINUX - Ubuntu, Fedora, CentOS, Debian, Red Hat
    #elif (__LINUX__ || __gnu_linux__ || __linux__ || __linux || linux)
    #define LINUX_PLATFORM 1
    #include <unistd.h>
    #include <termios.h>
    #define DLLCALL CDECL
    #define DLLIMPORT
    #define DLLEXPORT __attribute__((visibility("default")))
    #define DLLPRIVATE __attribute__((visibility("hidden")))
    #define CoTaskMemAlloc(p) malloc(p)
    #define CoTaskMemFree(p) free(p)
    
    //ANDROID
    #elif (__ANDROID__ || ANDROID)
    #define ANDROID_PLATFORM 1
    #define DLLCALL
    #define DLLIMPORT
    #define DLLEXPORT __attribute__((visibility("default")))
    #define DLLPRIVATE __attribute__((visibility("hidden")))
    
    //MACOS
    #elif defined(__APPLE__)
    #include <unistd.h>
    #include <termios.h>
    #define DLLCALL
    #define DLLIMPORT
    #define DLLEXPORT __attribute__((visibility("default")))
    #define DLLPRIVATE __attribute__((visibility("hidden")))
    #include "TargetConditionals.h"
    #if TARGET_OS_IPHONE && TARGET_IPHONE_SIMULATOR
    #define IOS_SIMULATOR_PLATFORM 1
    #elif TARGET_OS_IPHONE
    #define IOS_PLATFORM 1
    #elif TARGET_OS_MAC
    #define MACOS_PLATFORM 1
    #else
    
    #endif
    
    #endif
    
    
    
    typedef std::string String;
    typedef std::wstring WString;
    
    #define EMPTY_STRING u8""s
    #define EMPTY_WSTRING L""s
    
    using namespace std::literals::string_literals;
    
    class Strings
    {
    public:
        static String WideStringToString(const WString& wstr)
        {
            if (wstr.empty())
            {
                return String();
            }
            size_t pos;
            size_t begin = 0;
            String ret;
    
    #if WINDOWS_PLATFORM
            int size;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
            while (pos != WString::npos && begin < wstr.length())
            {
                WString segment = WString(&wstr[begin], pos - begin);
                size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
                String converted = String(size, 0);
                WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
                ret.append(converted);
                ret.append({ 0 });
                begin = pos + 1;
                pos = wstr.find(static_cast<wchar_t>(0), begin);
            }
            if (begin <= wstr.length())
            {
                WString segment = WString(&wstr[begin], wstr.length() - begin);
                size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
                String converted = String(size, 0);
                WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
                ret.append(converted);
            }
    #elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
            size_t size;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
            while (pos != WString::npos && begin < wstr.length())
            {
                WString segment = WString(&wstr[begin], pos - begin);
                size = wcstombs(nullptr, segment.c_str(), 0);
                String converted = String(size, 0);
                wcstombs(&converted[0], segment.c_str(), converted.size());
                ret.append(converted);
                ret.append({ 0 });
                begin = pos + 1;
                pos = wstr.find(static_cast<wchar_t>(0), begin);
            }
            if (begin <= wstr.length())
            {
                WString segment = WString(&wstr[begin], wstr.length() - begin);
                size = wcstombs(nullptr, segment.c_str(), 0);
                String converted = String(size, 0);
                wcstombs(&converted[0], segment.c_str(), converted.size());
                ret.append(converted);
            }
    #else
            static_assert(false, "Unknown Platform");
    #endif
            return ret;
        }
    
        static WString StringToWideString(const String& str)
        {
            if (str.empty())
            {
                return WString();
            }
    
            size_t pos;
            size_t begin = 0;
            WString ret;
    #ifdef WINDOWS_PLATFORM
            int size = 0;
            pos = str.find(static_cast<char>(0), begin);
            while (pos != std::string::npos) {
                std::string segment = std::string(&str[begin], pos - begin);
                std::wstring converted = std::wstring(segment.size() + 1, 0);
                size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.length());
                converted.resize(size);
                ret.append(converted);
                ret.append({ 0 });
                begin = pos + 1;
                pos = str.find(static_cast<char>(0), begin);
            }
            if (begin < str.length()) {
                std::string segment = std::string(&str[begin], str.length() - begin);
                std::wstring converted = std::wstring(segment.size() + 1, 0);
                size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, segment.c_str(), segment.size(), &converted[0], converted.length());
                converted.resize(size);
                ret.append(converted);
            }
    
    #elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
            size_t size;
            pos = str.find(static_cast<char>(0), begin);
            while (pos != String::npos)
            {
                String segment = String(&str[begin], pos - begin);
                WString converted = WString(segment.size(), 0);
                size = mbstowcs(&converted[0], &segment[0], converted.size());
                converted.resize(size);
                ret.append(converted);
                ret.append({ 0 });
                begin = pos + 1;
                pos = str.find(static_cast<char>(0), begin);
            }
            if (begin < str.length())
            {
                String segment = String(&str[begin], str.length() - begin);
                WString converted = WString(segment.size(), 0);
                size = mbstowcs(&converted[0], &segment[0], converted.size());
                converted.resize(size);
                ret.append(converted);
            }
    #else
            static_assert(false, "Unknown Platform");
    #endif
            return ret;
        }
    };
    
    enum class ConsoleTextStyle
    {
        DEFAULT = 0,
        BOLD = 1,
        FAINT = 2,
        ITALIC = 3,
        UNDERLINE = 4,
        SLOW_BLINK = 5,
        RAPID_BLINK = 6,
        REVERSE = 7,
    };
    
    enum class ConsoleForeground
    {
        DEFAULT = 39,
        BLACK = 30,
        DARK_RED = 31,
        DARK_GREEN = 32,
        DARK_YELLOW = 33,
        DARK_BLUE = 34,
        DARK_MAGENTA = 35,
        DARK_CYAN = 36,
        GRAY = 37,
        DARK_GRAY = 90,
        RED = 91,
        GREEN = 92,
        YELLOW = 93,
        BLUE = 94,
        MAGENTA = 95,
        CYAN = 96,
        WHITE = 97
    };
    
    enum class ConsoleBackground
    {
        DEFAULT = 49,
        BLACK = 40,
        DARK_RED = 41,
        DARK_GREEN = 42,
        DARK_YELLOW = 43,
        DARK_BLUE = 44,
        DARK_MAGENTA = 45,
        DARK_CYAN = 46,
        GRAY = 47,
        DARK_GRAY = 100,
        RED = 101,
        GREEN = 102,
        YELLOW = 103,
        BLUE = 104,
        MAGENTA = 105,
        CYAN = 106,
        WHITE = 107
    };
    
    class Console
    {
    private:
        static void EnableVirtualTermimalProcessing()
        {
    #if defined WINDOWS_PLATFORM
            HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
            DWORD dwMode = 0;
            GetConsoleMode(hOut, &dwMode);
            if (!(dwMode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))
            {
                dwMode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
                SetConsoleMode(hOut, dwMode);
            }
    #endif
        }
    
        static void ResetTerminalFormat()
        {
            std::cout << u8"\033[0m";
        }
    
        static void SetVirtualTerminalFormat(ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
        {
            String format = u8"\033[";
            format.append(std::to_string(static_cast<int>(foreground)));
            format.append(u8";");
            format.append(std::to_string(static_cast<int>(background)));
            if (styles.size() > 0)
            {
                for (auto it = styles.begin(); it != styles.end(); ++it)
                {
                    format.append(u8";");
                    format.append(std::to_string(static_cast<int>(*it)));
                }
            }
            format.append(u8"m");
            std::cout << format;
        }
    public:
        static void Clear()
        {
    
    #ifdef WINDOWS_PLATFORM
            std::system(u8"cls");
    #elif LINUX_PLATFORM || defined MACOS_PLATFORM
            std::system(u8"clear");
    #elif EMSCRIPTEN_PLATFORM
            emscripten::val::global()["console"].call<void>(u8"clear");
    #else
            static_assert(false, "Unknown Platform");
    #endif
        }
    
        static void Write(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
        {
    #ifndef EMSCRIPTEN_PLATFORM
            EnableVirtualTermimalProcessing();
            SetVirtualTerminalFormat(foreground, background, styles);
    #endif
            String str = s;
    #ifdef WINDOWS_PLATFORM
            WString unicode = Strings::StringToWideString(str);
            WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
    #elif defined LINUX_PLATFORM || defined MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
            std::cout << str;
    #else
            static_assert(false, "Unknown Platform");
    #endif
    
    #ifndef EMSCRIPTEN_PLATFORM
            ResetTerminalFormat();
    #endif
        }
    
        static void WriteLine(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
        {
            Write(s, foreground, background, styles);
            std::cout << std::endl;
        }
    
        static void Write(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
        {
    #ifndef EMSCRIPTEN_PLATFORM
            EnableVirtualTermimalProcessing();
            SetVirtualTerminalFormat(foreground, background, styles);
    #endif
            WString str = s;
    
    #ifdef WINDOWS_PLATFORM
            WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), str.c_str(), static_cast<DWORD>(str.length()), nullptr, nullptr);
    #elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
            std::cout << Strings::WideStringToString(str);
    #else
            static_assert(false, "Unknown Platform");
    #endif
    
    #ifndef EMSCRIPTEN_PLATFORM
            ResetTerminalFormat();
    #endif
        }
    
        static void WriteLine(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
        {
            Write(s, foreground, background, styles);
            std::cout << std::endl;
        }
    
        static void WriteLine()
        {
            std::cout << std::endl;
        }
    
        static void Pause()
        {
            char c;
            do
            {
                c = getchar();
                std::cout << "Press Key " << std::endl;
            } while (c != 64);
            std::cout << "KeyPressed" << std::endl;
        }
    
        static int PauseAny(bool printWhenPressed = false, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
        {
            int ch;
    #ifdef WINDOWS_PLATFORM
            ch = _getch();
    #elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
            struct termios oldt, newt;
            tcgetattr(STDIN_FILENO, &oldt);
            newt = oldt;
            newt.c_lflag &= ~(ICANON | ECHO);
            tcsetattr(STDIN_FILENO, TCSANOW, &newt);
            ch = getchar();
            tcsetattr(STDIN_FILENO, TCSANOW, &oldt);
    #else
            static_assert(false, "Unknown Platform");
    #endif
            if (printWhenPressed)
            {
                Console::Write(String(1, ch), foreground, background, styles);
            }
            return ch;
        }
    };
    
    
    
    int main()
    {
        std::locale::global(std::locale(u8"en_US.UTF8"));
        auto str = u8"????\0Hello\0????123456789也不是可运行的程序123456789日本"s;//
        WString wstr = L"????\0Hello\0????123456789也不是可运行的程序123456789日本"s;
        WString wstrResult = Strings::StringToWideString(str);
        String strResult = Strings::WideStringToString(wstr);
        bool equals1 = wstr == wstrResult;
        bool equals2 = str == strResult;
    
        Console::WriteLine(u8"█ Converted Strings printed with Console::WriteLine"s, ConsoleForeground::GREEN);
        Console::WriteLine(wstrResult, ConsoleForeground::BLUE);//Printed OK on Windows/Linux.
        Console::WriteLine(strResult, ConsoleForeground::BLUE);//Printed OK on Windows/Linux.
        
        Console::WriteLine(u8"█ Converted Strings printed with std::cout/std::wcout"s, ConsoleForeground::GREEN);
        std::cout << strResult << std::endl;//Printed OK on Linux. BAD on Windows.
        std::wcout << wstrResult << std::endl; //Printed BAD on Windows/Linux.
        Console::WriteLine();
        Console::WriteLine(u8"Press any key to exit"s, ConsoleForeground::DARK_GRAY);
        Console::PauseAny();
    
    }
    

    You can test this code at https://repl.it/@JomaCorpFX/StringToWideStringToString#main.cpp


    **Screenshots** (images not reproduced here): Windows Terminal, cmd/PowerShell, and a Repl.it capture.

    囍孤女 2024-10-21 08:31:23

    If you know for a fact that your string is convertible (for example, it contains only ASCII characters), you can skip locales and all that fancy machinery and just do this:

    #include <iostream>
    #include <string>
    
    using namespace std;
    
    int main()
    {
      wstring w(L"bla");
      string result;
      for(char x : w)
        result += x;
    
      cout << result << '\n';
    }
    

    Live example here
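
    The loop above silently truncates any character that does not fit in a char. A minimal sketch of a checked variant follows, assuming the input is meant to be plain 7-bit ASCII. The names is_ascii and narrow_ascii are made up for this illustration and are not part of any library.

```cpp
#include <stdexcept>
#include <string>

// True if every wide character is in the 7-bit ASCII range.
bool is_ascii(const std::wstring& wide)
{
    for (wchar_t wc : wide)
    {
        // The cast makes the check work whether wchar_t is signed or unsigned.
        if (static_cast<unsigned long>(wc) > 0x7F)
            return false;
    }
    return true;
}

// Narrows an ASCII-only wide string; throws instead of truncating silently.
std::string narrow_ascii(const std::wstring& wide)
{
    if (!is_ascii(wide))
        throw std::range_error("narrow_ascii: non-ASCII character in input");

    std::string result;
    result.reserve(wide.size());
    for (wchar_t wc : wide)
        result += static_cast<char>(wc);
    return result;
}
```

    Calling narrow_ascii(L"bla") gives "bla", while any input containing a character above U+007F raises std::range_error rather than producing garbage.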

    甜是你 2024-10-21 08:31:23

    If you are dealing with file paths (as I often am when I find the need for wstring-to-string) you can use filesystem::path (since C++17):

    #include <filesystem>
    
    const std::wstring wPath = GetPath(); // some function that returns wstring
    const std::string path = std::filesystem::path(wPath).string();
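
    For reference, here is a compilable version of the same idea. It is only a sketch: GetPath() is a hypothetical stand-in that returns a fixed path, and narrow_path is a name invented here. Note that path::string() produces the platform's native narrow encoding, so characters that encoding cannot represent may not survive the conversion.

```cpp
#include <filesystem>
#include <string>

// Hypothetical stand-in for the GetPath() call in the snippet above.
std::wstring GetPath()
{
    return L"data/config.txt";
}

// Converts a wide path to the platform's native narrow encoding (C++17).
std::string narrow_path(const std::wstring& wide_path)
{
    return std::filesystem::path(wide_path).string();
}
```

    With the stand-in above, narrow_path(GetPath()) yields the same path as a std::string.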
    
    似梦非梦 2024-10-21 08:31:23

    I believe the official way is still to go through codecvt facets (you need some sort of locale-aware translation), as in

    resultCode = use_facet<codecvt<wchar_t, char, mbstate_t> >(locale).
      out(stateVar, from, fromEnd, fromNext, to, toLimit, toNext);
    

    or something like that; I don't have working code lying around. But I'm not sure how many people these days use that machinery and how many simply ask for pointers to memory and let ICU or some other library handle the gory details.
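
    A minimal sketch of what such a facet-based conversion could look like, assuming the standard std::codecvt<wchar_t, char, std::mbstate_t> facet and its out() member. The function name to_narrow and the default of the classic "C" locale are choices made for this illustration; pass std::locale("") to use the user's environment instead.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Converts a wide string to the narrow encoding of the given locale.
std::string to_narrow(const std::wstring& wide,
                      const std::locale& loc = std::locale::classic())
{
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(loc);

    if (wide.empty())
        return std::string();

    // Worst case: every wide character needs max_length() bytes.
    std::string out(wide.size() * static_cast<std::size_t>(facet.max_length()), '\0');

    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;

    const std::codecvt_base::result rc = facet.out(
        state,
        wide.data(), wide.data() + wide.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (rc != std::codecvt_base::ok)
        throw std::range_error("to_narrow: character not representable in this locale");

    // Keep only the bytes that were actually written.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

    With the classic locale only ASCII is guaranteed to convert; other characters make out() report an error, which this sketch turns into an exception.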

    寂寞笑我太脆弱 2024-10-21 08:31:23

    There are two issues with the code:

    1. The conversion in const std::string s( ws.begin(), ws.end() ); is not required to correctly map the wide characters to their narrow counterpart. Most likely, each wide character will just be typecast to char.
      The resolution to this problem is already given in the answer by kem and involves the narrow function of the locale's ctype facet.

    2. You are writing output to both std::cout and std::wcout in the same program. Both cout and wcout are associated with the same stream (stdout) and the results of using the same stream both as a byte-oriented stream (as cout does) and a wide-oriented stream (as wcout does) are not defined.
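      The best option is to avoid mixing narrow and wide output to the same (underlying) stream. For stdout/cout/wcout, you can try switching the orientation of stdout when switching between wide and narrow output (or vice versa). Note that the C standard does not let fwide change the orientation of a stream that is already oriented, so whether this works depends on the implementation: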

      #include <iostream>
      #include <stdio.h>
      #include <wchar.h>
      
      int main() {
          std::cout << "narrow" << std::endl;
          fwide(stdout, 1); // switch to wide
          std::wcout << L"wide" << std::endl;
          fwide(stdout, -1); // switch to narrow
          std::cout << "narrow" << std::endl;
          fwide(stdout, 1); // switch to wide
          std::wcout << L"wide" << std::endl;
      }
      
    一片旧的回忆 2024-10-21 08:31:23

    Besides just converting the types, you should also be aware of the string's actual encoding.

    When compiling for the Multi-byte Character Set, Visual Studio and the Win API assume UTF8 (actually the Windows encoding, which is Windows-28591).
    When compiling for the Unicode Character Set, Visual Studio and the Win API assume UTF16.

    So you must also convert the string from UTF16 to UTF8, and not just convert it to std::string.
    This becomes necessary when working with multi-byte characters, as in some non-Latin languages.

    The idea is to decide that std::wstring always represents UTF16,
    and std::string always represents UTF8.

    This isn't enforced by the compiler; it's more of a good policy to have.
    Note the string prefixes I use to define UTF16 (L) and UTF8 (u8).

    To convert between the two types, you should use std::codecvt_utf8_utf16<wchar_t>:

    #include <cassert>  // assert
    #include <codecvt>  // std::codecvt_utf8_utf16
    #include <locale>   // std::wstring_convert
    #include <string>
    
    int main()
    {
    
        std::string original8 = u8"הלו";
    
        std::wstring original16 = L"הלו";
    
        //C++11 format converter
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    
        //convert to UTF8 and std::string
        std::string utf8NativeString = convert.to_bytes(original16);
    
        std::wstring utf16NativeString = convert.from_bytes(original8);
    
        assert(utf8NativeString == original8);
        assert(utf16NativeString == original16);
    
        return 0;
    }
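
    std::wstring_convert and the <codecvt> header are deprecated as of C++17, so it can be useful to see what the conversion does underneath. The following is a dependency-free sketch of wide-to-UTF-8 encoding, not a replacement for a tested library. The names append_utf8 and wide_to_utf8 are invented here, and the code assumes wchar_t holds either UTF-16 code units (as on Windows) or whole code points (as on Linux).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of one Unicode code point to out.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encodes a wide string as UTF-8, joining UTF-16 surrogate pairs if present.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::string out;
    out.reserve(wide.size());
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);

        // A high surrogate followed by a low surrogate forms one code point.
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < wide.size()) {
            const std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        // Lone surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;

        append_utf8(out, cp);
    }
    return out;
}
```

    The result is a std::string of raw UTF-8 bytes; for example, the single character U+00E9 becomes the two bytes 0xC3 0xA9.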
    
    葬花如无物 2024-10-21 08:31:23

    At the time of writing this answer, the number one google search for "convert string wstring" would land you on this page. My answer shows how to convert string to wstring, although this is NOT the actual question, and I should probably delete this answer but that is considered bad form. You may want to jump to this StackOverflow answer, which is now higher ranked than this page.


    Here's a way to combine string, wstring and mixed string constants into a wstring: use the wstringstream class.

    #include <sstream>
    #include <string>
    
    std::string narrow = "narrow";
    std::wstring wide = L"wide";  // wide literals need the L prefix
    
    std::wstringstream cls;
    cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
    std::wstring total= cls.str();
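
    The same idea wrapped in a function, as a small sketch (the name combine is made up for this example). A wide stream widens narrow characters one at a time, so this is only reliable when the narrow text is plain ASCII or another single-byte encoding; it does not decode UTF-8.

```cpp
#include <sstream>
#include <string>

// Builds one wide string from narrow text, wide text and mixed literals.
std::wstring combine(const std::string& narrow, const std::wstring& wide)
{
    std::wostringstream stream;
    // Narrow C strings are widened character by character by the stream.
    stream << " abc " << narrow.c_str() << L" def " << wide;
    return stream.str();
}
```

    For the inputs used above, combine("narrow", L"wide") produces the wide string " abc narrow def wide".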
    
    江南月 2024-10-21 08:31:23

    You might as well just use the ctype facet's narrow method directly:

    #include <clocale>
    #include <locale>
    #include <string>
    #include <vector>
    
    inline std::string narrow(std::wstring const& text)
    {
        std::locale const loc("");
        wchar_t const* from = text.c_str();
        std::size_t const len = text.size();
        std::vector<char> buffer(len + 1);
        std::use_facet<std::ctype<wchar_t> >(loc).narrow(from, from + len, '_', &buffer[0]);
        return std::string(&buffer[0], &buffer[len]);
    }
    
    温折酒 2024-10-21 08:31:23

    This solution is inspired by dk123's solution, but uses a locale-dependent codecvt facet. The result is a string in the locale's encoding rather than UTF-8 (unless the locale itself uses UTF-8):

    std::string w2s(const std::wstring &var)
    {
       static std::locale loc("");
       auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
       return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).to_bytes(var);
    }
    
    std::wstring s2w(const std::string &var)
    {
       static std::locale loc("");
       auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
       return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).from_bytes(var);
    }
    

    I searched for this but could not find it anywhere. Eventually I found that I can get the right facet from std::locale by calling std::use_facet() with the right type name. Hope this helps.

    凉世弥音 2024-10-21 08:31:23

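    I spent many sad days trying to come up with a way to do this in C++17, which deprecated the codecvt facets (the <codecvt> header and std::wstring_convert), and this is the best I was able to come up with by combining code from a few different sources:

    #include <clocale>  // setlocale
    #include <cstdlib>  // std::wctomb, std::mbstowcs, MB_CUR_MAX
    #include <string>
    
    // Call once at startup, e.g. at the top of main():
    //     setlocale( LC_ALL, "en_US.UTF-8" );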
    
    std::string wideToMultiByte( std::wstring const & wideString )
    {
         std::string ret;
         std::string buff( MB_CUR_MAX, '\0' );
    
         for ( wchar_t const & wc : wideString )
         {
             int mbCharLen = std::wctomb( &buff[ 0 ], wc );
    
             if ( mbCharLen < 1 ) { break; }
    
             for ( int i = 0; i < mbCharLen; ++i ) 
             { 
                 ret += buff[ i ]; 
             }
         }
    
         return ret;
     }
    
     std::wstring multiByteToWide( std::string const & multiByteString )
     {
         std::wstring ws( multiByteString.size(), L' ' );
         ws.resize( 
             std::mbstowcs( &ws[ 0 ], 
                 multiByteString.c_str(), 
                 multiByteString.size() ) );
    
         return ws;
     }
    

    I tested this code on Windows 10, and at least for my purposes, it seems to work fine. Please don't lynch me if this doesn't consider some crazy edge cases that you might need to handle, I'm sure someone with more experience can improve on this! :-)

    Also, credit where it's due:

    Adapted for wideToMultiByte()

    Copied for multiByteToWide
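
    One edge case worth handling: std::mbstowcs returns (size_t)-1 when it meets an invalid multi-byte sequence, and passing that value to resize() throws std::length_error. Below is a sketch of a checked variant that uses std::mbsrtowcs to measure first and convert second. The function name multibyte_to_wide_checked is invented for this example, and the conversion follows whatever locale is currently set with setlocale.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Converts a multi-byte string (current C locale) to a wide string.
std::wstring multibyte_to_wide_checked(const std::string& bytes)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = bytes.c_str();

    // First pass: a null destination only counts the wide characters needed.
    const std::size_t needed = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multi-byte sequence for the current locale");

    std::wstring wide(needed, L'\0');
    if (needed == 0)
        return wide;

    // Second pass: convert into a buffer of exactly the right size.
    src = bytes.c_str();
    state = std::mbstate_t();
    std::mbsrtowcs(&wide[0], &src, wide.size(), &state);
    return wide;
}
```

    Like mbstowcs, this stops at the first embedded null character, so strings containing '\0' need to be converted segment by segment.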

    伴梦长久 2024-10-21 08:31:23

    In my case, I have to use a multi-byte character set (MBCS), and I want to use std::string and std::wstring. I can't use C++11, so I use mbstowcs and wcstombs.

    I wrote the same functions using new and delete[], but they were slower than this version.

    This may help: How to: Convert Between Various String Types

    EDIT

    However, when converting to wstring, this does not work if the source string is a multi-byte string containing non-alphabetic characters.
    So I changed wcstombs to WideCharToMultiByte.

    #include <string>
    
    std::wstring get_wstr_from_sz(const char* psz)
    {
        //I think it's enough to my case
        wchar_t buf[0x400];
        wchar_t *pbuf = buf;
        size_t len = strlen(psz) + 1;
    
        if (len >= sizeof(buf) / sizeof(wchar_t))
        {
            pbuf = L"error";
        }
        else
        {
            size_t converted;
            mbstowcs_s(&converted, buf, psz, _TRUNCATE);
        }
    
        return std::wstring(pbuf);
    }
    
    std::string get_string_from_wsz(const wchar_t* pwsz)
    {
        char buf[0x400];
        char *pbuf = buf;
        size_t len = wcslen(pwsz)*2 + 1;
    
        if (len >= sizeof(buf))
        {
            pbuf = "error";
        }
        else
        {
            size_t converted;
            wcstombs_s(&converted, buf, pwsz, _TRUNCATE);
        }
    
        return std::string(pbuf);
    }
    

    EDIT to use 'MultiByteToWideChar' instead of 'wcstombs'

    #include <Windows.h>
    #include <boost/shared_ptr.hpp>
    #include "string_util.h"
    
    std::wstring get_wstring_from_sz(const char* psz)
    {
        int res;
        wchar_t buf[0x400];
        wchar_t *pbuf = buf;
        boost::shared_ptr<wchar_t[]> shared_pbuf;
    
        res = MultiByteToWideChar(CP_ACP, 0, psz, -1, buf, sizeof(buf)/sizeof(wchar_t));
    
        if (0 == res && GetLastError() == ERROR_INSUFFICIENT_BUFFER)
        {
            res = MultiByteToWideChar(CP_ACP, 0, psz, -1, NULL, 0);
    
            shared_pbuf = boost::shared_ptr<wchar_t[]>(new wchar_t[res]);
    
            pbuf = shared_pbuf.get();
    
            res = MultiByteToWideChar(CP_ACP, 0, psz, -1, pbuf, res);
        }
        else if (0 == res)
        {
            pbuf = L"error";
        }
    
        return std::wstring(pbuf);
    }
    
    std::string get_string_from_wcs(const wchar_t* pcs)
    {
        int res;
        char buf[0x400];
        char* pbuf = buf;
        boost::shared_ptr<char[]> shared_pbuf;
    
        res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);
    
        if (0 == res && GetLastError() == ERROR_INSUFFICIENT_BUFFER)
        {
            res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, NULL, 0, NULL, NULL);
    
            shared_pbuf = boost::shared_ptr<char[]>(new char[res]);
    
            pbuf = shared_pbuf.get();
    
            res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, pbuf, res, NULL, NULL);
        }
        else if (0 == res)
        {
            pbuf = "error";
        }
    
        return std::string(pbuf);
    }
    
    满栀 2024-10-21 08:31:23
    #include <boost/locale.hpp>
    namespace lcv = boost::locale::conv;
    
    inline std::wstring fromUTF8(const std::string& s)
    { return lcv::utf_to_utf<wchar_t>(s); }
    
    inline std::string toUTF8(const std::wstring& ws)
    { return lcv::utf_to_utf<char>(ws); }
    
    海之角 2024-10-21 08:31:23

    In case anyone else is interested: I needed a class that could be used interchangeably wherever either a string or a wstring was expected. The following class convertible_string, based on dk123's solution, can be initialized with a string, char const*, wstring or wchar_t const*, and can be assigned from or implicitly converted to either a string or a wstring (so it can be passed into functions that take either).

    class convertible_string
    {
    public:
        // default ctor
        convertible_string()
        {}
    
        /* conversion ctors */
        convertible_string(std::string const& value) : value_(value)
        {}
        convertible_string(char const* val_array) : value_(val_array)
        {}
        convertible_string(std::wstring const& wvalue) : value_(ws2s(wvalue))
        {}
        convertible_string(wchar_t const* wval_array) : value_(ws2s(std::wstring(wval_array)))
        {}
    
        /* assignment operators */
        convertible_string& operator=(std::string const& value)
        {
            value_ = value;
            return *this;
        }
        convertible_string& operator=(std::wstring const& wvalue)
        {
            value_ = ws2s(wvalue);
            return *this;
        }
    
        /* implicit conversion operators */
        operator std::string() const { return value_; }
        operator std::wstring() const { return s2ws(value_); }
    private:
        std::string value_;
    };
    
    相权↑美人 2024-10-21 08:31:23
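As a variation on the grow-and-retry loop above, here is a minimal sketch that asks for the required size first. It swaps in std::wcsrtombs, which the standard allows to be called with a null destination to count bytes; the function name narrow_with_wcsrtombs is made up for this illustration and is not part of the answer. Like wcstombs, it converts according to the current C locale and stops at the first embedded L'\0'.

```cpp
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper (name invented for this sketch): two-pass conversion
// using the current C locale, as set by std::setlocale.
std::string narrow_with_wcsrtombs(const std::wstring& ws) {
    std::mbstate_t state{};
    const wchar_t* src = ws.c_str();

    // Pass 1: a null destination makes wcsrtombs return the byte count
    // (excluding the terminator) without writing anything.
    std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("wide character not representable in the current locale");
    }

    // Pass 2: convert into a buffer of exactly that size.
    std::string out(len, '\0');
    src = ws.c_str();
    state = std::mbstate_t{};
    std::wcsrtombs(&out[0], &src, out.size(), &state);
    return out;
}
```

With the default "C" locale only plain ASCII is guaranteed to convert; call std::setlocale(LC_CTYPE, "") first if the text may contain other characters.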
    
    ぺ禁宫浮华殁 2024-10-21 08:31:23


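    I am using the code below to convert a wstring to a string.

    // Query the required size first: an ANSI code page can need more than one byte per wchar_t.
    std::string strTo;
    int cbNeeded = WideCharToMultiByte(CP_ACP, 0, someParam.c_str(), (int)someParam.length(), NULL, 0, NULL, NULL);
    if (cbNeeded > 0) {
        strTo.resize(cbNeeded);
        WideCharToMultiByte(CP_ACP, 0, someParam.c_str(), (int)someParam.length(), &strTo[0], cbNeeded, NULL, NULL);
    }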
    
    傲世九天 2024-10-21 08:31:23

    Source: https://msdn.microsoft.com/en-us/library/87zae4a3.aspx

    The conversion of char strings to wchar_t strings and vice versa is a typical problem on Windows. I can't think of a use case for this on Linux. The type wchar_t is 2 bytes long on Windows and 4 bytes long on Linux. The types char16_t and char32_t have existed since C++11, and C++20 added char8_t, each with the corresponding number of bits. In new projects you should therefore use char8_t for UTF-8, char16_t for UTF-16 and char32_t for UTF-32, see https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170.
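To make the difference between the three encodings concrete, here is a small sketch that counts how many code units one smiley (U+1F60A) needs in each of them. The function names are invented for this illustration. The UTF-8 count uses sizeof on a u8 literal so that the snippet compiles both before and after C++20 changed the literal's element type to char8_t.

```cpp
#include <cstddef>
#include <string>

// Code units needed for U+1F60A in each Unicode encoding form.
// Universal character names are used so the source file's own encoding does not matter.
inline std::size_t utf8_units() {
    return sizeof(u8"\U0001F60A") - 1;             // bytes, minus the terminating zero
}
inline std::size_t utf16_units() {
    return std::u16string(u"\U0001F60A").size();   // one surrogate pair
}
inline std::size_t utf32_units() {
    return std::u32string(U"\U0001F60A").size();   // one code point
}
```

The same character therefore occupies 4 bytes as UTF-8, two 16-bit units as UTF-16 and one 32-bit unit as UTF-32, which is why a wchar_t string on Windows and one on Linux can have different lengths for the same text.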

    On Windows, almost all classic API functions use the type wchar_t, which corresponds on Windows to the modern type char16_t. The conversion from char to wchar_t and vice versa is therefore always necessary if the value of a char string is to be inserted into an API function or, conversely, if the wchar_t string created by an API function is to be converted into a char string.

    As the conversion between the two string types is a typical Windows problem, Windows functions should also be used for this. The Windows SDK offers the WideCharToMultiByte() function for converting a wchar_t into a char string using a specific code page. Windows also provides the MultiByteToWideChar() function for reverse conversion. If you specify CP_UTF8 as the code page, these functions convert between the Unicode formats UTF-16 and UTF-8. Both functions are very unwieldy.

    ATL therefore provides two template classes that wrap these functions to simplify the conversions. You only need the headers <atlconv.h> and <atlstr.h>; no library needs to be loaded. CW2A is a typedef for the class template CW2AEX, which wraps the WideCharToMultiByte() function. Similarly, CA2W is a typedef for the class template CA2WEX, which wraps the function MultiByteToWideChar(). Instances of these classes have the member m_psz, which is of type char* (for CW2A) or wchar_t* (for CA2W).

    In the following example, I start with a UTF-8 character string of type const char*, which contains Chinese characters and a smiley. The char string is converted to a wchar_t string with CA2W so that I can use the Windows function MessageBoxW(). The wchar_t string is then converted back to a char string using CW2A. Make sure that you specify CP_UTF8 as the second parameter of the constructor for both classes, otherwise ATL will use the current ANSI code page. The last statement confirms that the new and the original string have the same content.

    #include <iostream>
    #include <string> // the C++ Standard String Class
    #include <atlconv.h>
    #include <atlstr.h>
    
    int main()
    {
      const char* utf8Str = (const char*)u8"要开心 ????"; // 'Be happy ????'
      CA2W atow(utf8Str, CP_UTF8);
      MessageBoxW(nullptr, atow.m_psz, L"Title", MB_OK);
    
      std::wstring utf16Str = atow.m_psz;
      CW2A wtoa(utf16Str.c_str(), CP_UTF8);
      std::string utf8Str2 = wtoa.m_psz;
    
      std::wcout << "utf8Str == utf8Str2: " << (utf8Str == utf8Str2) << std::endl;
    }
    
    电影里的梦 2024-10-21 08:31:23

    Although it looks suspicious, using std::string s( wideString.begin(), wideString.end() ) does compile and run, but it truncates every wide character to a char with a straight cast.

    That method generates two compiler warnings in MSVC++, one of which comes with very verbose output:

    warning C4244: '=': conversion from 'wchar_t' to 'char', possible loss of data

    • 1>(compiling source file 'Filename.cpp')
      1>C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xutility(4537,18):
      1>the template instantiation context (the oldest one first) is.....

    warning C4365: '=': conversion from 'wchar_t' to 'char', signed/unsigned mismatch

    To suppress that, you don't have to rewrite it using std::transform or anything; you can just use #pragma warning( suppress : 4244 4365 ), which suppresses those warnings only for the line that immediately follows it.

    #include <cstdio>   // printf, puts
    #include <string>
    using namespace std;
    
    int main() {
    
      wstring wide = L"A Хороший string";
      #pragma warning( suppress : 4244 4365 )
      string conv( wide.begin(), wide.end() );
    
      printf( "wide.size() = %zu, conv.size() = %zu\n", wide.size(), conv.size() );
      if( wide.size() == conv.size() ) puts( "Same size" ); // This is what happens
      else puts( "Different sizes!" );
    
      // They are the same size. So let's compare.
      for( size_t i = 0 ; i < wide.size() ; i++ ) {
        printf( "WIDE: [ %04x/%c ] CHAR: [ %04x/%c ]\n", wide[i], wide[i], conv[i], conv[i] );
      }
    
    }
    

    Output:

    [Screenshot: character truncation in the wide string conversion]

    I should note that this is not the correct way to convert wide strings. It loses information, and that is what those compiler warnings are about. If you want to retain the information in the wide characters while still representing any ASCII/English character with a single byte, use the UTF-8 encoding. On Windows, a really simple pair of functions performs that transformation; the sample below calls one of them, utf8_encode().

    Sample code using that function:

    string utf8 = utf8_encode( wide );  // getUtf8?
    printf( "The UTF8 string size = %zu\n", utf8.size() );
    for( int i = 0; i < utf8.size(); i++ ) {
      printf( "UTF8: [ %04x/%c ]\n", utf8[i], utf8[i] );
    }
    

    Outputs like:

    [Screenshot: UTF-8 string conversion]

    On Linux/using <codecvt>, these functions would be:

    #include <codecvt>
    #include <locale>
    
    // suppress 'codecvt_utf8<wchar_t>' is deprecated warnings
    #pragma clang diagnostic push
    #pragma clang diagnostic ignored "-Wdeprecated-declarations"
    std::string getUtf8( const std::wstring &wstr ) {
      std::wstring_convert< std::codecvt_utf8<wchar_t>, wchar_t > convert;
      return convert.to_bytes( wstr );
    }
    
    std::wstring fromUtf8( const std::string &str ) {
      std::wstring_convert< std::codecvt_utf8<wchar_t>, wchar_t > convert;
      return convert.from_bytes( str );
    }
    // turn warnings back on
    #pragma clang diagnostic pop
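Since std::wstring_convert and <codecvt> have been deprecated since C++17, here is a dependency-free sketch of the wstring to UTF-8 direction, written by hand. It is not from the answer: the name to_utf8_manual is invented, and it assumes wchar_t holds UTF-16 code units (as on Windows) or UTF-32 code points (as on Linux). It does no validation, so unpaired surrogates and values above U+10FFFF are encoded as they are.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 without <codecvt>.
std::string to_utf8_manual(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);

        // With 16-bit wchar_t, combine a surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

ASCII characters stay one byte each, a Cyrillic letter such as Х (U+0425) becomes two bytes, and nothing is lost, unlike with the iterator-range constructor shown earlier.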
    
    浅笑轻吟梦一曲 2024-10-21 08:31:23
    // Embarcadero C++ Builder 
    
    // conversion string to wstring
    string str1 = "hello";
    String str2 = str1;         // typedef UnicodeString String;   -> str2 contains now u"hello";
    
    // conversion wstring to string
    String str2 = u"hello";
    string str1 = UTF8String(str2).c_str();   // -> str1 contains now "hello"
    