How do I convert a Unicode string into a UTF-8 or UTF-16 string?

Posted on 2024-07-08 01:01:36

How do I convert a Unicode string into a UTF-8 or UTF-16 string?
My VS2005 project uses the Unicode character set, while the SQLite C/C++ API provides

int sqlite3_open(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);
int sqlite3_open16(
  const void *filename,   /* Database filename (UTF-16) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);

for opening a database file.
How can I convert a string, CString, or wstring into UTF-8 or UTF-16?

Thanks very much!

Comments (5)

聊慰 2024-07-15 01:01:36

Short answer:

No conversion is required if you use Unicode strings such as CString or wstring. Use sqlite3_open16().
You will have to make sure you pass a WCHAR pointer, cast to const void *, to the API. For a CString, for example: (const void*)(LPCWSTR)strFilename. (The void * seems lame. Even though this library is cross-platform, I guess they could have defined a platform-dependent wide character type that is friendlier than a void *.)

The longer answer:

You don't have a Unicode string that you want to convert to UTF-8 or UTF-16. You have a Unicode string that is already represented in your program using a given encoding: Unicode is not a binary representation per se. An encoding says how Unicode code points (numerical values) are represented in memory (the binary layout of the number). UTF-8 and UTF-16 are the most widely used encodings. They are very different, though.

When a VS project says "Unicode charset", it actually means "characters are encoded as UTF-16". Therefore, you can use sqlite3_open16() directly. No conversion is required. Characters are stored in the WCHAR type (as opposed to char), which takes 16 bits. (WCHAR falls back on the standard C type wchar_t, which takes 16 bits on Win32 and might be different on other platforms. Thanks for the correction, Checkers.)

There's one more detail you might want to pay attention to: UTF-16 exists in two flavors, big-endian and little-endian, which differ in the byte ordering of these 16 bits. The function prototype you give for UTF-16 doesn't say which ordering is used, but you're pretty safe assuming that SQLite uses the same endianness as Windows (little-endian, IIRC. I know the order but have always had a problem with the names :-) ).
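
To make the byte-order point concrete, here is a minimal sketch that shows how one 16-bit code unit is laid out in memory on the machine running it. The SQLite documentation describes the sqlite3_open16() filename as UTF-16 in the native byte order, so whatever this reports is the layout the call expects. The helper names code_unit_bytes and is_little_endian are made up for this illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the two bytes of one UTF-16 code unit exactly
// as they are stored in memory on the current machine.
std::array<std::uint8_t, 2> code_unit_bytes(char16_t unit) {
    static_assert(sizeof(char16_t) == 2, "char16_t is expected to be 16 bits wide");
    std::array<std::uint8_t, 2> bytes{};
    std::memcpy(bytes.data(), &unit, sizeof(unit));
    return bytes;
}

// Hypothetical helper: true when the low-order byte is stored first,
// which is what "little-endian" means.
bool is_little_endian() {
    return code_unit_bytes(0x0041)[0] == 0x41;
}
```

On an x86 or x64 Windows machine is_little_endian() would be expected to return true, which matches the "UTF-16LE" label usually given to Windows wide strings.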

EDIT: Answer to comment by Checkers:

UTF-16 uses 16-bit code units. Under Win32 (and only on Win32), wchar_t is used for such a storage unit. The trick is that some Unicode characters require a sequence of two such 16-bit code units. They are called surrogate pairs.

In the same way, UTF-8 represents one character using a sequence of 1 to 4 bytes. UTF-8, however, is used with the char type.
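
As a rough illustration of surrogate pairs and of the 1-to-4-byte rule, the sketch below encodes a single code point as UTF-16 and reports how many bytes the same code point would need in UTF-8. The function names to_utf16 and utf8_length are assumptions made for this example, and it uses the standard char16_t and char32_t types rather than WCHAR so that it does not depend on Windows.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// U+0000..U+FFFF take one code unit; U+10000..U+10FFFF take a surrogate pair.
std::u16string to_utf16(char32_t cp) {
    // Surrogate values themselves are not valid code points to encode.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20 bits remain after the offset
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}

// Hypothetical helper: number of bytes the code point occupies in UTF-8.
int utf8_length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}
```

For example, U+1F600 lies outside the 16-bit range, so to_utf16 yields the two code units 0xD83D and 0xDE00, and utf8_length reports 4 bytes for it.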

旧人九事 2024-07-15 01:01:36

Use the WideCharToMultiByte function. Specify CP_UTF8 for the CodePage parameter.

CHAR buf[256]; // or whatever size you need
WideCharToMultiByte(
  CP_UTF8,
  0,
  StringToConvert, // the wide (UTF-16) string you have
  -1,              // length of the input; -1 means it is null-terminated
  buf,             // output buffer
  _countof(buf),   // size of the output buffer in bytes; pass 0 to get the required size as the return value
  NULL,
  NULL
);

Also, the default encoding for Unicode apps on Windows is UTF-16LE, so you might not need to perform any conversion and can just use the second version, sqlite3_open16.
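
For readers curious about what such a conversion involves, or who need it outside Windows, here is a hand-written sketch of UTF-16 to UTF-8 conversion. It is not the Windows API and not how WideCharToMultiByte is implemented; the names append_utf8 and utf16_to_utf8 are invented for this example, and replacing unpaired surrogates with U+FFFD is a choice made here, not something the answer above prescribes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends one code point to `out` as 1 to 4 UTF-8 bytes.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: converts a UTF-16 string to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The c_str() of the returned std::string is the kind of UTF-8 byte sequence that sqlite3_open() documents for its filename parameter. On Windows the input would first have to be copied from wchar_t into char16_t, which this sketch leaves out.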

安人多梦 2024-07-15 01:01:36

All the C++ string types are charset-neutral. They just settle on a character width and make no further assumptions. A wstring uses 16-bit characters on Windows, corresponding roughly to UTF-16, but it still depends on what you store in the string. The wstring doesn't in any way enforce that the data you put in it is valid UTF-16. Windows uses UTF-16 when UNICODE is defined, though, so most likely your strings are already UTF-16 and you don't need to do anything.

A few others have suggested using the WideCharToMultiByte function, which is one way to convert UTF-16 to UTF-8. But since SQLite can handle UTF-16, that shouldn't be necessary.
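
Since the string class itself does not check its contents, a caller who wants that guarantee has to check it separately. The following is a minimal sketch of such a check, assuming that "valid" means every surrogate is correctly paired; the name is_well_formed_utf16 is hypothetical, and std::u16string stands in for a Windows wstring.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when every high surrogate (0xD800..0xDBFF)
// is immediately followed by a low surrogate (0xDC00..0xDFFF) and no low
// surrogate appears on its own.
bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            if (i + 1 == s.size()) return false;  // string ends mid-pair
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i;  // skip the low surrogate that was just validated
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return false;  // low surrogate without a preceding high one
        }
    }
    return true;
}
```

A string that was built from ordinary text will normally pass. The check mainly catches strings that were truncated in the middle of a pair or filled with arbitrary 16-bit data.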

熟人话多 2024-07-15 01:01:36

UTF-8 and UTF-16 are both "Unicode" character encodings. What you are probably talking about is UTF-32, which is a fixed-size character encoding. Maybe searching for

"Convert utf-32 into utf-8 or utf-16"

will turn up some results or other papers on this.

自我难过 2024-07-15 01:01:36

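The simplest way to do this is to use CStringA. The CString class is a typedef for either CStringA (the narrow char version) or CStringW (the wide char version). Both of these classes have constructors that convert between string types. Note that CStringA converts through the ANSI code page rather than to UTF-8, so the result matches what sqlite3_open() expects only for plain ASCII filenames. I typically use: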

sqlite3_open(CStringA(L"MyWideCharFileName"), ...);