How do I convert UTF-8 to Latin/Arabic and vice versa?
Is there a cross-platform way to convert from UTF-8 to Latin/Arabic and from Latin/Arabic to UTF-8 in C++?
Comments (2)
There are libraries like ICU available. But Erik is, of course, right: the round trip from Unicode through ISO 8859-6 will be lossy. (Yes, UTF-8 is "Unicode." UTF-16 is "Unicode," too; it just uses different bit patterns for the same code numbers. See Joel Spolsky's text if you didn't know that. And if you haven't read it yet, it's good material.)
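To make the lossiness concrete, here is a minimal sketch of a hand-rolled ISO 8859-6 mapping. It is not how ICU does it; the function names and the `'?'` replacement character are made up for illustration. It relies on the fact that the Arabic part of ISO 8859-6 sits at the same offsets as the start of the Unicode Arabic block (byte `0xA0 + n` corresponds to `U+0600 + n` for the assigned positions). Anything outside that small repertoire cannot survive the round trip.

```cpp
#include <string>

// Hypothetical helper: map one code point to an ISO 8859-6 byte.
// Returns false if the character does not exist in ISO 8859-6.
bool to_iso8859_6(char32_t cp, unsigned char& out) {
    // C0/ASCII/C1 range plus NBSP, currency sign and soft hyphen map 1:1.
    if (cp <= 0xA0 || cp == 0xA4 || cp == 0xAD) {
        out = static_cast<unsigned char>(cp);
        return true;
    }
    // Arabic comma, semicolon, question mark, letters, tatweel and harakat.
    if (cp == 0x060C || cp == 0x061B || cp == 0x061F ||
        (cp >= 0x0621 && cp <= 0x063A) || (cp >= 0x0640 && cp <= 0x0652)) {
        out = static_cast<unsigned char>(cp - 0x0600 + 0xA0);
        return true;
    }
    return false;
}

// Hypothetical helper: map one ISO 8859-6 byte back to a code point.
// Unassigned bytes become U+FFFD (REPLACEMENT CHARACTER).
char32_t from_iso8859_6(unsigned char b) {
    if (b <= 0xA0 || b == 0xA4 || b == 0xAD) return b;
    if (b == 0xAC || b == 0xBB || b == 0xBF ||
        (b >= 0xC1 && b <= 0xDA) || (b >= 0xE0 && b <= 0xF2)) {
        return static_cast<char32_t>(0x0600 + (b - 0xA0));
    }
    return 0xFFFD;
}

// Encode code points as ISO 8859-6; unmappable characters are replaced,
// which is exactly where information is lost.
std::string encode_iso8859_6(const std::u32string& text, char replacement = '?') {
    std::string out;
    for (char32_t cp : text) {
        unsigned char b;
        out += to_iso8859_6(cp, b) ? static_cast<char>(b) : replacement;
    }
    return out;
}

// Decode ISO 8859-6 bytes back to code points.
std::u32string decode_iso8859_6(const std::string& bytes) {
    std::u32string out;
    for (char c : bytes) out += from_iso8859_6(static_cast<unsigned char>(c));
    return out;
}
```

With this sketch, Arabic text such as U+0633 U+0644 U+0627 U+0645 comes back unchanged, while a character like U+00E9 (é) comes back as `?`. In practice a library such as ICU is the safer choice, since it also covers other code pages and error-handling policies.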
There is not, but there is a cross-platform way to convert between Unicode represented in `wchar_t` (which is 16-bit on Windows and 32-bit on most other platforms) and whatever character encoding the system locale is set to, using the `wcstombs`/`mbstowcs` routines from the standard C library or the `codecvt` facet of `locale` in the standard C++ library. Converting between UTF-8 and `wchar_t`, where each element is one code point, is then quite simple (that holds for 32-bit `wchar_t`; with 16-bit `wchar_t` on Windows, characters outside the Basic Multilingual Plane take a surrogate pair). So you can write, or copy from somewhere, a routine that converts between UTF-8 and Unicode in `wchar_t` and combine it with `wcstombs`/`mbstowcs`.
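A rough sketch of that approach might look like the following. All function names are hypothetical. The UTF-8 part goes through `char32_t` code points first so that it behaves the same regardless of the width of `wchar_t`. The last helper assumes that `wcstombs` accepts a null destination to report the required length, which is a common POSIX extension rather than something the C standard guarantees.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Decode UTF-8 into code points; throws on malformed input.
std::u32string utf8_to_code_points(const std::string& in) {
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, surrogates and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encode code points as UTF-8; throws on values that are not valid scalars.
std::string code_points_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Store code points in wchar_t, using surrogate pairs when wchar_t is 16-bit.
std::wstring code_points_to_wide(const std::u32string& in) {
    std::wstring out;
    for (char32_t cp : in) {
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out += static_cast<wchar_t>(cp);
        } else {
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// Convert wide text to the current C locale's multibyte encoding.
// Assumption: wcstombs(nullptr, ...) returns the required byte count.
std::string wide_to_locale(const std::wstring& in) {
    std::size_t n = std::wcstombs(nullptr, in.c_str(), 0);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");
    std::string out(n, '\0');
    std::wcstombs(&out[0], in.c_str(), n);
    return out;
}
```

To get Latin or Arabic output this way, the program would first have to call `std::setlocale` with a locale whose encoding is the desired one (for example an ISO 8859-6 locale). Locale names and their availability differ between systems, which is the main portability catch of this route. The reverse direction would use `mbstowcs` in the same manner.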