Arguments for and against supporting std::wstring exclusively in a cross-platform library
I'm currently developing a cross-platform C++ library which I intend to be Unicode aware. I currently have compile-time support for either std::string or std::wstring via typedefs and macros. The disadvantage of this approach is that it forces you to use macros like L("string") and to make heavy use of templates based on character type.
What are the arguments for and against supporting std::wstring only?
Would using std::wstring exclusively hinder the GNU/Linux user base, where UTF-8 encoding is preferred?
Comments (5)
A lot of people would want to use Unicode with UTF-8 (std::string) and not UCS-2/UTF-16 (std::wstring). UTF-8 is the standard encoding on a lot of Linux distributions and databases, so not supporting it would be a huge disadvantage. On Linux, every call to a function in your library that takes a string argument would require the user to convert a (native) UTF-8 string to std::wstring.
With gcc on Linux, each character of a std::wstring takes 4 bytes, while on Windows it takes 2 bytes. This can lead to strange effects when reading or writing files (and when copying them between platforms). I would rather recommend UTF-8/std::string for a cross-platform project.
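To make the size difference concrete, here is a minimal sketch that computes how many bytes the raw character data of a string would occupy if written to a file as-is. The helper names wide_bytes, narrow_bytes and check_sizes are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes occupied by the raw wchar_t data of a wide string.
// The result depends on sizeof(wchar_t), which differs between platforms.
inline std::size_t wide_bytes(const std::wstring& s) {
    return s.size() * sizeof(wchar_t);
}

// Hypothetical helper: bytes occupied by the raw char data of a narrow string.
inline std::size_t narrow_bytes(const std::string& s) {
    return s.size();
}

// Small self-check showing the asymmetry described above.
inline void check_sizes() {
    // ASCII text in a std::string is one byte per character everywhere.
    assert(narrow_bytes("abc") == 3);
    // The wide form scales with sizeof(wchar_t): typically 4 bytes per
    // character with gcc on Linux and 2 bytes per character on Windows.
    assert(wide_bytes(L"abc") == 3 * sizeof(wchar_t));
}
```

A file produced by dumping the wchar_t buffer on one platform therefore cannot be read back as a wchar_t buffer on the other without an explicit conversion step.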
The argument in favor of using wide characters is that they can do everything narrow characters can, and more.
The arguments against it that I know of are:
As for being flexible: I have maintained a library (several kLoC) that could deal with both narrow and wide characters. Most of it was done by making the character type a template parameter; I don't remember any macros (other than UNICODE, that is). Not all of it was flexible, though: some code in there ultimately required either a char or a wchar_t string. (There is no point in making internal key strings wide.) Users could decide whether they wanted only narrow character support (in which case "string" was fine), only wide character support (which required them to use L"string"), or both (which required something like T("string")).
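The approach described in this answer could look roughly like the sketch below. It is only an illustration: count_occurrences, MYLIB_WIDE, MYLIB_T, mylib_char and mylib_string are hypothetical names invented here, not the identifiers used by the library the answer refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Library code: the character type is a template parameter, so the same
// function works with std::string and std::wstring.
template <typename CharT>
std::size_t count_occurrences(const std::basic_string<CharT>& text, CharT needle) {
    std::size_t n = 0;
    for (CharT c : text) {
        if (c == needle) ++n;
    }
    return n;
}

// User code that wants to build either way needs a literal macro.
// MYLIB_WIDE is an assumed build switch, similar in spirit to UNICODE.
#ifdef MYLIB_WIDE
using mylib_char = wchar_t;
#define MYLIB_T(s) L##s   // turns "x" into L"x"
#else
using mylib_char = char;
#define MYLIB_T(s) s      // leaves "x" unchanged
#endif
using mylib_string = std::basic_string<mylib_char>;

// Usage: narrow-only, wide-only, and build-dependent callers.
inline void demo_char_templates() {
    assert(count_occurrences(std::string("banana"), 'a') == 3);
    assert(count_occurrences(std::wstring(L"banana"), L'a') == 3);
    assert(count_occurrences(mylib_string(MYLIB_T("a,b,c")), mylib_char(',')) == 2);
}
```

Callers that commit to one character type never see the macro; only code that must compile in both configurations pays for it.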
For:
Against:
I would say that using std::string or std::wstring is irrelevant. Neither offers proper Unicode support anyway.
If you need internationalization, then you need proper Unicode support and should start investigating libraries such as ICU.
After that, it's a matter of which encoding to use, and this depends on the platform you're on: wrap the OS-dependent facilities behind an abstraction layer and convert in the implementation layer when applicable.
Don't worry about the encoding used internally by the Unicode library you use (or build? hmm); it's a matter of performance and should not affect the use of the library itself.
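One possible shape for such an abstraction layer is sketched below, assuming the library keeps UTF-8 in std::string at its public interface and converts only where it calls into the operating system. The namespace mylib and the names native_string and to_native are hypothetical; MultiByteToWideChar is the real Win32 conversion function, used here only in the Windows branch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

namespace mylib {  // hypothetical library namespace

#ifdef _WIN32
// On Windows the wide-character OS functions expect UTF-16 in wchar_t.
using native_string = std::wstring;

inline native_string to_native(const std::string& utf8) {
    if (utf8.empty()) return native_string();
    const int in_len = static_cast<int>(utf8.size());
    // First call: ask how many wchar_t units the result needs.
    const int out_len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, nullptr, 0);
    native_string out(static_cast<std::size_t>(out_len), L'\0');
    // Second call: perform the conversion into the sized buffer.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &out[0], out_len);
    return out;
}
#else
// Assumption: on this platform the OS interfaces take UTF-8 directly,
// so the boundary conversion is a pass-through.
using native_string = std::string;

inline native_string to_native(const std::string& utf8) {
    return utf8;
}
#endif

}  // namespace mylib

// Usage: callers always pass UTF-8; only the implementation sees native_string.
inline void demo_to_native() {
    assert(mylib::to_native("abc").size() == 3);
    assert(mylib::to_native("").empty());
}
```

With this layout the choice between std::string and std::wstring stays an implementation detail of each platform backend instead of leaking into the public API.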
Disadvantage:
wstring is really UCS-2, not UTF-16. It will kick you in the shins one day, and it will kick hard.
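A small sketch of where this bites: string classes count code units, not characters, so a character outside the Basic Multilingual Plane occupies two 16-bit units in UTF-16. The example uses std::u16string so that it behaves the same on every platform; std::wstring on Windows, where wchar_t is 16 bits, shows the same effect. The function count_code_points is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in UTF-16 text by treating a high
// surrogate followed by a low surrogate as a single character.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool next_low =
            i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && next_low) ++i;  // skip the second half of the pair
        ++n;
    }
    return n;
}

inline void demo_surrogates() {
    // 'a' followed by U+1F600, which lies outside the Basic Multilingual Plane.
    const std::u16string s = u"a\U0001F600";
    assert(s.size() == 3);               // three 16-bit code units
    assert(count_code_points(s) == 2);   // but only two characters
}
```

Code that indexes, truncates, or reverses such a string unit by unit can split a surrogate pair, which is the failure mode the answer warns about.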