Arguments for and against exclusively supporting std::wstring in a cross-platform library

Posted 2024-09-18 01:09:02

I'm currently developing a cross-platform C++ library that I intend to be Unicode-aware. I currently have compile-time support for either std::string or std::wstring via typedefs and macros. The disadvantage of this approach is that it forces you to use macros like L("string") and to make heavy use of templates based on character type.

What are the arguments for and against supporting only std::wstring?

Would using std::wstring exclusively hinder the GNU/Linux user base, where UTF-8 encoding is preferred?
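
For readers unfamiliar with the setup described in the question, here is a minimal sketch of what compile-time selection via typedefs and macros can look like. The names MYLIB_USE_WIDE, MYLIB_L, mylib_char and mylib_string are hypothetical and are not taken from the asker's library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical build switch: define MYLIB_USE_WIDE to build the wide variant.
#ifdef MYLIB_USE_WIDE
typedef wchar_t mylib_char;
#define MYLIB_L(s) L##s   // turns "text" into L"text"
#else
typedef char mylib_char;
#define MYLIB_L(s) s      // leaves "text" unchanged
#endif

// The string type the whole library is written against.
typedef std::basic_string<mylib_char> mylib_string;

// Every literal handed to the library has to go through the macro,
// which is the inconvenience the question complains about.
inline mylib_string greeting() {
    return MYLIB_L("hello");
}

// Number of code units in the greeting (not user-perceived characters).
inline std::size_t greeting_length() {
    return greeting().size();
}
```

The same source compiles in both configurations, but only because no bare "hello" or L"hello" literal ever appears in it.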

Comments (5)

耀眼的星火 2024-09-25 01:09:02

A lot of people would want to use Unicode with UTF-8 (std::string) and not a wide encoding such as UCS-2 (std::wstring). UTF-8 is the standard encoding on a lot of Linux distributions and in many databases, so not supporting it would be a huge disadvantage. On Linux, every call to a function in your library that takes a string argument would require the user to convert a (native) UTF-8 string to std::wstring.

With gcc on Linux, each wchar_t in a std::wstring is 4 bytes, while on Windows it is 2 bytes. This can lead to strange effects when reading or writing files (and when copying them between platforms). I would rather recommend UTF-8/std::string for a cross-platform project.
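
The file-layout problem can be made concrete with a small sketch. The helper name raw_byte_size is hypothetical; it only computes how many bytes a string would occupy if its code units were dumped to a file unchanged.

```cpp
#include <cstddef>
#include <string>

// Bytes a string occupies when its code units are written out as-is,
// with no re-encoding. Works for any character type.
template <typename CharT>
std::size_t raw_byte_size(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}
```

raw_byte_size(std::string("abc")) is 3 on every platform, whereas raw_byte_size(std::wstring(L"abc")) is typically 12 with gcc on Linux and 6 with MSVC on Windows, so a file written one way cannot be read back the other way without an explicit conversion step.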

清引 2024-09-25 01:09:02

What are the arguments for and against supporting only std::wstring?

The argument in favor of using wide characters is that they can do everything narrow characters can, and more.

The arguments against it that I know of are:

  • wide characters need more space (which is hardly relevant; the Chinese do not, in principle, have more headaches over memory than Americans have)
  • using wide characters gives headaches to some westerners who are used to all their characters fitting into 7 bits (and who are unwilling to learn to pay a bit of attention to not intermingling uses of the character type for actual characters with other uses)

As for being flexible: I have maintained a library (several kLoC) that could deal with both narrow and wide characters. Most of it worked by making the character type a template parameter; I don't remember any macros (other than UNICODE, that is). Not all of it was flexible, though: there was some code in there which ultimately required either a char or a wchar_t string. (There is no point in making internal key strings wide.)
Users could decide whether they wanted only narrow character support (in which case "string" was fine), only wide character support (which required them to use L"string"), or both (which required something like T("string")).
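
As a rough illustration of the "character type as a template parameter" approach, the sketch below writes one function against std::basic_string<CharT> and keeps an internal key narrow. The names count_fields and internal_key are hypothetical and do not come from the library mentioned in this answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts delimiter-separated fields in a line; CharT is deduced from the
// arguments, so the same code serves narrow and wide callers.
template <typename CharT>
std::size_t count_fields(const std::basic_string<CharT>& line, CharT delim) {
    if (line.empty()) return 0;
    return static_cast<std::size_t>(
               std::count(line.begin(), line.end(), delim)) + 1;
}

// Internal keys never reach the user, so they can stay narrow
// regardless of which character type the caller picked.
inline const char* internal_key() {
    return "field.count";
}
```

A narrow caller writes count_fields(std::string("a,b,c"), ',') and a wide caller writes count_fields(std::wstring(L"a,b,c"), L','); neither needs a macro unless the calling code itself must compile in both modes.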

林空鹿饮溪 2024-09-25 01:09:02

For:

Against:

  • You might have to interface with code that isn't i18n-aware. But like any good library writer, you'll just hide that mess behind an easy-to-use interface, right? Right?

人生百味 2024-09-25 01:09:02

I would say that the choice between std::string and std::wstring is irrelevant.

Neither offers proper Unicode support anyway.

If you need internationalization, then you need proper Unicode support and should start investigating libraries such as ICU.

After that, it's a matter of which encoding to use, and this depends on the platform you're on: wrap the OS-dependent facilities behind an abstraction layer and convert in the implementation layer when applicable.

Don't worry about the encoding used internally by the Unicode library you use (or build?); it's a matter of performance and should not affect how the library itself is used.
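
One way to picture "convert in the implementation layer" is a public API that accepts UTF-8 and a private helper that produces whatever wide form the platform wants. The sketch below is hand-rolled for illustration only; decode_utf8 and utf8_to_wide are hypothetical names, validation is deliberately minimal, and a real project would more likely delegate this to ICU or to the operating system's conversion routines.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes one UTF-8 sequence starting at s[i] and advances i past it.
// Minimal validation: overlong forms and encoded surrogates are not rejected.
inline char32_t decode_utf8(const std::string& s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;  // single-byte (ASCII) case
    int extra;
    char32_t cp;
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");
    while (extra-- > 0) {
        if (i >= s.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        unsigned char b = static_cast<unsigned char>(s[i++]);
        if ((b & 0xC0) != 0x80) throw std::invalid_argument("invalid continuation byte");
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp > 0x10FFFF) throw std::invalid_argument("code point out of range");
    return cp;
}

// Implementation-layer helper: UTF-8 in, platform wide string out.
// Emits UTF-32 where wchar_t is 32-bit and UTF-16 where it is 16-bit.
inline std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        char32_t cp = decode_utf8(utf8, i);
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;  // 16-bit wchar_t: split into a surrogate pair
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With a helper like this hidden in the implementation, callers on Linux can pass their native UTF-8 strings straight through, and only the code that talks to a wide-character OS API ever sees a std::wstring.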

萌吟 2024-09-25 01:09:02

Disadvantage:

wstring is effectively UCS-2, not UTF-16. It will kick you in the shins one day, and it will kick hard.
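
The shin-kicking usually comes from characters outside the Basic Multilingual Plane, which need two 16-bit code units. The sketch below uses std::u16string as a stand-in so that it behaves the same on every platform; on systems where wchar_t is 16 bits, std::wstring is assumed to behave the same way. The helper name utf16_code_points is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-16 string by not counting the low (second)
// half of each surrogate pair. Assumes the input is well-formed UTF-16.
inline std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) ++n;  // skip low surrogates
    }
    return n;
}
```

For the string u"a\U0001F600", size() reports 3 code units while utf16_code_points reports 2 code points. Code that indexes, truncates or reverses by code unit, as if the string were UCS-2, can split the pair and produce invalid text.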
