Length of a C++ std::string in bytes

Posted on 2024-12-09 05:51:58

I'm having some trouble figuring out the exact semantics of std::string::length(). The documentation explicitly points out that length() returns the number of characters in the string, not the number of bytes. In which cases does this actually make a difference?

In particular, is this only relevant to non-char instantiations of std::basic_string<>, or can I also get into trouble when storing UTF-8 strings with multi-byte characters? Does the standard allow for length() to be UTF-8-aware?

Comments (4)

傲娇萝莉攻 2024-12-16 05:51:58

When dealing with non-char instantiations of std::basic_string<>, sure, length may not equal number of bytes. This is particularly evident with std::wstring:

std::wstring ws = L"hi";
std::cout << ws.length();     // <-- 2, not 4 (the byte count where wchar_t is 2 bytes wide)

But std::string is about char characters; there is no such thing as a multi-byte character as far as std::string is concerned, whether you crammed one in at a higher level or not. So std::string::length() is always the number of bytes represented by the string. Note that if you're cramming multi-byte "characters" into a std::string, your definition of "character" is suddenly at odds with that of the container and of the standard.
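
To make the element-versus-byte distinction concrete, here is a minimal sketch. The name byte_size is a hypothetical helper introduced for illustration, not something the standard library provides; it simply multiplies the element count by the element size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard library function): number of bytes
// occupied by the character data of any std::basic_string specialization.
// length() counts elements of type CharT, so the byte count is
// length() * sizeof(CharT). The terminating null element is not included.
template <typename CharT, typename Traits, typename Alloc>
std::size_t byte_size(const std::basic_string<CharT, Traits, Alloc>& s) {
    return s.length() * sizeof(CharT);
}
```

With this, byte_size(std::string("hi")) is 2, while byte_size(std::wstring(L"hi")) is 2 * sizeof(wchar_t): 4 on platforms where wchar_t is 2 bytes wide and 8 where it is 4 bytes wide. In both cases length() itself returns 2.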

独守阴晴ぅ圆缺 2024-12-16 05:51:58

If we are talking specifically about std::string, then length() does return the number of bytes.

This is because a std::string is a basic_string of chars, and the C++ Standard defines the size of one char to be exactly one byte.

Note that the Standard doesn't say how many bits are in a byte, but that's another story entirely and you probably don't care.

EDIT: The Standard does say that an implementation shall provide a definition for CHAR_BIT which says how many bits are in a byte.

By the way, if you go down a road where you do care how many bits are in a byte, you might consider reading this.
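
The two guarantees mentioned in this answer can be checked at compile time. The sketch below is only an illustration; bit_size is a hypothetical helper name, not part of the standard library.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>
#include <string>

// sizeof(char) is 1 by definition, so this assertion can never fire.
static_assert(sizeof(char) == 1, "char is exactly one byte");

// CHAR_BIT says how many bits that byte has: at least 8, possibly more.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits occupied by the string's character data.
inline std::size_t bit_size(const std::string& s) {
    return s.length() * CHAR_BIT;
}
```

On mainstream platforms CHAR_BIT is 8, so bit_size("abc") would be 24 there; the code only relies on the value the implementation reports.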

风月客 2024-12-16 05:51:58

A std::string is std::basic_string<char>, so s.length() * sizeof(char) = byte length. Also, std::string knows nothing of UTF-8, so you're going to get the byte size even if that's not really what you're after.

If you have UTF-8 data in a std::string, you'll need to use something else such as ICU to get the "real" length.
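
For a rough idea of how the two lengths diverge, here is a small sketch that counts Unicode code points by hand. It is not ICU and not a replacement for it: utf8_code_points is a hypothetical helper that assumes the input is already well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to hold well-formed UTF-8. The input is not validated.
inline std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes have the bit pattern 10xxxxxx.
        // Every other byte starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (the UTF-8 encoding of "héllo"), length() returns 6 while utf8_code_points returns 5. Note that a code point count is still not the number of user-perceived characters, since one visible character can consist of several code points; that is where a full Unicode library becomes necessary.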

第几種人 2024-12-16 05:51:58

cplusplus.com is not "the documentation" for std::string; it's a poor-quality site full of poor-quality information. The C++ standard defines it very clearly:

  • 21.1 [strings.general] ¶1

    This Clause describes components for manipulating sequences of any non-array POD (3.9) type. In this Clause such types are called char-like types, and objects of char-like types are called char-like objects or simply characters.

  • 21.4.4 [string.capacity] ¶1

    size_type size() const noexcept;
    Returns: A count of the number of char-like objects currently in the string.
    Complexity: constant time.

    size_type length() const noexcept;
    Returns: size()
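
The "count of char-like objects" wording can be seen by storing the same code point, U+1F600, in three string types. This is a sketch assuming a C++11 or later compiler; the function names are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <string>

// U+1F600 as UTF-8, written with explicit byte escapes: four char objects.
inline std::size_t units_utf8() {
    return std::string("\xF0\x9F\x98\x80").length();
}

// U+1F600 as UTF-16: a surrogate pair, so two char16_t objects.
inline std::size_t units_utf16() {
    return std::u16string(u"\U0001F600").length();
}

// U+1F600 as UTF-32: a single char32_t object.
inline std::size_t units_utf32() {
    return std::u32string(U"\U0001F600").length();
}
```

In each case length() reports how many objects of the string's element type are stored (4, 2, and 1 respectively), never how many characters a reader would see.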
