Inconsistency between std::string and string literals

Posted 2024-11-24 07:48:52

I have discovered a disturbing inconsistency between std::string and string literals in C++0x:

#include <iostream>
#include <string>

int main()
{
    int i = 0;
    for (auto e : "hello")
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    i = 0;
    for (auto e : std::string("hello"))
        ++i;
    std::cout << "Number of elements: " << i << '\n';

    return 0;
}

The output is:

Number of elements: 6
Number of elements: 5

I understand the mechanics of why this is happening: the string literal is really an array of characters that includes the null character, and when the range-based for loop calls std::end() on the character array, it gets a pointer past the end of the array; since the null character is part of the array, it thus gets a pointer past the null character.

However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?

Is there a way to resolve this inconsistency? For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?

EDIT: To justify my indignation a bit more to those who have said that I'm just suffering the consequences of using C-style strings which are a "legacy feature", consider code like the following:

template <typename Range>
void f(Range&& r)
{
    for (auto e : r)
    {
        // ...
    }
}

Would you expect f("hello") and f(std::string("hello")) to do something different?


Comments (6)

生寂 2024-12-01 07:48:52

If we overloaded std::begin() and std::end() for const char arrays so that the range ended one element before the end of the array, then the following code would output 4 instead of the expected 5:

#include <iostream>

int main()
{
    const char s[5] = {'h', 'e', 'l', 'l', 'o'};
    int i = 0;
    for (auto e : s)
        ++i;
    std::cout << "Number of elements: " << i << '\n';
}
淤浪 2024-12-01 07:48:52

However, I think this is very undesirable: surely std::string and string literals should behave the same when it comes to properties as basic as their length?

String literals by definition have a (hidden) null character at the end of the string; std::string objects do not. Because a std::string stores its length, that null character is a bit superfluous. The standard section on the string library explicitly allows non-null-terminated strings.

Edit
I don't think I've ever given a more controversial answer, in the sense of drawing both a huge number of upvotes and a huge number of downvotes.

A range-based for loop applied to a C-style array iterates over every element of the array. The range is determined at compile time, not at run time. For instance, this is ill-formed:

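char * str;            // a plain pointer: no array bound is known at compile time
for (auto c : str) {   // error: there is no begin()/end() for char*
   do_something_with (c);
}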

Some people use arrays of type char to hold arbitrary data. Yes, it is an old-style C way of thinking, and perhaps they should have used a C++-style std::array, but the construct is quite valid and quite useful. Those people would be rather upset if their range-based for loop over a char buffer[1024] stopped at element 15 just because that element happens to be zero, the same value as the null character. A loop over a Type buffer[1024] runs all the way to the end. What makes a char array so worthy of a completely different implementation?

Note that if you want a loop over a character array to stop early, there is an easy mechanism to do that: add an if (c == '\0') break; statement to the body of your loop.

Bottom line: there is no inconsistency here. A range-based for loop over a char[] array behaves exactly as it does over any other C-style array.

强者自强 2024-12-01 07:48:52

That you get 6 in the first case is an abstraction leak that couldn't be avoided in C. std::string "fixes" that. For compatibility, the behaviour of C-style string literals does not change in C++.

"For example, can std::begin() and std::end() be overloaded for character arrays so that the range they delimit does not include the terminating null character? If so, why was this not done?"

Assuming access through a pointer (as opposed to a char[N]), only by storing the number of characters alongside the string, so that searching for the terminating null character is no longer required. Oops! That's std::string.

The way to "resolve the inconsistency" is not to use legacy features at all.

灰色世界里的红玫瑰 2024-12-01 07:48:52

According to N3290 6.5.4, if the range is an array, the begin and end values are initialized directly from the array's bounds, without any begin/end function dispatch. So how about preparing a wrapper like the following?

struct literal_t {
    char const *b, *e;
    literal_t( char const* b, char const* e ) : b( b ), e( e ) {}
    char const* begin() const { return b; }
    char const* end  () const { return e; }
};

template< int N >
literal_t literal( char const (&a)[N] ) {
    return literal_t( a, a + N - 1 ); // N counts the '\0'; end just before it
}

Then the following code will be valid:

for (auto e : literal("hello")) { /* ... */ }

If your compiler supports user-defined literals, a literal suffix can abbreviate this:

literal_t operator"" _l( char const* p, std::size_t l ) {
    return literal_t( p, p + l ); // l excludes '\0'
}

for (auto e : "hello"_l) { /* ... */ }

EDIT: The following version has smaller overhead (though it cannot be used through a user-defined literal).

template< size_t N >
char const (&literal( char const (&x)[ N ] ))[ N - 1 ] {
    return (char const(&)[ N - 1 ]) x;
}

for (auto e : literal("hello")) { /* ... */ }
丑疤怪 2024-12-01 07:48:52

If you want the length, you should use strlen() for the C string and .length() for the C++ string. You can't treat C strings and C++ strings identically; they have different behavior.

莫相离 2024-12-01 07:48:52

The inconsistency can be resolved using another tool in C++0x's toolbox: user-defined literals. Using an appropriately-defined user-defined literal:

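// User-defined suffixes should begin with '_'; suffixes without one are reserved for the standard library.
std::string operator"" _s(const char* p, std::size_t n)
{
    return std::string(p, n);  // n excludes the terminating '\0'
}

We'll be able to write:

int i = 0;
for (auto e : "hello"_s)
    ++i;
std::cout << "Number of elements: " << i << '\n';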

Which now outputs the expected number:

Number of elements: 5

With these new std::string literals, there is arguably no more reason to use C-style string literals, ever.
