String c++ immutability

Immutable strings vs std::string

Posted 2024-09-02 22:34:26, 778 words, 16 views, 0 comments

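I've recently been reading about immutable strings ("Why can't strings be mutable in Java and .NET?" and "Why .NET String is immutable?"), as well as some material on why D chose immutable strings. There seem to be many advantages:

trivially thread safe
more secure
more memory efficient in most use cases
cheap substrings (tokenizing and slicing)

Not to mention that most newer languages have immutable strings: D 2.0, Java, C#, Python, etc.

Would C++ benefit from immutable strings?

Is it possible to implement an immutable string class in C++ (or C++0x) that would have all of these advantages?

Update:

There have been two attempts at immutable strings, const_string and fix_str. Neither has been updated in five years. Are they even used? Why did const_string never make it into Boost?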

牛↙奶布丁 2024-09-09 22:34:26

I found that most people in this thread do not really understand what an immutable_string is. It is not only about constness. The real power of immutable_string is performance (even in a single-threaded program) and memory usage.

Imagine that all strings are immutable, and that all strings are implemented like this:

class string {
    char* _head ;
    size_t _len ;
} ;

How can we implement a sub-str operation? We don't need to copy any char. All we have to do is assign the _head and the _len. Then the sub-string shares the same memory segment with the source string.
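The O(1) substring described above can be sketched as follows. This is only an illustration under assumptions: the class name str_slice and its members are hypothetical, and the slice does not own its characters, so the underlying buffer must outlive it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical name; mirrors the two-member (_head, _len) layout from the post.
// Non-owning: the buffer it points into must outlive every slice.
class str_slice {
    const char* head_;
    std::size_t len_;
public:
    str_slice(const char* head, std::size_t len) : head_(head), len_(len) {}

    // Views a NUL-terminated buffer such as a string literal.
    explicit str_slice(const char* cstr) : head_(cstr), len_(std::strlen(cstr)) {}

    std::size_t size() const { return len_; }
    const char* data() const { return head_; }

    // O(1): no characters are copied, only the pointer and the length change.
    str_slice substr(std::size_t pos, std::size_t count) const {
        assert(pos <= len_);
        if (count > len_ - pos) count = len_ - pos;  // clamp to the end
        return str_slice(head_ + pos, count);
    }

    // Copies the characters out only when an owning std::string is needed.
    std::string to_string() const { return std::string(head_, len_); }
};
```

Since C++17 the standard library's std::string_view offers this kind of non-owning view, with the same caveat about lifetime.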

Of course, we cannot really implement an immutable_string with only these two data members. A real implementation might need a reference-counted (or flyweighted) memory block, like this:

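class immutable_string {
    boost::flyweight<std::string> _s ;  // shared storage; the Boost.Flyweight class template is spelled boost::flyweight
    const char* _head ;                 // start of this string inside _s
    size_t _len ;
} ;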

Both the memory and the performance would be better than the traditional string in most cases, especially when you know what you are doing.
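A minimal sketch of the reference-counted variant follows. It swaps boost::flyweight for std::shared_ptr<const std::string> to stay within the standard library, and it stores an offset instead of a raw _head pointer; both choices are assumptions of this example, not the poster's design. The use_count() member exists only to make the sharing observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

class immutable_string {
    std::shared_ptr<const std::string> buf_;  // shared block, never modified after construction
    std::size_t off_;                         // start of this string inside *buf_
    std::size_t len_;

    immutable_string(std::shared_ptr<const std::string> buf, std::size_t off, std::size_t len)
        : buf_(std::move(buf)), off_(off), len_(len) {}
public:
    explicit immutable_string(std::string s)
        : buf_(std::make_shared<const std::string>(std::move(s))), off_(0), len_(buf_->size()) {}

    std::size_t size() const { return len_; }
    const char* data() const { return buf_->data() + off_; }

    // Shares the block with the source string; only the reference count changes.
    immutable_string substr(std::size_t pos, std::size_t count) const {
        assert(pos <= len_);
        if (count > len_ - pos) count = len_ - pos;  // clamp to the end
        return immutable_string(buf_, off_ + pos, count);
    }

    std::string to_string() const { return std::string(data(), len_); }
    long use_count() const { return buf_.use_count(); }
};
```

The trade-off is that a small substring keeps the whole original block alive until the last reference goes away.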

Of course C++ can benefit from immutable string, and it is nice to have one. I have checked the boost::const_string and the fix_str mentioned by Cubbi. Those should be what I am talking about.

月依秋水 2024-09-09 22:34:26

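As an opinion:

Yes, I'd quite like an immutable string library for C++.
No, I would not like std::string to be immutable.

Is it really worth doing (as a standard library feature)? I would say not. Using const gives you locally immutable strings, and the basic nature of systems programming languages means that you really do need mutable strings.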

趁年轻赶紧闹 2024-09-09 22:34:26

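My conclusion is that C++ does not need the immutable pattern, because it has const semantics.

In Java, if you have a Person class and return the person's String name from a getName() method, your only protection is the immutable pattern. Without it you would have to clone() your strings all night and day (as you already must for data members that are not typical value objects but still need to be protected).

In C++ you have const std::string& getName() const. So you can write SomeFunction(person.getName()), where the function is declared like void SomeFunction(const std::string& subject).

No copy happens
Anyone who wants a copy is free to make one
The technique applies to all data types, not just strings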
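The pattern from this answer can be sketched as below. Person, getName and SomeFunction are the names used in the answer; the constructor, the non-empty-name precondition and the return value of SomeFunction are assumptions added so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Person {
    std::string name_;
public:
    explicit Person(std::string name) : name_(std::move(name)) {
        assert(!name_.empty());  // assumption of this sketch: a person has a name
    }

    // Read-only access to the member itself; callers cannot modify it through this reference.
    const std::string& getName() const { return name_; }
};

// Takes a reference to const, so passing person.getName() copies nothing.
std::size_t SomeFunction(const std::string& subject) {
    return subject.size();
}
```

A caller that does want its own copy simply writes std::string copy = person.getName(); and changing that copy leaves the Person untouched.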

情仇皆在手 2024-09-09 22:34:26

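You're certainly not the only person who has thought that. In fact, there is a const_string library by Maxim Yegorushkin, which seems to have been written with inclusion into Boost in mind. There is also a somewhat newer library, fix_str, by Roland Pibinger. I'm not sure how tricky full string interning at run time would be, but most of the advantages are achievable when necessary.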

柠檬 2024-09-09 22:34:26

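I don't think there is a definitive answer here. It is subjective: if not because of personal taste, then at least because of the type of code one most often deals with. (Still a valuable question.)

Immutable strings are great when memory is cheap. That was not true when C++ was developed, and it is not true on all platforms targeted by C++. (OTOH, on more limited platforms C seems much more common than C++, so that argument is weak.)

You can create an immutable string class in C++, and you can make it largely compatible with std::string, but you will still lose when compared to a built-in string class with dedicated optimizations and language features.

std::string is the best standard string we get, so I wouldn't like to see any messing with it. I use it very rarely, though; std::string has too many drawbacks from my point of view.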

舂唻埖巳落 2024-09-09 22:34:26

const std::string

There you go. A string literal is also immutable, unless you want to get into undefined behavior.

Edit: Of course that's only half the story. A const string variable isn't useful because you can't make it reference a new string. A reference to a const string would do it, except that C++ won't allow you to reassign a reference as in other languages like Python. The closest thing would be a smart pointer to a dynamically allocated string.
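The "smart pointer to a dynamically allocated string" idea might look like the sketch below. The alias StringRef and the helpers make_ref and append are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical alias: a rebindable handle to a string that is never modified.
using StringRef = std::shared_ptr<const std::string>;

StringRef make_ref(std::string s) {
    return std::make_shared<const std::string>(std::move(s));
}

// "Modification" builds a new string and leaves the old one untouched.
StringRef append(const StringRef& s, const std::string& suffix) {
    assert(s != nullptr);
    return make_ref(*s + suffix);
}
```

Rebinding then works as in Python or Java: after StringRef b = a; a = append(a, " world");, the handle b still refers to the old text while a refers to the new string.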

北城半夏 2024-09-09 22:34:26

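Immutable strings are great if, whenever a new string has to be created, the memory manager can always determine the whereabouts of every string reference. On most platforms, language support for that ability can be provided at relatively modest cost, but on platforms without such support built in, it is much harder.

If, for example, one wanted to design a Pascal implementation on x86 that supported immutable strings, the string allocator would have to be able to walk the stack to find all string references; the only execution-time cost of that would be requiring a consistent function-call approach [e.g. not using tail calls, and having every non-leaf function maintain a frame pointer]. Each memory area allocated with new would need a bit indicating whether it contains any strings, and those that do would need an index into a memory-layout descriptor, but those costs would be pretty slight.

If the GC were not able to walk the stack, code would have to use handles rather than pointers, creating string handles when local variables come into scope and destroying them when the variables go out of scope. That is a much greater overhead.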

萧瑟寒风 2024-09-09 22:34:26

Qt also uses immutable strings with copy-on-write.
There is some debate about how much performance this really buys you with a decent compiler.

挽清梦 2024-09-09 22:34:26

Constant strings make little sense with value semantics, and sharing isn't one of C++'s greatest strengths...

染年凉城似染瑾 2024-09-09 22:34:26

Would C++ benefit from immutable strings? Probably not much.

An immutable string is not the same as a read-only string. Immutability guarantees that no change to the observable state of the string may occur outside of what your own code can affect, to the point that if you take any code passing std::string by value, you could just replace it with such an immutable string pointer and everything will work (you cannot distinguish passing such a string by value from passing by reference).

C++ guarantees this only when you create a const object right from the beginning ‒ adding const to an existing object does not make it immutable, and you can remove it via const_cast any time without any issues. This basically means that you cannot make a type immutable in C++, only an object:

const std::string str("hello");
// or
const std::string &str = *new const std::string("hello");

There are actually two guarantees at play here ‒ you cannot modify str via const_cast<std::string&>(str) (undefined behaviour when modifying const object), and you cannot modify the character data via const_cast<char*>(str.data()) (undefined behaviour for data() const). You could theoretically get const_cast<std::string&>(str).data(), and modifying that might be fine for a trivial std::string implementation, but the standard std::string might be optimized so that it stores the character data (up to a size) in the object itself, thus you still risk modifying the object.

How do you pass an immutable object around? You can't.

void f(const std::string &str);

This is no longer an immutable object, but a read-only object ‒ you cannot modify it through str, but it can still change at any time.
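The difference between read-only and immutable can be shown with a small sketch. The function name changes_behind_const_ref is hypothetical; it reports whether a value seen through a reference to const changed although nothing was written through that reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the string observed through a const reference changed
// even though no modification went through that reference.
bool changes_behind_const_ref() {
    std::string owner = "abc";
    const std::string& view = owner;  // read-only access, like the parameter of f()
    const std::size_t before = view.size();
    assert(before == 3);

    owner += "def";  // the owner is still free to mutate the object

    return view.size() != before;  // the "const" view now sees six characters
}
```

The reference only restricts what can be done through it; it promises nothing about what other code does to the same object.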

To have an actually immutable string, you need to encode this in the type system somehow. The best you can do is to wrap it in another object where it is immutable, like const std::tuple<const std::string> &str ‒ there is no tuple variance that would permit getting this reference from a mutable std::tuple<std::string>, and so, when created, the string is already a const object.

Lastly, you also need guarantees about the lifetime of such an object, since even an immutable object can be deleted. That is thankfully not so complicated ‒ std::shared_ptr<const std::tuple<const std::string>> gives you all the guarantees ‒ once you observe its value, it should stay constant forever. This is pretty much the basis of what languages with immutable strings give you.
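A minimal sketch of the shared_ptr-plus-tuple encoding described above. The alias ImmutableStr and the helpers make_immutable and value_of are hypothetical names; the static_assert illustrates the "no tuple variance" point by checking that a pointer to a mutable tuple does not convert.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical alias for the type spelled out in the text.
using ImmutableStr = std::shared_ptr<const std::tuple<const std::string>>;

// The string is created const inside the tuple, so it is a const object from the start.
ImmutableStr make_immutable(std::string s) {
    return std::make_shared<const std::tuple<const std::string>>(std::move(s));
}

const std::string& value_of(const ImmutableStr& s) {
    assert(s != nullptr);
    return std::get<0>(*s);
}

// A shared_ptr to a mutable tuple cannot be passed off as an ImmutableStr.
static_assert(
    !std::is_convertible<std::shared_ptr<std::tuple<std::string>>, ImmutableStr>::value,
    "a mutable tuple must not convert to the immutable handle");
```

Copies of the handle share one string, and the string lives as long as any handle does.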

Now, to reason about the answer, how would C++ benefit from using such a string? Are there any places where read-only strings are stored, without giving you the control over the specific type? There are not that many I could come up with:

std::exception ‒ constructing it from an error message needs to copy the string to preserve it, but with an immutable string pointer, it could just store that and return in what(). However, you can create your own exception types that do that.
std::locale ‒ likewise, constructing from a locale name could just store the immutable string inside, without copying. However, locale names are not used that frequently, and are short enough for small string optimization to kick in in most cases.
std::messages ‒ get() could be a primary target for immutable strings, like any other "string catalog" ‒ retrieval could happen relatively often, and the strings are moderately long and constant enough. However, nothing stops you from adding a caching layer to this object.

I encourage you to find more such examples. Indeed, there are many more cases where strings are accepted and processed, or generated and returned, than simply stored.

That being said, the situation in user code is drastically different ‒ once you store user records or configurations, you become more in need of such a type. So while the C++ language itself might not benefit from it, your code definitely would! Such a type needs to:

Be easily constructible from a string literal without copying, creating an immutable string with permanent lifetime.
Be move-constructible from std::string, taking ownership of its character data.
Be constructible without copying from other objects with lifetime management that exhibit immutability, such as std::shared_ptr<std::array<const char, N>> or the aforementioned tuple.
Be constructible by value from std::string, char iterators and the usual stuff, copying the data to its internal memory.
Be "unsafe-constructible" without copying from things like std::shared_ptr<std::span<char>>, where immutability is not guaranteed by the type. This option needs to be clearly marked as dangerous, and modifying the underlying data in any way during the lifetime of the result should cause undefined behaviour.
Interface well with existing std::shared_ptr-based code, preferably be handled as std::shared_ptr itself.
Allow allocation-less slicing, producing views into its data with the same lifetime.
Include compatibility with C strings in the form of const char *data() const for retrieval. Note that sliced strings may be missing the '\0' character at the end, and so there needs to be a way to detect this having happened and return std::shared_ptr<const char*> instead (either aliased to the original data, or to a copy thereof with '\0' at the end).
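A few of these requirements can be sketched with std::shared_ptr's aliasing constructor. Everything here is an assumption for illustration: the class name istring, the terminated_ flag, and returning std::shared_ptr<const char> from c_str() (the text above names std::shared_ptr<const char*>). The array constructor also accepts any const char array, not only literals, so it trusts the caller about lifetime.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

class istring {  // hypothetical name
    std::shared_ptr<const char> data_;  // aliasing pointer: shares ownership with the buffer's owner
    std::size_t len_;
    bool terminated_;                   // true if data_.get()[len_] is known to be '\0'

    istring(std::shared_ptr<const char> d, std::size_t n, bool t)
        : data_(std::move(d)), len_(n), terminated_(t) {}
public:
    // From a string literal: static storage, so no owner and no copy.
    template <std::size_t N>
    istring(const char (&lit)[N])
        : data_(std::shared_ptr<const char>(), lit), len_(N - 1), terminated_(true) {}

    // From std::string: the moved-in string becomes the shared owner.
    explicit istring(std::string s) {
        auto owner = std::make_shared<const std::string>(std::move(s));
        len_ = owner->size();
        terminated_ = true;
        data_ = std::shared_ptr<const char>(owner, owner->data());
    }

    std::size_t size() const { return len_; }
    const char* data() const { return data_.get(); }
    bool is_terminated() const { return terminated_; }

    // Allocation-free slice that keeps the same owner alive.
    istring substr(std::size_t pos, std::size_t count) const {
        assert(pos <= len_);
        if (count > len_ - pos) count = len_ - pos;
        const bool reaches_end = (pos + count == len_);
        return istring(std::shared_ptr<const char>(data_, data_.get() + pos), count,
                       terminated_ && reaches_end);
    }

    // Aliases the original data when it already ends in '\0', otherwise returns a terminated copy.
    std::shared_ptr<const char> c_str() const {
        if (terminated_) return data_;
        auto copy = std::make_shared<const std::string>(data_.get(), len_);
        return std::shared_ptr<const char>(copy, copy->c_str());
    }
};
```

A slice that reaches the end of its source can hand out the original bytes as a C string, while a slice from the middle pays for one copy only when c_str() is actually called.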

This interplay between std::shared_ptr and std::string means such a type would be better defined by the standard than by user code, as some situations could be handled much better without unnecessary allocations or indirection, and the immutability guarantee enables important optimizations.

沙与沫 2024-09-09 22:34:26

Strings are mutable in Ruby.

$ irb
>> foo="hello"
=> "hello"
>> bar=foo
=> "hello"
>> foo << "world"
=> "helloworld"
>> print bar
helloworld=> nil

trivially thread safe

I would tend to forget safety arguments. If you want to be thread-safe, lock it, or don't touch it. C++ is not a convenient language, have your own conventions.

more secure

No. As soon as you have pointer arithmetic and unprotected access to the address space, forget about being secure. Safer against innocently bad coding, yes.

more memory efficient in most use cases.

Unless you implement CPU-intensive mechanisms, I don't see how.

cheap substrings (tokenizing and slicing)

That would be one very good point. Could be done by referring to a string with backreferences, where modifications to a string would cause a copy. Tokenizing and slicing become free, mutations become expensive.

野鹿林 2024-09-09 22:34:26

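C++ strings are thread safe. All immutable objects are guaranteed to be thread safe, but Java's StringBuffer is mutable like the C++ string, and both of them are thread safe. Why worry about speed? Declare your method or function parameters with the const keyword to tell the compiler that the string is immutable within that scope. Also, a string object could be made immutable on demand, deferring the work until you absolutely need the string: when you append other strings to the main string, you keep a list of strings until you actually need the whole string, and only then are they joined together.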
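The deferred joining mentioned at the end of that paragraph could be sketched like this. The class name lazy_concat and its members are hypothetical; the pieces are kept in a list and concatenated only when the full string is requested.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class lazy_concat {  // hypothetical name
    std::vector<std::string> parts_;  // appended pieces, not yet joined
    std::size_t total_ = 0;           // combined length of all pieces
public:
    // Appending only stores the piece; nothing is concatenated yet.
    void append(std::string piece) {
        total_ += piece.size();
        parts_.push_back(std::move(piece));
    }

    std::size_t size() const { return total_; }

    // Joins all pieces with a single allocation, at the moment the whole string is needed.
    std::string str() const {
        std::string out;
        out.reserve(total_);
        for (const std::string& p : parts_) out += p;
        assert(out.size() == total_);
        return out;
    }
};
```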

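To my knowledge, immutable and mutable objects operate at the same speed, except for their methods, which is a matter of pros and cons. Constant primitives and variable primitives do move at different speeds, because at the machine level variables are assigned to a register or a memory location, which requires a few binary operations, while constants are labels that require none of those and are therefore faster (or less work is done). This applies only to primitives, not to objects.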