Why did Matz choose to make strings mutable by default in Ruby?

Posted 2024-08-28 10:58:04

This is the reverse of this question: Why can't strings be mutable in Java and .NET?

Was this choice made in Ruby only because operations (appends, etc.) are efficient on mutable strings, or was there some other reason?

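(If it's only efficiency, that would seem peculiar, since Ruby's design otherwise doesn't seem to place a high premium on facilitating an efficient implementation.)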

勿忘初心 2024-09-04 10:58:04

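As you noted, this fits Ruby's design. Immutable strings can be more efficient than mutable ones (less copying, because strings can be shared and reused), but they make the programmer's job harder. It is intuitive to think of strings as mutable: you can concatenate them together. To deal with this, Java silently translates the concatenation of two strings (via +) into the use of a StringBuffer object (a StringBuilder in Java 5 and later), and I'm sure there are other such hacks. Ruby instead chose to make strings mutable by default, at the expense of performance.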
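
A minimal Ruby sketch of the point above (variable names are illustrative): `<<` appends to the existing String object, so building up text needs no separate buffer type, whereas `+` leaves its operands alone and allocates a new String. `String.new` is used so each string is explicitly mutable whatever the file's `frozen_string_literal` setting.

```ruby
# Append in place: << modifies the receiver and returns it,
# so calls can be chained on the same object.
greeting = String.new("Hello")
original_id = greeting.object_id
greeting << ", " << "world"
in_place = greeting.object_id == original_id   # same object as before

# Non-destructive concatenation: + returns a brand-new String.
combined = greeting + "!"
new_object = !combined.equal?(greeting)        # different object

puts greeting    # Hello, world
puts combined    # Hello, world!
puts in_place    # true
puts new_object  # true
```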

Ruby also has a number of destructive methods, such as String#upcase!, that rely on strings being mutable.
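
For example, String#upcase returns a new string, while its destructive counterpart String#upcase! rewrites the receiver itself; like Ruby's other bang methods on String, it returns nil when there was nothing to change. A short sketch (variable names are illustrative):

```ruby
name = String.new("matz")

# Non-destructive: a new String is returned, name is untouched.
shouted = name.upcase
puts name      # matz
puts shouted   # MATZ

# Destructive: name itself is modified, and the receiver is returned.
first = name.upcase!
puts name                 # MATZ
puts first.equal?(name)   # true

# Nothing left to change, so upcase! signals that by returning nil.
second = name.upcase!
puts second.inspect       # nil
```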

Another possible reason is that Ruby was inspired by Perl, and Perl happens to use mutable strings.

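Ruby has symbols and frozen strings, both of which are immutable. As an added bonus, symbols are guaranteed to be unique: there is exactly one symbol for each possible string value.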
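
A small sketch of both kinds of immutable value (names are illustrative): equal symbols are always the very same object, separately created strings are distinct objects, and a frozen string rejects mutation with a FrozenError.

```ruby
# Symbols are unique: :ruby and "ruby".to_sym are the same object.
same_symbol = :ruby.equal?("ruby".to_sym)
symbol_frozen = :ruby.frozen?

# Two equal strings created separately are distinct, mutable objects.
str_a = String.new("ruby")
str_b = String.new("ruby")
distinct_strings = !str_a.equal?(str_b)

# Freezing makes a particular String immutable.
frozen_str = String.new("ruby").freeze
mutation_blocked =
  begin
    frozen_str << "!"
    false
  rescue FrozenError
    true
  end

puts same_symbol, symbol_frozen, distinct_strings, mutation_blocked
```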

冷月断魂刀 2024-09-04 10:58:04

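These are my opinions, not Matz's. For the purposes of this answer, when I say that a language has "immutable strings", I mean that all of its strings are immutable, i.e. there is no way to create a mutable string.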

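The "immutable string" design treats strings as both identifiers (e.g. hash keys and other VM-internal uses) and data-storage structures. The idea is that it's dangerous for identifiers to be mutable. To me, this sounds like a violation of the single-responsibility principle. In Ruby we have symbols for identifiers, so strings are free to act as data stores. It's true that Ruby allows strings as hash keys, but I think it's rare for a programmer to store a string in a variable, use it as a hash key, and then modify the string. In the programmer's mind there is (or should be) a separation between these two uses of strings. Often a string used as a hash key is a string literal, so there is little chance of it being mutated. Using a string as a hash key is not much different from using an array of two strings as a hash key: as long as you have a good grasp of what you're using as a key, there's no problem.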
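
Ruby's Hash itself backs up this intuition: according to the Hash documentation, an unfrozen String used as a key is duplicated and frozen on insertion, so mutating the caller's string afterwards cannot corrupt the table. Other mutable keys, such as the array of two strings mentioned above, are stored as-is, and after mutating one you need Hash#rehash. A sketch (variable names are illustrative):

```ruby
# String key: Hash#[]= stores a frozen copy of an unfrozen String.
key = String.new("name")
h = {}
h[key] = "Matz"

stored_key = h.keys.first
copied = !stored_key.equal?(key)     # the hash keeps its own copy
stored_frozen = stored_key.frozen?   # and that copy is frozen

key << "!"                           # mutating the original...
still_found = h["name"] == "Matz"    # ...does not affect the lookup

# Array key: stored as-is, so mutating it invalidates its hash code.
pair = ["a", "b"]
ah = { pair => 1 }
pair << "c"                          # lookups may now miss this entry
ah.rehash                            # recompute hash codes from current contents
found_after_rehash = ah[["a", "b", "c"]] == 1
```
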
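
Having strings act as data stores is useful from the standpoint of cognitive simplicity. Consider Java and its StringBuffer: it's an extra data structure (in an already large and often unintuitive standard library) that you have to manage if you want to do string operations such as inserting one string at a given index of another. So on the one hand Java recognizes the need for these kinds of operations, but because immutable strings are exposed to the programmer, its designers had to introduce another structure so that the operations remain possible without making us reinvent the wheel. This puts extra cognitive load on the programmer.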
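
In Ruby the same operation is a method on String itself: String#insert mutates the receiver and returns it, so there is no separate buffer object to create and convert back (in Java it would go through something like `new StringBuilder(s).insert(5, ",").toString()`). A minimal sketch (variable names are illustrative):

```ruby
sentence = String.new("Hello world")

# Insert "," before index 5; the receiver itself is modified.
returned = sentence.insert(5, ",")

# A negative index counts from the end: -1 inserts after the last character.
sentence.insert(-1, "!")

puts sentence                   # Hello, world!
puts returned.equal?(sentence)  # true
```
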
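
In Python, the easiest way to insert seems to be to take the substrings before and after the insertion point and concatenate them around the string to be inserted. I suppose they could easily add a standard-library method that inserts and returns a new string. However, if that method were called insert, beginners might think it mutates the string; to be descriptive it would have to be called new_with_inserted or something equally odd. In everyday usage, "inserting" means you change the contents of the thing being inserted into (e.g. inserting an envelope into a mailbox changes the contents of the mailbox). Again, this raises the question: "Why can't I change my data store?"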
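
Both non-mutating styles are still easy to express in Ruby, next to the in-place String#insert. The sketch below shows the slice-and-concatenate approach the answer attributes to Python and a copy-then-insert one-liner; neither touches the original string. `index` and the other names are illustrative.

```ruby
original = String.new("Hello world")
index = 5

# Slice before and after the insertion point, then concatenate.
via_slices = original[0, index] + "," + original[index..-1]

# Or copy first and insert into the copy.
via_dup = original.dup.insert(index, ",")

puts via_slices  # Hello, world
puts via_dup     # Hello, world
puts original    # Hello world
```
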
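
Ruby provides freezing of objects, so they can be passed around safely without introducing subtle bugs. The nice thing is that Ruby treats strings just like any other data structure (arrays, hashes, class instances): they can all be frozen. Consistency is programmer-friendly. Immutable strings make strings stand out as a "special" data structure, when really they aren't, if you use them as data stores.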
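
A sketch of that consistency: the same Object#freeze call protects a string, an array, a hash, and an instance of a user-defined class, and every attempted mutation then raises FrozenError. `Point` is a hypothetical class added only for illustration.

```ruby
# Hypothetical class, used only to show that user objects freeze too.
class Point
  attr_accessor :x

  def initialize(x)
    @x = x
  end
end

# freeze returns its receiver, so map(&:freeze) yields the frozen objects.
objects = [String.new("text"), [1, 2], { a: 1 }, Point.new(1)].map(&:freeze)

# One mutating operation per object, in the same order as objects.
mutations = [
  ->(str) { str << "!" },
  ->(arr) { arr << 3 },
  ->(hsh) { hsh[:b] = 2 },
  ->(pt)  { pt.x = 2 }
]

results = objects.zip(mutations).map do |obj, mutate|
  mutate.call(obj)
  :mutated
rescue FrozenError
  :frozen_error
end

p results  # [:frozen_error, :frozen_error, :frozen_error, :frozen_error]
```

Ruby also offers the per-file magic comment `# frozen_string_literal: true`, which applies the same idea to every string literal in that file.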