Non-technical benefits of having the string type immutable

Posted 2024-09-16 03:30:13

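I'd like to know the benefits of the string type being immutable from the programmer's point of view.

The technical advantages (on the compiler/language side) can be summarized as: if a type is immutable, it is easier to optimize. See here for a related question.

Also, with a mutable string type, either thread safety is already built in (and then, again, optimization is harder to do) or you have to do it yourself. Either way, you could choose a mutable string type with built-in thread safety, so this is not really an advantage of immutable string types. (Again, it is easier to handle and optimize thread safety for an immutable type, but that is not the point here.)

But what are the benefits of immutable string types in actual use? What is the point of having some types immutable and others not? That seems very inconsistent to me.

In C++, if I want a string to be immutable, I pass it to a function as a const reference (const std::string&). If I want a modifiable copy of the original string, I pass it as std::string. Only if I want it to be mutable do I pass it as a reference (std::string&). So I simply choose what I want to do, and I can do this with every possible type.

In Python or Java, some types are immutable (mostly all the primitive types and strings) and others are not.

In a pure functional language like Haskell, everything is immutable.

Is there a good reason why this inconsistency makes sense? Or is it purely for lower-level technical reasons?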


月亮邮递员 2024-09-23 03:30:13


What is the point of having some types immutable and others not?

Without some mutable types, you'd have to go the whole hog to pure functional programming -- a completely different paradigm than the OOP and procedural approaches which are currently most popular, and, while extremely powerful, apparently very challenging to a lot of programmers (what happens when you do need side effects in a language where nothing is mutable, and in real-world programming of course you inevitably do, is part of the challenge -- Haskell's Monads are a very elegant approach, for example, but how many programmers do you know that fully and confidently understand them and can use them as well as typical OOP constructs?-).

If you don't understand the enormous value of having multiple paradigms available (both FP one and ones crucially relying on mutable data), I recommend studying Haridi's and Van Roy's masterpiece, Concepts, Techniques, and Models of Computer Programming -- "a SICP for the 21st Century", as I once described it;-).

Most programmers, whether familiar with Haridi and Van Roy or not, will readily admit that having at least some mutable data types is important to them. Despite the sentence I've quoted above from your Q, which takes a completely different viewpoint, I believe that may also be the root of your perplexity: not "why some of each", but rather "why some immutables at all".

The "thoroughly mutable" approach was once (accidentally) obtained in a Fortran implementation. If you had, say,

  SUBROUTINE ZAP(I)
  I = 0
  RETURN
  END

then a program snippet doing, e.g.,

  PRINT *, 23
  CALL ZAP(23)
  PRINT *, 23

would print 23, then 0 -- the number 23 had been mutated, so all references to 23 in the rest of the program would in fact refer to 0. Not a bug in the compiler, technically: Fortran had subtle rules about what your program is and is not allowed to do in passing constants vs variables to procedures that assign to their arguments, and this snippet violates those little-known, non-compiler-enforceable rules, so it's a bug in the program, not in the compiler. In practice, of course, the number of bugs caused this way was unacceptably high, so typical compilers soon switched to less destructive behavior in such situations (putting constants in read-only segments to get a runtime error, if the OS supported that; or, passing a fresh copy of the constant rather than the constant itself, despite the overhead; and so forth) even though technically they were program bugs allowing the compiler to display undefined behavior quite "correctly";-).

The alternative enforced in some other languages is to add the complication of multiple ways of parameter passing -- most notably perhaps in C++, what with by-value, by-reference, by constant reference, by pointer, by constant pointer, ... and then of course you see programmers baffled by declarations such as const foo* const bar (where the rightmost const is basically irrelevant if bar is an argument to some function... but crucial instead if bar is a local variable...!-).

Actually Algol-68 probably went farther along this direction (if you can have a value and a reference, why not a reference to a reference? or reference to reference to reference? &c -- Algol 68 put no limitations on this, and the rules to define what was going on are perhaps the subtlest, hardest mix ever found in an "intended for real use" programming language). Early C (which only had by-value and by-explicit-pointer -- no const, no references, no complications) was no doubt in part a reaction to it, as was the original Pascal. But const soon crept in, and complications started mounting again.

Java and Python (among other languages) cut through this thicket with a powerful machete of simplicity: all argument passing, and all assignment, is "by object reference" (never reference to a variable or other reference, never semantically implicit copies, &c). Defining (at least) numbers as semantically immutable preserves programmers' sanity (as well as this precious aspect of language simplicity) by avoiding "oopses" such as that exhibited by the Fortran code above.
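To make the contrast with the Fortran ZAP example concrete, here is a minimal Python sketch. The function names `zap` and `zap_first` are illustrative and do not come from the original answer. Assigning to a parameter only rebinds a local name, so neither the caller's variable nor the literal 23 can be altered. Only an explicit in-place mutation of a mutable object, here a list, is visible to the caller.

```python
def zap(i):
    # Rebinds the local name `i` only; the caller's int object is untouched.
    i = 0


def zap_first(xs):
    # Mutates the (mutable) list object that caller and callee share.
    xs[0] = 0


n = 23
zap(n)
zap(23)            # passing a literal is just as harmless
print(n)           # 23

items = [23]
zap_first(items)
print(items)       # [0]
```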

Treating strings as primitives just like numbers is quite consistent with the languages' intended high semantic level, because in real life we do need strings that are just as simple to use as numbers; alternatives such as defining strings as lists of characters (Haskell) or as arrays of characters (C) pose challenges to both the compiler (keeping efficient performance under such semantics) and the programmer (effectively ignoring this arbitrary structuring to enable use of strings as simple primitives, as real life programming often requires).

Python went a bit further by adding a simple immutable container (tuple) and tying hashing to "effective immutability" (which avoids certain surprises to the programmer that are found, e.g., in Perl, with its hashes allowing mutable strings as keys) -- and why not? Once you have immutability (a precious concept that saves the programmer from having to learn about N different semantics for assignment and argument passing, with N tending to increase with time;-), you might as well get full mileage out of it;-).
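A small sketch of how Python ties hashing to "effective immutability". The helper `is_hashable` is hypothetical and only wraps the built-in `hash()`. Strings and tuples of immutable items can serve as dict keys. Lists cannot, and a tuple is only as hashable as its contents.

```python
def is_hashable(obj) -> bool:
    """Return True if `obj` can be used as a dict key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("text"))     # True: str is immutable
print(is_hashable((1, 2)))     # True: tuple of immutable items
print(is_hashable([1, 2]))     # False: list is mutable
print(is_hashable((1, [2])))   # False: a tuple is only as hashable as its items

grid = {(0, 0): "origin"}      # tuples work directly as composite keys
print(grid[(0, 0)])            # origin
```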


遗弃Ｍ 2024-09-23 03:30:13

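I'm not sure whether this qualifies as non-technical, but: if strings were mutable, most (*) collections would need to make private copies of their string keys.

Otherwise, a "foo" key changed externally to "bar" would leave "bar" sitting in the collection's internal structures where "foo" is expected. A "foo" lookup would then find "bar", which is the lesser problem (return nothing, reindex the offending key), but a "bar" lookup would find nothing, which is the bigger problem.

(*) A dumb collection that does a linear scan of all keys on every lookup would not have to do this, since it naturally accommodates key changes.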
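The failure mode above can be reproduced in Python with a hypothetical `MutableStr` wrapper (Python's built-in `str` is immutable, so there is no real mutable string to use). The class's hash follows its current contents. The dict files the entry under hash("foo") at insertion time. After the key is mutated, a "foo" lookup reaches the entry but the equality check fails, so it returns nothing. A "bar" lookup never matches the stored hash. The footnote's linear-scan collection, sketched as the hypothetical `linear_lookup`, still finds the entry.

```python
class MutableStr:
    """Hypothetical mutable string whose hash follows its current contents."""

    def __init__(self, value: str) -> None:
        self.value = value

    def set(self, value: str) -> None:
        self.value = value  # in-place change, visible through every reference

    def __eq__(self, other: object) -> bool:
        return isinstance(other, MutableStr) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)


def linear_lookup(pairs, key):
    """The "dumb" collection from the footnote: scan every key on each lookup."""
    for stored, val in pairs:
        if stored == key:
            return val
    return None


key = MutableStr("foo")
table = {key: 42}     # hash table: entry filed under hash("foo") at insert time
pairs = [(key, 42)]   # no hashing at all

key.set("bar")        # the key is changed from outside after insertion

print(MutableStr("foo") in table)               # False: found the entry, but key != "foo" now
print(MutableStr("bar") in table)               # False: entry is still filed under hash("foo")
print(linear_lookup(pairs, MutableStr("bar")))  # 42
```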


聽兲甴掵 2024-09-23 03:30:13

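There is no overriding fundamental reason not to have mutable strings. The best explanation I have found for their immutability is that it promotes a more functional, less side-effect-prone style of programming. This ends up being cleaner, more elegant, and more Pythonic.

Semantically, they should be immutable, no? The string "hello" should always mean "hello". You can't change it any more than you can change the number three!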
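In Python terms, here is a minimal sketch of what "you can't change it" means in practice. Two names can safely share one string object, because every "modification" (here `str.replace`) returns a new string. In-place item assignment is rejected with a `TypeError`.

```python
s = "hello"
t = s                       # two names, one string object
s = s.replace("h", "j")     # returns a new string; only `s` is rebound
print(s, t)                 # jello hello

try:
    t[0] = "j"              # str supports no item assignment
except TypeError:
    print("str objects cannot be modified in place")
```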


末骤雨初歇 2024-09-23 03:30:13

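Not sure whether you'd count this as a "lower-level technical" advantage, but the fact that immutable strings are implicitly thread-safe saves you a lot of thread-safety coding effort.

A somewhat toy example...

Thread A - checks that the user with login name FOO has permission to do something; the check returns true

Thread B - modifies the user's string to the login name BAR

Thread A - performs some action as login name BAR, because the earlier permission check passed for FOO.

The fact that strings cannot change helps you prevent this from happening.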
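The toy scenario above can be replayed deterministically in Python. This is a single-threaded simulation of the interleaving, not real concurrent code. The `User` record, the `ALLOWED` table and the helpers are hypothetical. `bytearray` stands in for a mutable string, since Python's `str` has no mutable counterpart. Thread A reads the login once and checks it. Thread B then either mutates the shared buffer in place or, with an immutable string, can only rebind the attribute to a new object.

```python
ALLOWED = {"FOO"}  # hypothetical permission table


class User:
    """Hypothetical shared record; `login` is a str or a bytearray."""

    def __init__(self, login):
        self.login = login


def as_text(login) -> str:
    return login.decode() if isinstance(login, bytearray) else login


def run_interleaving(user, thread_b):
    login = user.login                   # Thread A: take a reference once
    allowed = as_text(login) in ALLOWED  # Thread A: permission check
    thread_b(user)                       # Thread B: runs between check and use
    # Thread A: act using the same reference it checked
    return as_text(login) if allowed else None


def mutate_in_place(user):
    user.login[:] = b"BAR"  # possible only because bytearray is mutable


def rebind(user):
    user.login = "BAR"      # the only option with an immutable str


print(run_interleaving(User(bytearray(b"FOO")), mutate_in_place))  # BAR
print(run_interleaving(User("FOO"), rebind))                       # FOO
```

With the mutable buffer, Thread A ends up acting as BAR after a check that passed for FOO. With the immutable string, the object Thread A checked still says FOO when it is used.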


夜访吸血鬼 2024-09-23 03:30:13

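If you want full consistency, you can only make everything immutable, because mutable booleans or integers would make no sense at all. Some functional languages do in fact do exactly that.

Python's philosophy is "Simple is better than complex." In C you need to be aware that strings can change and think about how that may affect you. Python assumes the default use case for strings is "putting text together", and you need to know absolutely nothing about strings to do that. If you do want a string that changes, you simply use a more appropriate type (lists, StringIO, templates, and so on).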
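A short sketch of those "more appropriate types", using only the standard library: a list of pieces joined once, an `io.StringIO` buffer, and `string.Template`. Each produces an ordinary immutable `str` at the end. The sample words are arbitrary.

```python
import io
from string import Template

# Collect pieces in a mutable list, then join once into an immutable str.
parts = []
for word in ["put", "text", "together"]:
    parts.append(word)
joined = " ".join(parts)

# io.StringIO is a mutable, file-like text buffer.
buf = io.StringIO()
buf.write("Hello")
buf.write(", world")
built = buf.getvalue()      # snapshot as an ordinary immutable str

# string.Template fills placeholders and returns a new str.
greeting = Template("$who likes $what").substitute(who="tim", what="kung pao")

print(joined)     # put text together
print(built)      # Hello, world
print(greeting)   # tim likes kung pao
```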


酒浓于脸红 2024-09-23 03:30:13

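In a language with reference semantics for user-defined types, mutable strings would be a disaster. Every time you assigned a string variable you would be aliasing a mutable string object, and you would have to make defensive copies everywhere. That is why strings are immutable in Java and C#: if a string object is immutable, it doesn't matter how many variables point to it.

Note that in C++, two string variables never share state (at least conceptually; technically copy-on-write may be going on, but it has fallen out of fashion because of its inefficiency in multithreaded scenarios).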
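Python also has reference semantics for every object, so it can illustrate the Java/C# aliasing argument. In this sketch the `Account` class is hypothetical and `bytearray` plays the role of a mutable string. Storing the caller's mutable buffer lets a later edit leak into the object unless it makes a defensive copy. Storing an immutable `str` needs no copy at all.

```python
class Account:
    """Hypothetical class that keeps whatever name object it is handed."""

    def __init__(self, name):
        self.name = name  # stores a reference, not a copy


raw = bytearray(b"alice")
aliased = Account(raw)          # shares the caller's mutable buffer
defended = Account(bytes(raw))  # defensive copy into immutable bytes

raw[:] = b"mallory"             # caller later reuses its buffer

text = "alice"
plain = Account(text)
text += "!"                     # produces a new str; plain.name is unaffected

print(aliased.name)   # bytearray(b'mallory')
print(defended.name)  # b'alice'
print(plain.name)     # alice
```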


纵山崖 2024-09-23 03:30:13

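If strings were mutable, many consumers of a string would have to make their own copies of it. With immutable strings this matters far less. (Unless immutability is enforced by hardware interlocks, it might still not be a bad idea for some security-conscious consumers to make their own copies, in case the strings they are given aren't as immutable as they should be.)

The StringBuilder class is pretty good, but I think it would be nicer if it had a "Value" property and a default widening conversion to String. Reading the property would be equivalent to ToString, but it would show up in object inspectors; writing it would allow setting the whole contents directly. In theory it would also have been nice to have a MutableString type descended from a common ancestor with String, so that a mutable string could be passed to a function that doesn't care whether a string is mutable. I suspect, though, that optimizations relying on Strings having a certain fixed implementation would then have been less effective.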
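The "common ancestor" idea can be sketched in Python as an analogue of the .NET design the answer wishes for. `Text`, `FrozenText`, `MutableText` and `shout` are all hypothetical names, not part of any real .NET or Python API. A function written against the read-only base accepts either flavour. The mutable one exposes a readable and writable `value`, like the suggested "Value" property.

```python
from abc import ABC, abstractmethod


class Text(ABC):
    """Hypothetical common ancestor: just a readable `value`."""

    @property
    @abstractmethod
    def value(self) -> str: ...


class FrozenText(Text):
    """Immutable flavour: the value is fixed at construction."""

    def __init__(self, s: str) -> None:
        self._s = s

    @property
    def value(self) -> str:
        return self._s


class MutableText(Text):
    """Builder flavour: `value` can be read and also overwritten wholesale."""

    def __init__(self, s: str = "") -> None:
        self._parts = [s]

    def append(self, s: str) -> "MutableText":
        self._parts.append(s)
        return self

    @property
    def value(self) -> str:
        return "".join(self._parts)

    @value.setter
    def value(self, s: str) -> None:
        self._parts = [s]


def shout(t: Text) -> str:
    # Only reads `value`, so it accepts either flavour.
    return t.value.upper()


mt = MutableText("hello").append(", world")
print(shout(FrozenText("hi")))  # HI
print(shout(mt))                # HELLO, WORLD
mt.value = "reset"
print(mt.value)                 # reset
```

As the answer suspects, the trade-off is that code receiving a `Text` can no longer assume a single fixed, immutable implementation.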
