Why is join faster than normal concatenation?

Posted 2024-08-22 08:02:47

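I've seen several examples from different languages that clearly show that joining the elements of a list (array) is many times faster than concatenating strings one at a time. Why?

What algorithm runs under each operation, and why is one faster than the other?

Here is a Python example of what I mean: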

# This is slow
x = 'a'
x += 'b'
...
x += 'z'

# This is fast
x = ['a', 'b', ..., 'z']
x = ''.join(x)


顾铮苏瑾 2024-08-29 08:02:47

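The code in a join function knows up front all the strings it is being asked to concatenate and how large they are, so it can calculate the final string length before it begins.

It therefore needs to allocate memory for the final string only once, and can then place each source string (and delimiter) at the correct position in that memory.

A single += operation on a string, on the other hand, has no choice but to allocate enough memory for its own result, which is the concatenation of just two strings. Each subsequent += must do the same, and the memory allocated by one += is discarded on the next. Every time, the ever-growing string is copied from one place in memory to another.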
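The two-pass strategy described above can be sketched in Python. This is only an illustration under simplifying assumptions: `join_sketch` is a hypothetical helper, not the way any real `join` is written (CPython's `str.join` is implemented in C), and a preallocated `bytearray` of UTF-8 bytes stands in for the single allocation.

```python
def join_sketch(parts, sep=""):
    """Illustrative two-pass join: measure first, then copy each piece once."""
    encoded = [p.encode("utf-8") for p in parts]
    sep_bytes = sep.encode("utf-8")

    # Pass 1: compute the final size before copying anything.
    total = sum(len(b) for b in encoded)
    if encoded:
        total += len(sep_bytes) * (len(encoded) - 1)

    # Single allocation for the whole result.
    buf = bytearray(total)

    # Pass 2: copy every piece (and separator) to its final position.
    pos = 0
    for i, b in enumerate(encoded):
        if i > 0:
            buf[pos:pos + len(sep_bytes)] = sep_bytes
            pos += len(sep_bytes)
        buf[pos:pos + len(b)] = b
        pos += len(b)

    return buf.decode("utf-8")


print(join_sketch(["a", "b", "c"], "-"))  # a-b-c
```

The point of the sketch is that the buffer never grows: its size is known before the first byte is written, so no intermediate result is ever copied.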

旧时光的容颜 2024-08-29 08:02:47

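The reason is that strings in Python (and in many other languages) are immutable objects: once created, they cannot be changed. Concatenating therefore creates a new string made up of the contents of the two smaller strings, and the variable is then rebound from the old string to the new one.

Since creating a string takes time (memory has to be allocated, the contents have to be copied into it, and so on), creating many strings takes longer than creating one. Doing N concatenations requires creating N new strings along the way. join(), on the other hand, only has to create a single string (the final result), so it works much faster.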
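A short sketch of what immutability means in practice; the variable names are made up for the illustration. Assigning to a character fails, and `+=` rebinds the name to a new string while any other reference still sees the old value.

```python
original = "ab"
alias = original          # second name for the same string object

# In-place modification is not allowed on str.
try:
    original[0] = "x"
    mutated = True
except TypeError:
    mutated = False

# += builds a new string and rebinds the name; the old object is untouched.
original += "c"

print(mutated)    # False
print(original)   # abc
print(alias)      # ab
```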

平安喜乐 2024-08-29 08:02:47

This is because a larger and larger chunk of memory has to be allocated for each string concatenation:

x = 'a' # String of size 1 allocated
x += 'b' # String of size 2 allocated, x copied, and 'b' added. Old x discarded
x += 'c' # String of size 3 allocated, x copied, and 'c' added. Old x discarded
x += 'd' # String of size 4 allocated, x copied, and 'd' added. Old x discarded
x += 'e' # String of size 5 allocated, x copied, and 'e' added. Old x discarded

So you end up performing many allocations and copies, only to turn around and throw them away. Very wasteful.

x = ['a', 'b', ..., 'z'] # 26 small allocations
x = ''.join(x) # A single, large allocation
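One way to check this on your own machine is a small `timeit` comparison. This is a rough sketch, not a rigorous benchmark: the function names and the workload size are arbitrary choices for the example, and the result depends on the interpreter. CPython in particular can sometimes extend a string in place for `+=`, which narrows the gap, so no expected timings are shown.

```python
import timeit


def concat_with_plus(parts):
    """Build the result with repeated +=."""
    result = ""
    for p in parts:
        result += p
    return result


def concat_with_join(parts):
    """Build the result with a single join call."""
    return "".join(parts)


parts = ["x"] * 10_000  # hypothetical workload size

# Both approaches must produce the same string.
assert concat_with_plus(parts) == concat_with_join(parts)

plus_time = timeit.timeit(lambda: concat_with_plus(parts), number=20)
join_time = timeit.timeit(lambda: concat_with_join(parts), number=20)
print(f"+=   : {plus_time:.4f} s")
print(f"join : {join_time:.4f} s")
```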


浮光之海 2024-08-29 08:02:47

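See "Python string join performance" and one specific answer that describes it very well:

The advice is about concatenating a lot of strings.
To compute s = s1 + s2 + ... + sn:
Using +. A new string s1+s2 is created, then a new string s1+s2+s3, and so on, so a lot of memory allocation and copy operations are involved. In fact, s1 is copied n-1 times, s2 is copied n-2 times, and so on.
Using "".join([s1,s2,...,sn]). The concatenation is done in one pass, and each character in the strings is copied only once.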
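The cost argument in the quoted answer can be made concrete with a simple counting model. The sketch below assumes the naive behaviour, where every + builds a brand-new string and no in-place optimisation applies; both helper functions are hypothetical and only count characters, they do not measure real memory traffic.

```python
def chars_copied_by_plus(parts):
    """Characters copied if every + builds a brand-new string."""
    copied = 0
    current_len = 0
    for i, p in enumerate(parts):
        current_len += len(p)
        if i > 0:
            # The new string holds everything so far plus the new piece.
            copied += current_len
    return copied


def chars_copied_by_join(parts):
    """Characters copied when each piece is written once into the result."""
    return sum(len(p) for p in parts)


letters = [chr(ord("a") + i) for i in range(26)]
print(chars_copied_by_plus(letters))  # 350
print(chars_copied_by_join(letters))  # 26
```

For n one-character pieces the first count grows roughly as n²/2 while the second grows as n, which is why the difference becomes dramatic for long lists.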