How slow is Python's string concatenation compared with str.join?

Posted on 2024-09-06 13:43:32

Following the comments on my answer in this thread, I wanted to know how large the speed difference is between the += operator and ''.join().

So how do the two compare in speed?

Comments (7)

落日海湾 2024-09-13 13:43:32

From: Efficient String Concatenation

Method 1:

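# Python 2 code (xrange); on Python 3, use range instead.
# loop_count must be defined at module level before calling.
def method1():
  out_str = ''
  for num in xrange(loop_count):
    out_str += 'num'
  return out_str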

Method 4:

def method4():
  # Collect the pieces in a list, then join them once at the end.
  str_list = []
  for num in xrange(loop_count):
    str_list.append('num')
  return ''.join(str_list)

Now I realise they are not strictly representative, and the 4th method appends to a list before iterating through and joining each item, but it's a fair indication.

String join is significantly faster than concatenation.

Why? Strings are immutable and can't be changed in place. To alter one, a new string needs to be created (a concatenation of the two), which means copying the existing contents every time.
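
To make the immutability point concrete, here is a minimal sketch written for Python 3 (so it uses range rather than xrange). The function names and the LOOP_COUNT value are illustrative assumptions, not taken from the linked article. It builds the same string both ways and times each with timeit; the numbers will vary with the interpreter version and the machine, so treat them as indicative only.

```python
import timeit

LOOP_COUNT = 10_000  # illustrative size, not the value used in the article


def concat_plus(n=LOOP_COUNT):
    """Build a string by repeated +=; each step may allocate a new str."""
    out = ''
    for num in range(n):
        out += str(num)
    return out


def concat_join(n=LOOP_COUNT):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for num in range(n):
        parts.append(str(num))
    return ''.join(parts)


if __name__ == '__main__':
    # Strings are immutable: += rebinds the name to a new object rather
    # than modifying the original in place.
    s = 'abc'
    alias = s
    s += 'def'
    print(alias, s)  # alias still refers to the original 'abc'

    for fn in (concat_plus, concat_join):
        seconds = timeit.timeit(fn, number=100)
        print(fn.__name__, round(seconds, 4), 'seconds')
```

Both functions return the same string, so any difference in the printed times comes from how the string is assembled.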


冰之心 2024-09-13 13:43:32

Note: This benchmark was informal and is due to be redone, because it doesn't show the full picture of how these methods perform with more realistically long strings. As mentioned in the comments by @Mark Amery, += is not reliably as fast as f-strings, and str.join isn't as dramatically slower in realistic use cases.

These metrics are also likely outdated, given the significant performance improvements introduced by subsequent CPython versions, most notably 3.11.


The existing answers are very well written and researched, but here's another answer for the Python 3.6 era, since we now have literal string interpolation (also known as f-strings):

>>> import timeit
>>> timeit.timeit('f\'{"a"}{"b"}{"c"}\'', number=1000000)
0.14618930302094668
>>> timeit.timeit('"".join(["a", "b", "c"])', number=1000000)
0.23334730707574636
>>> timeit.timeit('a = "a"; a += "b"; a += "c"', number=1000000)
0.14985873899422586

Test performed using CPython 3.6.5 on a 2012 Retina MacBook Pro with an Intel Core i7 at 2.3 GHz.
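
As the note above says, three one-character literals are not a realistic workload. One way the comparison might be rerun is sketched below. The operand length (1,000 characters), the repeat counts, and the names STATEMENTS and run are assumptions chosen for illustration. The operands live in variables and are passed to timeit through its globals parameter, and the best of several repeats is reported.

```python
import timeit

# Hypothetical operands: three 1,000-character strings held in variables.
a = 'a' * 1000
b = 'b' * 1000
c = 'c' * 1000

STATEMENTS = {
    'f-string': 'f"{a}{b}{c}"',
    'join': '"".join([a, b, c])',
    '+=': 's = a; s += b; s += c',
}


def run(number=100_000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each statement."""
    env = {'a': a, 'b': b, 'c': c}
    return {
        name: min(timeit.repeat(stmt, globals=env, number=number, repeat=repeat))
        for name, stmt in STATEMENTS.items()
    }


if __name__ == '__main__':
    for name, seconds in run().items():
        print(f'{name:>8}: {seconds:.4f} s')
```

Changing the operand length or adding more pieces shows how sensitive the ranking is to the workload.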

花开柳相依 2024-09-13 13:43:32

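My original code was wrong. It appears that + concatenation is often faster, at least for short strings (especially with newer versions of Python on newer hardware).

The times are as follows. In each row, the first figure is for listcat (''.join) and the second is for strcat (+=):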

Iterations: 1,000,000       

Python 3.3 on Windows 7, Core i7

String of len:   1 took:     0.5710     0.2880 seconds
String of len:   4 took:     0.9480     0.5830 seconds
String of len:   6 took:     1.2770     0.8130 seconds
String of len:  12 took:     2.0610     1.5930 seconds
String of len:  80 took:    10.5140    37.8590 seconds
String of len: 222 took:    27.3400   134.7440 seconds
String of len: 443 took:    52.9640   170.6440 seconds

Python 2.7 on Windows 7, Core i7

String of len:   1 took:     0.7190     0.4960 seconds
String of len:   4 took:     1.0660     0.6920 seconds
String of len:   6 took:     1.3300     0.8560 seconds
String of len:  12 took:     1.9980     1.5330 seconds
String of len:  80 took:     9.0520    25.7190 seconds
String of len: 222 took:    23.1620    71.3620 seconds
String of len: 443 took:    44.3620   117.1510 seconds

Python 2.7 on Linux Mint, with a slower processor

String of len:   1 took:     1.8840     1.2990 seconds
String of len:   4 took:     2.8394     1.9663 seconds
String of len:   6 took:     3.5177     2.4162 seconds
String of len:  12 took:     5.5456     4.1695 seconds
String of len:  80 took:    27.8813    19.2180 seconds
String of len: 222 took:    69.5679    55.7790 seconds
String of len: 443 took:   135.6101   153.8212 seconds

And here is the code:

from __future__ import print_function
import time

def strcat(string):
    newstr = ''
    for char in string:
        newstr += char
    return newstr

def listcat(string):
    chars = []
    for char in string:
        chars.append(char)
    return ''.join(chars)

def test(fn, times, *args):
    start = time.time()
    for x in range(times):
        fn(*args)
    return "{:>10.4f}".format(time.time() - start)

def testall():
    strings = ['a', 'long', 'longer', 'a bit longer', 
               '''adjkrsn widn fskejwoskemwkoskdfisdfasdfjiz  oijewf sdkjjka dsf sdk siasjk dfwijs''',
               '''this is a really long string that's so long
               it had to be triple quoted  and contains lots of
               superflous characters for kicks and gigles
               @!#(*_#)(*$(*!#@&)(*E\xc4\x32\xff\x92\x23\xDF\xDFk^%#$!)%#^(*#''',
              '''I needed another long string but this one won't have any new lines or crazy characters in it, I'm just going to type normal characters that I would usually write blah blah blah blah this is some more text hey cool what's crazy is that it looks that the str += is really close to the O(n^2) worst case performance, but it looks more like the other method increases in a perhaps linear scale? I don't know but I think this is enough text I hope.''']

    for string in strings:
        print("String of len:", len(string), "took:", test(listcat, 1000000, string), test(strcat, 1000000, string), "seconds")

testall()

硪扪都還晓 2024-09-13 13:43:32

If my expectation is right, for a list of k strings with n characters in total, the time complexity of join should be O(n log k), while the time complexity of classic concatenation should be O(nk).

Those would be the same relative costs as merging k sorted lists (the efficient method is O(n log k), while the simple one, akin to concatenation, is O(nk)).

一曲爱恨情仇 2024-09-13 13:43:32

Algorithmically speaking, each += generates a new string object, so building a string of n characters in a loop is O(n^2) in the worst case. If you use .join, it is O(n).
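
The gap between the two bounds can be illustrated by counting how many characters get copied under a simple cost model. This is only a sketch: the function names are made up here, and the model assumes that every += copies the whole accumulated string, which is the worst case. CPython can sometimes extend a string in place, so real timings are often better than this model suggests.

```python
def copies_with_plus(pieces):
    """Characters copied if every += rebuilds the whole accumulated string."""
    total = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        total += length  # new string holds old contents plus the new piece
    return total


def copies_with_join(pieces):
    """Characters copied if the result is allocated once and filled in."""
    return sum(len(piece) for piece in pieces)


if __name__ == '__main__':
    for k in (10, 100, 1000):
        pieces = ['x'] * k
        print(k, copies_with_plus(pieces), copies_with_join(pieces))
```

Under this model the += count grows roughly with the square of the number of pieces, while the join count grows linearly.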

他是夢罘是命 2024-09-13 13:43:32

I rewrote the last answer. Could you please share your opinion on the way I tested?

import time

# time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
start1 = time.perf_counter()
for x in range(10000000):
    dog1 = ' and '.join(['spam', 'eggs', 'spam', 'spam', 'eggs', 'spam', 'spam', 'eggs', 'spam', 'spam', 'eggs', 'spam'])

end1 = time.perf_counter()
print("Time to run Joiner = ", end1 - start1, "seconds")


start2 = time.perf_counter()
for x in range(10000000):
    dog2 = 'spam'+' and '+'eggs'+' and '+'spam'+' and '+'spam'+' and '+'eggs'+' and '+'spam'+' and '+'spam'+' and '+'eggs'+' and '+'spam'+' and '+'spam'+' and '+'eggs'+' and '+'spam'

end2 = time.perf_counter()
print("Time to run + = ", end2 - start2, "seconds")

NOTE: This example is written in Python 3.5, where range() acts like the former xrange()

The output I got:

Time to run Joiner =  27.086106206103153 seconds
Time to run + =  69.79100515996426 seconds

Personally, I prefer ''.join([]) over the 'Plusser way' because it's cleaner and more readable.

痞味浪人 2024-09-13 13:43:32

This is what silly programs are designed to test :)

Using plus:

import time

if __name__ == '__main__':
    start = time.clock()
    for x in range (1, 10000000):
        dog = "a" + "b"

    end = time.clock()
    print "Time to run Plusser = ", end - start, "seconds"

Output:

Time to run Plusser =  1.16350010965 seconds

Now with join....

import time
if __name__ == '__main__':
    start = time.clock()
    for x in range (1, 10000000):
        dog = "a".join("b")

    end = time.clock()
    print "Time to run Joiner = ", end - start, "seconds"

Output:

Time to run Joiner =  21.3877386651 seconds

So on Python 2.6 on Windows, I would say + is about 18 times faster than join :)
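
One caveat about the comparison above: str.join uses the string it is called on as the separator and treats its argument as an iterable of pieces, so "a".join("b") does not produce "ab". The two loops therefore do not build the same result. A short check in Python 3 syntax (the helper names are illustrative only):

```python
def plus_result():
    return "a" + "b"


def join_as_written():
    # "b" is iterated as a one-character sequence; "a" is the separator,
    # which is never used because there is only one piece.
    return "a".join("b")


def join_equivalent():
    # Joining both pieces with an empty separator matches "a" + "b".
    return "".join(["a", "b"])


if __name__ == '__main__':
    print(plus_result(), join_as_written(), join_equivalent())
```

A like-for-like timing would use the empty-separator form in the join loop.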
