String concatenation vs. string substitution in Python

Posted 2024-07-10 03:07:20, 450 words, 7 views, 0 comments

In Python, the where and when of using string concatenation versus string substitution eludes me. As string concatenation has seen large boosts in performance, is this becoming more of a stylistic decision than a practical one?

For a concrete example, how should one handle construction of flexible URIs:

DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_question_uri_sub(q_num):
    return "%s%s/%d" % (DOMAIN, QUESTIONS, q_num)

def so_question_uri_cat(q_num):
    return DOMAIN + QUESTIONS + '/' + str(q_num)

Edit: There have also been suggestions about joining a list of strings and for using named substitution. These are variants on the central theme, which is, which way is the Right Way to do it at which time? Thanks for the responses!

花辞树 2024-07-17 03:07:20

Concatenation is (significantly) faster according to my machine. But stylistically, I'm willing to pay the price of substitution if performance is not critical. Well, and if I need formatting, there's no need to even ask the question... there's no option but to use interpolation/templating.

>>> import timeit
>>> def so_q_sub(n):
...  return "%s%s/%d" % (DOMAIN, QUESTIONS, n)
...
>>> so_q_sub(1000)
'http://stackoverflow.com/questions/1000'
>>> def so_q_cat(n):
...  return DOMAIN + QUESTIONS + '/' + str(n)
...
>>> so_q_cat(1000)
'http://stackoverflow.com/questions/1000'
>>> t1 = timeit.Timer('so_q_sub(1000)','from __main__ import so_q_sub')
>>> t2 = timeit.Timer('so_q_cat(1000)','from __main__ import so_q_cat')
>>> t1.timeit(number=10000000)
12.166618871951641
>>> t2.timeit(number=10000000)
5.7813972166853773
>>> t1.timeit(number=1)
1.103492206766532e-05
>>> t2.timeit(number=1)
8.5206360154188587e-06

>>> def so_q_tmp(n):
...  return "{d}{q}/{n}".format(d=DOMAIN,q=QUESTIONS,n=n)
...
>>> so_q_tmp(1000)
'http://stackoverflow.com/questions/1000'
>>> t3= timeit.Timer('so_q_tmp(1000)','from __main__ import so_q_tmp')
>>> t3.timeit(number=10000000)
14.564135316080637

>>> def so_q_join(n):
...  return ''.join([DOMAIN,QUESTIONS,'/',str(n)])
...
>>> so_q_join(1000)
'http://stackoverflow.com/questions/1000'
>>> t4= timeit.Timer('so_q_join(1000)','from __main__ import so_q_join')
>>> t4.timeit(number=10000000)
9.4431309007150048

半世晨晓 2024-07-17 03:07:20

Don't forget about named substitution:

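def so_question_uri_namedsub(q_num):
    # locals() holds only q_num here, so the module-level names are passed explicitly
    return "%(domain)s%(questions)s/%(q_num)d" % {
        "domain": DOMAIN, "questions": QUESTIONS, "q_num": q_num}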
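One thing to keep in mind with the named form: % accepts any mapping on the right-hand side, and locals() only holds the names local to the function. Here is a minimal sketch, assuming the same module-level DOMAIN and QUESTIONS constants as in the question. The helper names so_question_uri_mapping and local_names are made up for illustration.

```python
DOMAIN = 'http://stackoverflow.com'
QUESTIONS = '/questions'

def so_question_uri_mapping(q_num):
    # Hypothetical helper: build the mapping explicitly so that every
    # placeholder name in the format string has a matching key.
    values = {'domain': DOMAIN, 'questions': QUESTIONS, 'q_num': q_num}
    return "%(domain)s%(questions)s/%(q_num)d" % values

def local_names(q_num):
    # locals() inside a function contains only that function's local
    # names, so module-level constants such as DOMAIN are not in it.
    return sorted(locals())

print(so_question_uri_mapping(1000))
print(local_names(1000))
```

An explicit dictionary is a little more typing than locals(), but it makes it obvious which values the string depends on.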

蓝眼睛不忧郁 2024-07-17 03:07:20

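Be careful about concatenating strings in a loop! The cost of a string concatenation is proportional to the length of the result, so looping leads you straight into N-squared territory. Some languages optimize concatenation onto the most recently allocated string, but it is risky to count on the compiler to turn your quadratic algorithm into a linear one. It is best to use the primitive (join?) that takes the entire list of strings, does a single allocation, and concatenates them all in one go.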
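As a rough sketch of the difference, the two hypothetical helpers below build the same string, once by repeated concatenation in a loop and once with str.join. CPython can sometimes optimize the += case in place, but that is an implementation detail rather than something to rely on.

```python
def build_with_concat(parts):
    # Each += may copy everything accumulated so far, so the total
    # work can grow quadratically with the number of parts.
    result = ''
    for part in parts:
        result += part
    return result

def build_with_join(parts):
    # join receives all pieces up front, so it can compute the final
    # length and copy each piece once.
    return ''.join(parts)

parts = [str(i) for i in range(5)]
print(build_with_concat(parts))
print(build_with_join(parts))
```

Both functions return the same result; only the amount of copying differs as the list grows.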

够钟 2024-07-17 03:07:20

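"As the string concatenation has seen large boosts in performance..."

If performance matters, this is good to know.

However, the performance problems I've seen have never come down to string operations. I've generally gotten into trouble with I/O, sorting and O(n²) operations being the bottlenecks.

Until string operations are the performance limiters, I'll stick with whatever is obvious. Mostly, that's substitution when it's one line or less, concatenation when it makes sense, and a template tool (like Mako) when it's large.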

无言温柔 2024-07-17 03:07:20

What you want to concatenate/interpolate and how you want to format the result should drive your decision.

String interpolation allows you to easily add formatting. In fact, your string interpolation version doesn't do the same thing as your concatenation version; it actually adds an extra forward slash before the q_num parameter. To do the same thing, you would have to write return DOMAIN + QUESTIONS + "/" + str(q_num) in that example.
Interpolation makes it easier to format numerics; "%d of %d (%2.2f%%)" % (current, total, 100.0 * current / total) would be much less readable in concatenation form.
Concatenation is useful when you don't have a fixed number of items to string-ize.
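To make the last two points concrete, here is a small sketch. The function names progress_interpolated, progress_concatenated and join_segments are invented for this example, and the percentage is computed as 100.0 * current / total.

```python
def progress_interpolated(current, total):
    # The format string carries both the layout and the precision.
    return "%d of %d (%.2f%%)" % (current, total, 100.0 * current / total)

def progress_concatenated(current, total):
    # Same output, but the layout is scattered across several literals
    # and the number still needs a separate formatting call.
    percent = format(100.0 * current / total, ".2f")
    return str(current) + " of " + str(total) + " (" + percent + "%)"

def join_segments(*segments):
    # With an unknown number of pieces, joining a sequence is simpler
    # than writing a format string with a fixed set of placeholders.
    return "/".join(segments)

print(progress_interpolated(3, 8))
print(progress_concatenated(3, 8))
print(join_segments("questions", "tagged", "python"))
```

The two progress functions produce identical strings; the interpolated one just keeps the whole template visible in one place.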

Also, know that Python 2.6 introduces a new version of string interpolation, called string templating:

def so_question_uri_template(q_num):
    # QUESTIONS already starts with "/", so no extra slash after {domain}
    return "{domain}{questions}/{num}".format(domain=DOMAIN,
                                              questions=QUESTIONS,
                                              num=q_num)

String templating is slated to eventually replace %-interpolation, but that won't happen for quite a while, I think.

泪是无色的血 2024-07-17 03:07:20

I was just testing the speed of different string concatenation/substitution methods out of curiosity. A google search on the subject brought me here. I thought I would post my test results in the hope that it might help someone decide.

    import timeit
    def percent_():
            return "test %s, with number %s" % (1,2)

    def format_():
            return "test {}, with number {}".format(1,2)

    def format2_():
            return "test {1}, with number {0}".format(2,1)

    def concat_():
            return "test " + str(1) + ", with number " + str(2)

    def dotimers(func_list):
            # runs a single test for all functions in the list
            for func in func_list:
                    tmr = timeit.Timer(func)
                    res = tmr.timeit()
                    # __name__ and print(...) work on both Python 2.6+ and 3
                    print("test " + func.__name__ + ": " + str(res))

    def runtests(func_list, runs=5):
            # runs multiple tests for all functions in the list
            for i in range(runs):
                    print("----------- TEST #" + str(i + 1))
                    dotimers(func_list)

...After running runtests((percent_, format_, format2_, concat_), runs=5), I found that the % method was about twice as fast as the others on these small strings. The concat method was always the slowest (barely). There were very tiny differences when switching the positions in the format() method, but switching positions was always at least .01 slower than the regular format method.

Sample of test results:

    test concat_()  : 0.62  (0.61 to 0.63)
    test format_()  : 0.56  (consistently 0.56)
    test format2_() : 0.58  (0.57 to 0.59)
    test percent_() : 0.34  (0.33 to 0.35)

I ran these because I do use string concatenation in my scripts, and I was wondering what the cost was. I ran them in different orders to make sure nothing was interfering, or getting better performance from being first or last. On a side note, I threw some longer string generators into those functions, like "%s" + ("a" * 1024), and regular concat was almost 3 times as fast (1.1 vs 2.8) as using the format and % methods. I guess it depends on the strings, and what you are trying to achieve. If performance really matters, it might be better to try different things and test them. I tend to choose readability over speed, unless speed becomes a problem, but that's just me. SO didn't like my copy/paste; I had to put 8 spaces on everything to make it look right. I usually use 4.
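For anyone who wants to repeat this kind of measurement, here is one possible sketch using timeit.repeat and keeping the fastest run of each function. The helper name best_times and the default number and repeat values are arbitrary choices for illustration, and the numbers it prints will differ between machines and Python versions.

```python
import timeit

def percent_():
    return "test %s, with number %s" % (1, 2)

def format_():
    return "test {}, with number {}".format(1, 2)

def concat_():
    return "test " + str(1) + ", with number " + str(2)

def best_times(funcs, number=10000, repeat=3):
    # Run each function `number` times per measurement, take `repeat`
    # measurements, and keep the minimum, which is usually the one
    # least disturbed by other activity on the machine.
    results = {}
    for func in funcs:
        timings = timeit.repeat(func, number=number, repeat=repeat)
        results[func.__name__] = min(timings)
    return results

if __name__ == "__main__":
    for name, seconds in sorted(best_times([percent_, format_, concat_]).items()):
        print("%-10s %.6f" % (name, seconds))
```

Swapping in your own functions and realistic string sizes matters more than the exact harness, since the ranking changed above once longer strings were involved.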
