Replace multiple spaces with a single space in Python

Posted on 2024-08-18 10:30:51

I have this string:

mystring = 'Here is  some   text   I      wrote   '

How can I substitute the double, triple (...) whitespace characters with a single space, so that I get:

mystring = 'Here is some text I wrote'


傲鸠 2024-08-25 10:30:51

A simple possibility (if you'd rather avoid regular expressions) is:

' '.join(mystring.split())

The split and join perform the task you're explicitly asking about -- plus, they also do the extra one that you don't talk about but is seen in your example, removing trailing spaces;-).
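
As a minimal sketch of the idiom above (the helper name `collapse_whitespace` is made up for illustration): `str.split()` called with no arguments splits on runs of any whitespace and discards the empty strings at both ends, so tabs and newlines are collapsed too, and leading whitespace is removed along with trailing whitespace.

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace into one space and trim both ends."""
    # split() with no separator splits on runs of whitespace and drops the
    # empty strings that leading/trailing whitespace would otherwise produce.
    return " ".join(text.split())


mystring = 'Here is  some   text   I      wrote   '
print(collapse_whitespace(mystring))         # Here is some text I wrote
print(collapse_whitespace(" a\t\tb \n c "))  # a b c
```

If tabs or newlines must be preserved, this idiom is too aggressive and one of the more targeted approaches is a better fit.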

妄断弥空 2024-08-25 10:30:51

A regular expression can be used to offer more control over the whitespace characters that are combined.

To match Unicode whitespace:

import re

_RE_COMBINE_WHITESPACE = re.compile(r"\s+")

my_str = _RE_COMBINE_WHITESPACE.sub(" ", my_str).strip()

To match ASCII whitespace only:

import re

_RE_COMBINE_WHITESPACE = re.compile(r"(?a:\s+)")
_RE_STRIP_WHITESPACE = re.compile(r"(?a:^\s+|\s+$)")

my_str = _RE_COMBINE_WHITESPACE.sub(" ", my_str)
my_str = _RE_STRIP_WHITESPACE.sub("", my_str)

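Matching only ASCII whitespace is sometimes essential for keeping the control characters \x1c, \x1d, \x1e and \x1f, which Unicode \s also treats as whitespace. (\x0b and \x0c are \v and \f, so they are matched by \s in both modes.)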
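
The difference between the two modes can be checked directly. The sketch below uses made-up pattern names and a sample string containing the control character \x1c and the non-breaking space U+00A0. It passes the re.ASCII flag to re.compile, which should behave the same as the inline (?a:...) form used above.

```python
import re

# Hypothetical names, chosen for this illustration only.
UNICODE_WS = re.compile(r"\s+")
ASCII_WS = re.compile(r"\s+", re.ASCII)

# Two \x1c control characters, two non-breaking spaces, two plain spaces.
sample = "a\x1c\x1cb\u00a0\u00a0c  d"

unicode_result = UNICODE_WS.sub(" ", sample)
ascii_result = ASCII_WS.sub(" ", sample)

print(repr(unicode_result))  # 'a b c d'
print(repr(ascii_result))    # 'a\x1c\x1cb\xa0\xa0c d'
```

In ASCII mode only the run of plain spaces is collapsed; the control characters and non-breaking spaces pass through untouched.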

Reference:

About \s:

For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the
non-breaking spaces mandated by typography rules in many languages).
If the ASCII flag is used, only [ \t\n\r\f\v] is matched.

About re.ASCII:

Make \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching instead of full Unicode matching. This is only meaningful for Unicode patterns, and is ignored for byte patterns. Corresponds to the inline flag (?a).

strip() will remove any leading and trailing whitespace.

樱娆 2024-08-25 10:30:51

For completeness, you can also use:

# The while loop alone would still leave a single leading or trailing space,
# so strip the ends before (or after) the loop.
mystring = mystring.strip()
while '  ' in mystring:
    mystring = mystring.replace('  ', ' ')

which will work quickly on strings with relatively few spaces (faster than re in these situations).
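
One caveat worth illustrating: the loop only touches the space character itself, so tabs and newlines inside the string survive, unlike with split/join. A small sketch, with `collapse_spaces` as a made-up wrapper name around the loop above:

```python
def collapse_spaces(text):
    """Collapse runs of the space character only; other whitespace is kept."""
    text = text.strip()
    while "  " in text:
        # Each pass shortens every run of spaces; repeat until none remain.
        text = text.replace("  ", " ")
    return text


sample = "a   b\t\tc  "
print(repr(collapse_spaces(sample)))    # 'a b\t\tc'
print(repr(" ".join(sample.split())))   # 'a b c'
```

Which behavior is wanted depends on whether "whitespace" in the question means only spaces or any whitespace character.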

In any scenario, Alex Martelli's split/join solution performs at least as quickly (usually significantly more so).

In your example, using the default values of timeit.Timer.repeat(), I get the following times:

str.replace: [1.4317800167340238, 1.4174888149192384, 1.4163512401715934]
re.sub:      [3.741931446594549,  3.8389395858970374, 3.973777672860706]
split/join:  [0.6530919432498195, 0.6252146571700905, 0.6346594329726258]
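
A comparison along these lines can be reproduced with a sketch like the following. The function names, the precompiled pattern and the reduced `number` are assumptions for illustration, not the exact setup behind the figures above, so absolute timings will differ by machine and Python version.

```python
import re
import timeit

mystring = 'Here is  some   text   I      wrote   '
_WS = re.compile(r"\s+")  # precompiled here; an assumption about the setup


def with_replace(s=mystring):
    s = s.strip()
    while "  " in s:
        s = s.replace("  ", " ")
    return s


def with_re(s=mystring):
    return _WS.sub(" ", s).strip()


def with_split_join(s=mystring):
    return " ".join(s.split())


timings = {}
for func in (with_replace, with_re, with_split_join):
    # number is reduced from timeit's default of 1,000,000 to keep the run short.
    timings[func.__name__] = timeit.Timer(func).repeat(repeat=3, number=10_000)

for name, times in timings.items():
    print(f"{name:16s} {times}")
```

All three functions return the same string for this input, so the comparison is between equivalent results.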

EDIT:

Just came across this post which provides a rather long comparison of the speeds of these methods.
