How do I trim whitespace?
Is there a Python function that will trim whitespace (spaces and tabs) from a string?
So that the given input " \t example string\t " becomes "example string".
For whitespace on both sides, use str.strip.
For whitespace on the right side, use str.rstrip.
For whitespace on the left side, use str.lstrip.
You can pass an argument to any of these functions to strip arbitrary characters. For example, strip(' \t\n\r') strips any space, \t, \n, or \r characters from both sides of the string.
The functions above only remove characters from the left-hand and right-hand sides of a string. If you also want to remove characters from the middle of a string, try re.sub.
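A minimal sketch of the calls described in this answer, using the input from the question. The variable names and the `\s+` pattern passed to `re.sub` are illustrative assumptions, not the answer's original snippet.

```python
import re

s = " \t example string\t "

both = s.strip()                 # 'example string'
right = s.rstrip()               # ' \t example string'
left = s.lstrip()                # 'example string\t '
explicit = s.strip(" \t\n\r")    # same result, with the characters listed explicitly

# re.sub works on the whole string, so inner whitespace is removed too.
no_whitespace = re.sub(r"\s+", "", s)   # 'examplestring'
```

Note that the argument to strip() is a set of characters, not a prefix or suffix: every leading and trailing character found in that set is removed.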
In Python, the trim methods are named strip (with lstrip and rstrip for the left and right sides).
For leading and trailing whitespace, use strip().
Otherwise, a regular expression works.
You can also use a very simple, basic function, str.replace(), which works with spaces and tabs. Note that it removes every occurrence, including those inside the string.
Simple and easy.
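A short sketch of the str.replace() approach. The sample strings are made up for illustration; each call removes one specific character everywhere in the string, so spaces and tabs need separate calls.

```python
spaces = "   abcd ef gh ijkl       "
tabs = "\tabcde\tfgh\tijkl"

no_spaces = spaces.replace(" ", "")   # 'abcdefghijkl'
no_tabs = tabs.replace("\t", "")      # 'abcdefghijkl'

# Chaining the calls handles a mix of spaces and tabs,
# but the space between the two words is lost as well.
mixed = " \t example string\t "
squeezed = mixed.replace(" ", "").replace("\t", "")   # 'examplestring'
```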
No one has posted these regex solutions yet.
Matching: match the whole string and capture the part between the leading and trailing whitespace.
Searching: search for the span from the first to the last non-whitespace character (you have to handle the "only spaces" input case differently).
If you use re.sub, you may remove inner whitespace, which could be undesirable.
Whitespace includes spaces, tabs, and CRLF, so an elegant one-liner string function we can use is translate. In Python 2:
' hello apple'.translate(None, ' \n\t\r')
In Python 3, translate takes a translation table instead:
' hello apple'.translate(str.maketrans('', '', ' \n\t\r'))
Or, if you want to be thorough, delete every character in string.whitespace.
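The snippets for the regex and translate answers above are not shown, so here is a rough sketch of both ideas for Python 3. The function names and the exact patterns are my own assumptions; treat them as one possible way to write it rather than the original code.

```python
import re
import string

def trim_by_match(text):
    # Match the whole string; group 1 captures everything between
    # the leading and trailing whitespace (None if there is nothing).
    m = re.fullmatch(r"\s*(.*\S)?\s*", text, flags=re.DOTALL)
    return m.group(1) or ""

def trim_by_search(text):
    # Search from the first to the last non-whitespace character.
    # An all-whitespace input gives no match, so handle it separately.
    m = re.search(r"\S(?:.*\S)?", text, flags=re.DOTALL)
    return m.group() if m else ""

def delete_chars(text, chars=" \n\t\r"):
    # Python 3 form of translate: the third maketrans argument
    # lists the characters to delete.
    return text.translate(str.maketrans("", "", chars))

by_match = trim_by_match(" \t example string\t ")    # 'example string'
by_search = trim_by_search(" \t example string\t ")  # 'example string'
deleted = delete_chars(" hello apple")               # 'helloapple'
thorough = delete_chars(" hello\x0bapple\n", string.whitespace)  # 'helloapple'
```

Unlike the two regex helpers, translate deletes the listed characters everywhere, including between words.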
This removes all the unwanted spaces and newline characters. Hope this helps.
For example, ' a b \n c ' will be changed to 'a b c'.
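One way to get the result described above, as a sketch: the original snippet is not shown, so the use of `re.sub` with a `\s+` pattern here is an assumption. It collapses every run of whitespace (newlines included) to one space and then trims the ends.

```python
import re

my_str = "   a     b \n c   "

# Collapse each run of whitespace to a single space, then trim both ends.
formatted_str = re.sub(r"\s+", " ", my_str).strip()   # 'a b c'
```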
Output:
Adding Le Droid's comment to the answer.
To separate with a space:
Output:
Having looked at quite a few solutions here with varying degrees of understanding, I wondered what to do if the string was comma separated...
The problem
While trying to process a CSV of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage and leave the good stuff. While trimming out all the punctuation and chaff, I didn't want to lose the whitespace between compound tokens, as I didn't want to rebuild them later.
Regex and patterns: [\s_]+?\W+
The pattern lazily matches one or more whitespace characters or underscores ('_'), taking as few characters as possible, with [\s_]+?, followed by one or more non-word characters, with \W+ (for ASCII text, \W is equivalent to [^a-zA-Z0-9_]). Specifically, this finds swaths of whitespace: spaces, tabs (\t), newlines (\n), form feeds (\f), and carriage returns (\r). Through \W+ it also picks up stray non-word characters such as null characters (\0).
I see the advantage of this as two-fold:
First, it doesn't remove whitespace between the complete words/tokens that you might want to keep together.
Second, Python's built-in string method strip() doesn't work inside the string, only on the left and right ends, and its default argument strips whitespace. For example, when several newlines sit in the middle of the text, text.strip(' \n\t\r') does not remove them, while the regex pattern does.
This goes beyond the OP's question, but I think there are plenty of cases where we might have odd, pathological instances within the text data, as I did (somehow escape characters ended up in some of the text). Moreover, in list-like strings, we don't want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word character, like '-,' or '-, ,,,'.
NB: I'm not talking about the delimiter of the CSV itself, only about instances within the CSV where the data is list-like, i.e. a comma-separated string of substrings.
Full disclosure: I've only been manipulating text for about a month, and regex only for the last two weeks, so I'm sure there are some nuances I'm missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40-odd columns), as a final step after a pass to remove extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don't want to add whitespace where there was none before.
So strip() only removes whitespace from the two ends of the string. In the OP's case, strip() is fine, but if things get any more complex, regex and a similar pattern may be of some value for more general settings.
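To make the pattern's behaviour concrete, here is a small sketch. The sample text and the choice of a single space as the replacement are my own assumptions, since the answer's original example is not shown.

```python
import re

pattern = re.compile(r"[\s_]+?\W+")

text = "  ff dd \n\n\n invites  "

# strip() only touches the two ends; the inner newlines survive.
stripped = text.strip()                    # 'ff dd \n\n\n invites'

# The pattern matches whitespace/underscore runs that are followed by
# non-word characters, so a lone space between two words is left alone.
cleaned = pattern.sub(" ", text).strip()   # 'ff dd invites'

single_space = pattern.findall("hello world")   # []
junk_run = pattern.findall("foo _, bar")        # [' _, ']
```

Whether to replace a match with a space, a comma, or something else depends on which delimiters you need to keep, so adjust the replacement string to your data.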
If using Python 3: in your print() call, finish with sep="". That suppresses the spaces print() would otherwise insert between its arguments.
With sep="", an example call prints:
I love potatoes.
Instead of:
I love potatoes .
Note that sep only affects the separator between arguments (sep="\t" would insert a tab between them). It does not remove a \t that is already inside the string, so for the input in the question you still need strip().
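A small sketch of what sep does, with made-up arguments. It only changes what print() puts between its arguments; whitespace inside a string is untouched, which is why the last line still calls strip().

```python
import io

txt = "potatoes"

# Default separator: print() inserts a space between every argument.
print("I love", txt, ".")              # I love potatoes .

# sep="" suppresses those spaces, so the wanted space must be in the text.
print("I love ", txt, ".", sep="")     # I love potatoes.

# Capture the second form to show the exact characters written.
buf = io.StringIO()
print("I love ", txt, ".", sep="", file=buf)
captured = buf.getvalue()              # 'I love potatoes.\n'

# sep does not remove whitespace that is already inside a string.
cleaned = " \t example string\t ".strip()   # 'example string'
```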
If you want to trim the whitespace off just the beginning and end of the string, you can use strip().
This works a lot like Qt's QString::trimmed() method, in that it removes leading and trailing whitespace, while leaving internal whitespace alone.
But if you'd like something like Qt's QString::simplified() method, which not only removes leading and trailing whitespace but also "squishes" all consecutive internal whitespace to one space character, you can use a combination of .split() and " ".join.
In that case, each sequence of internal whitespace is replaced with a single space, while the whitespace at the start and end of the string is still trimmed.
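A minimal sketch of the two behaviours described in this answer. The sample string and variable names are illustrative.

```python
some_string = "    Hello,    world!\n    "

# Like QString::trimmed(): only the ends are trimmed.
trimmed = some_string.strip()                 # 'Hello,    world!'

# Like QString::simplified(): split() with no argument splits on any
# run of whitespace and drops empty pieces, then join adds one space.
simplified = " ".join(some_string.split())    # 'Hello, world!'
```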
Try translate.
Generally, I use the following method:
Note: this only removes "\n", "\r" and "\t". It does not remove extra spaces.
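The method itself is not shown, so this is only a sketch of one way to do what the note describes, assuming a simple loop over str.replace(). The sample string is made up.

```python
my_str = "Hi\n Stack Over \r flow!\t"

# Remove only newline, carriage return and tab characters.
for ch in ("\n", "\r", "\t"):
    my_str = my_str.replace(ch, "")

# The spaces around the removed characters stay, so a double space
# is left between 'Over' and 'flow!'.
result = my_str   # 'Hi Stack Over  flow!'
```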
This will remove all whitespace and newlines from both the beginning and end of a string:
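A sketch of one way to do this with a regex anchored at both ends. The pattern and sample string are assumptions, since the answer's own code is not shown. For plain whitespace, the result is the same as calling strip().

```python
import re

s = "  \n\t  \n   some \n text \n     "

# ^\s+ matches the leading run and \s+$ the trailing run;
# whitespace in the middle is left untouched.
trimmed = re.sub(r"^\s+|\s+$", "", s)   # 'some \n text'
```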