How to replace multiple substrings of a string?
I would like to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but would like to have something like
string.replace("condition1", "").replace("condition2", "text")
although that does not feel like good syntax.
What is the proper way to do this? Kind of like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.
Here is a short example that should do the trick with regular expressions:
For example:
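A minimal sketch of what a regex-based, single-pass replacement might look like. The two condition strings come from the question; the names `replacements`, `pattern`, `text`, and the sample input are assumptions made for illustration.

```python
import re

# Mapping of search string -> replacement (values taken from the question).
replacements = {"condition1": "", "condition2": "text"}

# Escape each key so it is matched literally, then join into one alternation.
pattern = re.compile("|".join(re.escape(key) for key in replacements))

text = "(condition1) and --condition2--"  # hypothetical input
# The callback looks up the replacement for whichever key matched.
result = pattern.sub(lambda match: replacements[match.group(0)], text)
print(result)  # () and --text--
```

Because the string is scanned only once, text produced by one replacement is never picked up by another.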
You could just make a nice little looping function.
where text is the complete string and dic is a dictionary; each definition is a string that will replace a match to the term.
Note: in Python 3, iteritems() has been replaced with items().
Careful: Python dictionaries don't have a reliable order for iteration. This solution only solves your problem if:
Update: The ordering caveat above does not apply to Python 3.6 and later, where standard dicts iterate in insertion order (an implementation detail in CPython 3.6, guaranteed by the language since 3.7).
For instance:
Possible output #1:
Possible output #2:
One possible fix is to use an OrderedDict.
Output:
Careful #2: This is inefficient if your text string is too big or there are many pairs in the dictionary.
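A sketch of such a looping function and of why order matters. The parameter names `text` and `dic` follow the answer; the function name `replace_all` and the sample sentence are assumptions. The demonstration relies on insertion-ordered dicts (Python 3.7+).

```python
def replace_all(text, dic):
    """Apply each key -> value replacement to text, one after another."""
    for old, new in dic.items():  # iteritems() on Python 2
        text = text.replace(old, new)
    return text

sentence = "cat chases dog"  # hypothetical input

# Same pairs, different insertion order, different result.
first = replace_all(sentence, {"cat": "dog", "dog": "cat"})
second = replace_all(sentence, {"dog": "cat", "cat": "dog"})
print(first)   # cat chases cat
print(second)  # dog chases dog
```

Neither result is a swap: each later replacement also rewrites the output of the earlier one, which is the chaining effect the caveats above describe.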
Why not one solution like this?
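One way to read this suggestion, as a sketch: loop over an ordered sequence of (old, new) pairs and reassign the string each time. The sample string and the pairs are assumptions.

```python
s = "The quick brown fox jumps over the lazy dog"  # hypothetical input

# A tuple of pairs keeps the replacement order explicit.
for old, new in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(old, new)

print(s)  # The quick red fox jumps over the quick dog
```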
Here is a variant of the first solution using reduce (imported from functools), in case you like being functional. :)
martineau's even better version:
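A sketch of what a reduce-based variant could look like. The helper name `replace_pairs` and the sample data are assumptions; it is not necessarily either of the versions referred to above.

```python
from functools import reduce

def replace_pairs(text, pairs):
    """Fold str.replace over a sequence of (old, new) pairs, left to right."""
    return reduce(lambda acc, pair: acc.replace(*pair), pairs, text)

result = replace_pairs("a-b-c", [("a", "x"), ("-", "+")])  # hypothetical data
print(result)  # x+b+c
```

This behaves like the explicit loop: replacements are chained, so their order still matters.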
This is just a more concise recap of F.J's and MiniQuark's great answers, plus the last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function:
Usage:
If you wish, you can make your own dedicated replacement functions starting from this simpler one.
Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can apply the replacements within a list comprehension:
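A minimal sketch of the assignment-expression approach, assuming Python 3.8 or later. The sample string, the `replacements` list, and the name `steps` are assumptions.

```python
text = "The quick brown fox jumps over the lazy dog"  # hypothetical input
replacements = [("brown", "red"), ("lazy", "quick")]

# Each iteration rebinds text; the list holds every intermediate string.
steps = [text := text.replace(old, new) for old, new in replacements]

print(text)  # The quick red fox jumps over the quick dog
```

The comprehension is used for its side effect on `text`; the resulting list is only a by-product, so a plain loop may read more clearly.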
I built this upon F.J.'s excellent answer:
One-shot usage:
Note that since the replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".
If you need to do the same replacement many times, you can create a replacement function easily:
Improvements:
Enjoy! :-)
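A sketch of the "create a replacement function" idea: a factory compiles the pattern once and returns a reusable function. The names `multiple_replacer` and `swap_drinks` and the sample sentence are assumptions.

```python
import re

def multiple_replacer(replacements):
    """Build a function that applies all replacements in a single pass."""
    pattern = re.compile("|".join(re.escape(key) for key in replacements))
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)

swap_drinks = multiple_replacer({"café": "tea", "tea": "café"})
print(swap_drinks("Do you like café? No, I prefer tea."))
# Do you like tea? No, I prefer café.
```

Since every match is replaced during the same scan, the two words are swapped rather than chained.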
I would like to propose the usage of string templates. Just place the strings to be replaced in a dictionary and all is set! Example from docs.python.org
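A short sketch using `string.Template` from the standard library, modeled on the example in the Python documentation. Note the assumption this approach carries: the text must already mark its substitution points with `$name` placeholders, so it does not replace arbitrary substrings.

```python
from string import Template

template = Template("$who likes $what")  # placeholders are marked with $
values = {"who": "tim", "what": "kung pao"}

print(template.substitute(values))  # tim likes kung pao

# safe_substitute leaves unknown placeholders in place instead of raising.
partial = Template("$who and $other").safe_substitute(values)
print(partial)  # tim and $other
```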
Here's my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case where a string to replace is a substring of another string to replace (the longer string wins).
It is in this gist; feel free to modify it if you have any proposal.
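A sketch of the "longer string wins" rule, not the contents of the gist itself. The function name `multireplace` and the sample mapping are assumptions. Regex alternation tries branches from left to right, so sorting the keys by decreasing length makes the longest candidate match first.

```python
import re

def multireplace(text, replacements):
    """Replace all keys in one pass; longer keys take precedence."""
    if not replacements:
        return text  # an empty alternation would match everywhere
    # Alternation tries branches left to right, so put longer keys first.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

print(multireplace("abcd", {"ab": "1", "abc": "2"}))  # 2d
```

Without the sort, the pattern `ab|abc` would match `ab` first and the result would be `1cd`.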
In my case, I needed a simple replacement of unique keys with names, so I came up with this:
I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:
It works for the examples given in other answers, for example:
The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:
If you want to use the dictionary keys as normal strings, you can escape them before calling multiple_replace using e.g. this function:
The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):
Note that it does not chain the replacements; it performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:
Note: Test your case, see comments.
Here's a sample which is more efficient on long strings with many small replacements.
The point is to avoid many concatenations of long strings. We chop the source string into fragments, replacing some of the fragments as we build the list, and then join the whole thing back into a string.
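A sketch of the fragment-and-join idea described above. The answer does not say how matches are located, so using `re.finditer` here is an assumption, as are the function name `replace_fragments` and the sample data.

```python
import re

def replace_fragments(text, replacements):
    """Collect untouched and replaced fragments, then join them once."""
    if not replacements:
        return text
    pattern = re.compile("|".join(map(re.escape, replacements)))
    fragments, position = [], 0
    for match in pattern.finditer(text):
        fragments.append(text[position:match.start()])  # text before the match
        fragments.append(replacements[match.group(0)])  # its replacement
        position = match.end()
    fragments.append(text[position:])  # tail after the last match
    return "".join(fragments)

print(replace_fragments("one two three", {"one": "1", "three": "3"}))  # 1 two 3
```

The string is assembled by a single `join` at the end, so no intermediate copies of the long text are created along the way.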
You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:
And the modified text is:
You can find an example here. Notice that the replacements are applied to the text in the order they appear in the lists.
I was struggling with this problem as well. With many substitutions, regular expressions struggle, and they were about four times slower than looping string.replace (under my experimental conditions).
You should absolutely try using the Flashtext library (blog post here, GitHub here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s for each document (regular expressions took 7.7 s).
It is easy to find usage examples in the links above, but this is a working example:
Note that Flashtext makes substitutions in a single pass (to avoid a --> b and b --> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your target is several words (replacing 'This is' by 'Hello').
I was doing a similar exercise in one of my school homework assignments. This was my solution:
See the result yourself on a test string:
I faced a similar problem today, where I had to use the .replace() method multiple times, but it didn't feel good to me. So I did something like this:
I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:
Usage:
Notes:
Note: As with all recursive functions in Python, too large a recursion depth (i.e. too large a replacement dictionary) will result in an error. See e.g. here.
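A sketch of what a recursive one-line lambda could look like. This version takes a list of (old, new) pairs rather than a dictionary, which is an assumption; a dictionary can be passed as `list(d.items())`. The name `multi_replace` and the sample data are also assumptions.

```python
# Apply the first pair, then recurse on the remaining pairs.
multi_replace = lambda s, pairs: multi_replace(s.replace(*pairs[0]), pairs[1:]) if pairs else s

print(multi_replace("hello world", [("hello", "bye"), ("world", "moon")]))
# bye moon
```

Each pair costs one stack frame, so a very long list of pairs can hit Python's recursion limit, as the note above warns.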
You should really not do it this way, but I just find it way too cool:
Now, answer is the result of all the replacements in turn.
Again, this is very hacky and is not something that you should be using regularly. But it's just nice to know that you can do something like this if you ever need to.
To replace single characters only, use translate with str.maketrans; it is my favorite method.
tl;dr >
result_string = your_string.translate(str.maketrans(dict_mapping))
demo
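A small sketch of the one-liner above in action. The names `your_string`, `dict_mapping`, and `result_string` follow the answer; the mapping contents and the input are assumptions. When `str.maketrans` receives a single dict, every key must be a single character, while values may be strings of any length or `None`.

```python
dict_mapping = {"a": "1", "b": "2", "c": None}  # None deletes the character
your_string = "abcabc-d"  # hypothetical input

# Every character is looked up once, so replacements never chain.
result_string = your_string.translate(str.maketrans(dict_mapping))
print(result_string)  # 1212-d
```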
I don't know about speed but this is my workaday quick fix:
... but I like the #1 regex answer above. Note - if one new value is a substring of another one then the operation is not commutative.
Here is a version with support for basic regex replacement. The main restriction is that expressions must not contain subgroups, and there may be some edge cases:
Code based on @bgusach and others
Tests
The trick is to identify the matched group by its position. It is not super efficient (O(n)), but it works.
Replacement is done in one pass.
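A sketch of identifying the matched rule by group position. Each regex is wrapped in its own capturing group, which is why the rules themselves must not contain groups. Using `Match.lastindex` to find the group is an assumption and may differ from how the original code locates it; the name `regex_multireplace` and the sample rules are assumptions too.

```python
import re

def regex_multireplace(text, rules):
    """rules: list of (regex, replacement); regexes must not contain groups."""
    # Wrap every rule in its own capturing group: rule i becomes group i + 1.
    pattern = re.compile("|".join(f"({regex})" for regex, _ in rules))
    # lastindex is the number of the group that matched, i.e. the rule index + 1.
    return pattern.sub(lambda m: rules[m.lastindex - 1][1], text)

rules = [(r"\d+", "N"), (r"\s+", " ")]  # hypothetical rules
print(regex_multireplace("call  555   now", rules))  # call N now
```

Replacement strings are inserted literally here, so backreferences such as `\1` in a replacement are not expanded.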
Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the files in the opened folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I'm a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.
This is my solution to the problem. I used it in a chatbot to replace the different words at once.
This will become
The cat hunts the dog
Another example:
Input list
The desired output would be
Code:
My approach would be to first tokenize the string, then decide for each token whether to include it or not.
This might be more performant, if we can assume O(1) lookup for a hashmap/set:
filtered_sent is now 'should modify string'
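A sketch of the tokenize-and-filter approach that yields the result quoted above. Only `filtered_sent` and its final value come from the answer; the input sentence, the set contents, and the names `remove_words` and `target_sent` are assumptions chosen to reproduce it.

```python
remove_words = {"we", "this"}  # hypothetical set; lookups are O(1) on average
target_sent = "we should modify this string"  # hypothetical input

# Keep only the whitespace-separated tokens that are not in the set.
filtered_sent = " ".join(w for w in target_sent.split() if w not in remove_words)
print(filtered_sent)  # should modify string
```

This removes whole whitespace-delimited tokens only; it does not touch substrings inside a word and it collapses runs of whitespace to single spaces.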
I know this is old, but I was playing around with converting JSON to PHP and I liked using parentheses and new lines to see what each replacement was going to perform.
Here's the code.
:)
It seems to generate good PHP as well.
Or just for a fast hack:
Here is another way of doing it with a dictionary: