Python: How to replace many strings
I'm trying to replace a lot of strings (only a few examples are shown here, but I actually have thousands) with other strings defined in "replaceWord".
- The pairs in "replaceWord" follow no regular pattern.
However, the code I wrote does not work as I expected.
After running the script, the output is as below:
   before    after
0  test1234  test1234
1  test1234  test1234
2  test1234  1349
3  test1234  test1234
4  test1234  test1234
I need output as below:
   before     after
1  test1234   1349
2  test9012   te1210st
3  test5678   8579
4  april      I was born August
5  mcdonalds  i like chicken
Script:
import os.path, time, re
import pandas as pd
import csv
body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"
replaceWord = [
["test9012","te1210st"],
["test5678","8579"],
["test1234","1349"],
["april","August"],
["mcdonalds","chicken"],
]
cols = ['before','after']
df = pd.DataFrame(index=[], columns=cols)
for word in replaceWord:
    body01_after = re.sub(word[0], word[1], body01_before)
    body02_after = re.sub(word[0], word[1], body02_before)
    body03_after = re.sub(word[0], word[1], body03_before)
    body04_after = re.sub(word[0], word[1], body04_before)
    body05_after = re.sub(word[0], word[1], body05_before)
    df=df.append({'before':body01_before,'after':body01_after}, ignore_index=True)
#df.head()
print(df)
df.to_csv('test_replace.csv')
Comments (4)
Use a regular expression to capture the non-digits (\D+) as the first group and the digits (\d+) as the second group. Replace the text by starting with the second group, \2, followed by the first group, \1.
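A minimal sketch of this group swap, assuming the example strings from the question are held in a list (the name `strings` is made up for illustration and does not appear in the original post):

```python
import re

# Hypothetical list holding three of the question's example strings.
strings = ["test9012", "test5678", "test1234"]

# (\D+) captures the leading non-digits, (\d+) the digits that follow;
# the replacement emits group 2 first, then group 1.
swapped = [re.sub(r"(\D+)(\d+)", r"\2\1", s) for s in strings]
print(swapped)  # ['9012test', '5678test', '1234test']
```

This only reorders the two parts of each string; it does not look anything up in "replaceWord".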
Edit
It seems that you do not have a dataset. You have variables:
Does this suit your purpose?
The code prints:
['9012test', '5678test', '1234test']
As I understand it, you have a list of strings and a mapping dictionary of the form {oldString1: newString1, oldString2: newString2, ...} that you want to use to replace the strings in the original list.
The fastest (and maybe most Pythonic) approach I can think of is to simply save your mapping dictionary as a Python dict. For example:
If your list of strings is stored as a Python list, you can get the replaced list with the following code:
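A minimal sketch of that approach, assuming a `mapping` dict built from some of the question's "replaceWord" pairs and a list named `strings` (both names, and the extra "unknown" entry, are illustrative assumptions rather than the answer's own code):

```python
# Hypothetical mapping built from some of the question's replaceWord pairs.
mapping = {
    "test9012": "te1210st",
    "test5678": "8579",
    "test1234": "1349",
}

# Hypothetical input list; "unknown" has no entry in the mapping.
strings = ["test1234", "test9012", "test5678", "unknown"]

# dict.get(key, default) returns the default when the key is absent,
# so strings without a mapping pass through unchanged.
replaced = [mapping.get(old_string, old_string) for old_string in strings]
print(replaced)  # ['1349', 'te1210st', '8579', 'unknown']
```

This is an exact, whole-string lookup, so a value such as "i like mcdonalds" would be left unchanged even if "mcdonalds" were a key.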
Note: We use mapping.get() with old_string as the default so that the function returns the old_string in case it is not in the mapping dictionary.
If your list of strings is stored in a Pandas Series (or a column of a Pandas DataFrame), you can quickly replace the strings with:
Note: Setting na_action='ignore' only makes map() skip missing (NaN) values. A string that is not a key of the mapping dictionary is mapped to NaN, so the old_string has to be restored afterwards, for example with fillna().
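A sketch of the Series case, under the assumption that the question's five strings live in a Series named `series` and the "replaceWord" pairs in a dict named `mapping` (both names are illustrative). It shows `Series.map` with a fallback for whole-value matches and, as a swapped-in technique that is not part of this answer, `Series.str.replace` with a callable for keys that occur inside longer strings:

```python
import re
import pandas as pd

# Hypothetical dict form of the question's replaceWord list.
mapping = {
    "test9012": "te1210st",
    "test5678": "8579",
    "test1234": "1349",
    "april": "August",
    "mcdonalds": "chicken",
}
series = pd.Series(
    ["test1234", "test9012", "test5678", "i like mcdonalds", "I was born april"]
)

# Whole-value lookup: values that are not keys become NaN,
# so fall back to the original value with fillna().
exact = series.map(mapping).fillna(series)

# Substring replacement: one alternation of escaped keys (longest first),
# with each match looked up in the dict.
pattern = "|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True))
partial = series.str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)

result = pd.DataFrame({"before": series, "after": partial})
print(result)
```

With the alternation, each string is scanned once regardless of how many pairs there are, which may matter when the mapping has thousands of entries.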
You can use a regex to match the pattern.
By using a regex match you can separate the alphabetic part from the numeric part.
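One way this separation could look, as a sketch that assumes every string is a run of ASCII letters followed by a run of digits (the list name `strings` and the exact pattern are assumptions, not taken from the answer):

```python
import re

# Hypothetical list holding three of the question's example strings.
strings = ["test1234", "test9012", "test5678"]

parts = []
for s in strings:
    # Group 1: letters only; group 2: digits only.
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", s)
    if m:  # strings that do not fit the letters+digits shape are skipped
        parts.append((m.group(1), m.group(2)))

print(parts)  # [('test', '1234'), ('test', '9012'), ('test', '5678')]
```

Strings such as "i like mcdonalds" do not fit this shape and would be skipped here.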