Python: How to replace many strings

Posted on 2025-01-24 00:56:46

I'm trying to replace a lot of strings (only a few examples are shown here, but I actually have thousands) with other strings defined in "replaceWord".

  • The entries in "replaceWord" follow no regular pattern.

However, the code I wrote does not work as I expected.

After running the script, the output is as follows:

     before     after
0  test1234  test1234
1  test1234  test1234
2  test1234      1349
3  test1234  test1234
4  test1234  test1234

I need the following output:

  before    after
1 test1234  1349
2 test9012  te1210st
3 test5678  8579
4 april     I was born August
5 mcdonalds i like chicken

script

import os.path, time, re
import pandas as pd
import csv


body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"

replaceWord = [
                ["test9012","te1210st"],
                ["test5678","8579"],
                ["test1234","1349"],
                ["april","August"],
                ["mcdonalds","chicken"],

]

cols = ['before','after']
df = pd.DataFrame(index=[], columns=cols)

for word in replaceWord:
    
    body01_after = re.sub(word[0], word[1], body01_before)
    body02_after = re.sub(word[0], word[1], body02_before)
    body03_after = re.sub(word[0], word[1], body03_before)
    body04_after = re.sub(word[0], word[1], body04_before)
    body05_after = re.sub(word[0], word[1], body05_before)

    df=df.append({'before':body01_before,'after':body01_after}, ignore_index=True)
    
#df.head()
print(df)

df.to_csv('test_replace.csv')

Comments (4)

梦忆晨望 2025-01-31 00:56:46

Use a regular expression to capture the non-digits (\D+) as the first group and the digits (\d+) as the second group, then replace the text with the second group \2 followed by the first group \1.

df['after'] = df['before'].str.replace(r'(\D+)(\d+)', r'\2\1', regex = True)

df
     before     after
1  test1234  1234test
2  test9012  9012test
3  test5678  5678test
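
For readers who hold the strings in a plain Python list rather than a DataFrame, the same group swap can be sketched with the standard `re` module. The list name `words` and its contents are assumptions taken from the example strings in the question.

```python
import re

# Example strings from the question (assumed to be in a plain list)
words = ["test1234", "test9012", "test5678"]

# (\D+) captures the leading non-digits, (\d+) the digits that follow;
# the replacement emits group 2 first, then group 1
swapped = [re.sub(r"(\D+)(\d+)", r"\2\1", w) for w in words]

print(swapped)
```

This only reorders the letter and digit parts; it does not apply the arbitrary pairs listed in replaceWord.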

Edit

It seems that you do not have a dataset, only separate variables:

body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"

replaceWord = [
                ["test9012","te1210st"],
                ["test5678","8579"],
                ["test1234","1349"],
                ["april","August"],
                ["mcdonalds","chicken"],

]

import re
import pandas as pd

# Gather the names of the body0N_... variables defined above
var_names = re.findall(r'body0\d[^,]+', ','.join(globals().keys()))
df = pd.DataFrame(var_names, columns=['before_1'])
# Look up the value of each variable by its name
df['before'] = df['before_1'].apply(lambda name: globals()[name])

# Replacement function: swap a matched word if it is in the mapping, else keep it
mapping = dict(replaceWord)
repl = lambda m: mapping.get(m[0], m[0])

# Do the replacement word by word
df['after'] = df['before'].str.replace(r'(\w+)', repl, regex=True)

df
        before_1            before              after
0  body01_before          test1234               1349
1  body02_before          test9012           te1210st
2  body03_before          test5678               8579
3  body04_before  i like mcdonalds     i like chicken
4  body05_before  I was born april  I was born August
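
With thousands of pairs and no pandas requirement, one possible variation is to compile all keys into a single alternation and substitute in one pass. This is a minimal sketch, not the method used above: the names `replace_word`, `bodies`, and `replace_all` are illustrative, and unlike the `\w+` approach it matches keys anywhere in the text, including inside longer words.

```python
import re

# Replacement pairs from the question
replace_word = [
    ["test9012", "te1210st"],
    ["test5678", "8579"],
    ["test1234", "1349"],
    ["april", "August"],
    ["mcdonalds", "chicken"],
]
mapping = dict(replace_word)

# One alternation of all escaped keys, longest first so that a longer key
# is preferred over a shorter key that is its prefix
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True))
)

def replace_all(text):
    """Replace every occurrence of a mapping key in text in a single pass."""
    return pattern.sub(lambda m: mapping[m.group(0)], text)

bodies = ["test1234", "test9012", "test5678", "i like mcdonalds", "I was born april"]
results = [replace_all(b) for b in bodies]
print(results)
```

Because each string is scanned once, a replacement value is never itself replaced again by a later pair, which can happen when calling re.sub repeatedly in a loop.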

东北女汉子 2025-01-31 00:56:46

Does this suit your purpose?

words = ["test9012", "test5678", "test1234"]
updated = []

for word in words:
    for i, char in enumerate(word):
        if 47 < ord(char) < 58:  # ASCII codes 48-57 are the digits 0-9
            updated.append(f"{word[i:]}{word[:i]}")
            break

print(updated)

The code prints: ['9012test', '5678test', '1234test']

地狱即天堂 2025-01-31 00:56:46

As I understand, you have a list of strings and a mapping dictionary in the form of: {oldString1: newString1, oldString2: newString2, ...} that you want to use to replace the original list of strings.
The fastest (and maybe most Pythonic) approach I can think of is to simply save your mapping dictionary as a Python dict. For example:

mapping = {
   "test9012": "9012test",
   "test5678": "5678test",
   "test1234": "1234test",
}

If your list of strings is stored as a Python list, you can get the replaced list with the following code:

new_list = [mapping.get(old_string, old_string) for old_string in old_list]

Note: We pass old_string as the second (default) argument of mapping.get() so that it returns old_string when the string is not in the mapping dictionary. dict.get() accepts positional arguments only.

If your list of strings is stored in a Pandas Series (or a column of a Pandas DataFrame), you can quickly replace the strings with:

new_list = old_list.map(mapping).fillna(old_list)

Note: Series.map() returns NaN for values that are not in the mapping dictionary, so we fill those positions with the original strings using fillna(old_list).
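
As a small illustration of keeping unmapped strings in a Series, the sketch below compares `Series.map` followed by `fillna` with `Series.replace`. The Series name `old_series` and the extra value "unmapped" are assumptions added for the example.

```python
import pandas as pd

mapping = {
    "test9012": "9012test",
    "test5678": "5678test",
    "test1234": "1234test",
}

# "unmapped" is a made-up value that has no entry in the mapping
old_series = pd.Series(["test1234", "test9012", "unmapped"])

# map() yields NaN for values missing from the mapping;
# fillna() restores the original strings at those positions
via_map = old_series.map(mapping).fillna(old_series)

# replace() swaps whole-cell matches and leaves all other cells untouched
via_replace = old_series.replace(mapping)

print(via_map.tolist())
print(via_replace.tolist())
```

Both variants match whole cell values only; a key that appears inside a longer string such as "I was born april" is not replaced by either call.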

囚你心 2025-01-31 00:56:46

You can use regex to match the pattern.

import re
import pandas as pd

words = ["test9012", "test5678", "test1234"]

rows = []
for word in words:
    # Match the leading run of letters
    text_only = re.match(r"[a-zA-Z]*", word).group(0)
    # Everything after the letters is the numeric part
    number_part = word[len(text_only):]
    rows.append({'before': word, 'after': number_part + text_only})

# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
df = pd.DataFrame(rows, columns=['before', 'after'])
print(df)

df.to_csv('test_replace.csv')

So by using a regex match you can separate the letters-only part from the digits-only part.
