Replace part of an int or string in a pandas DataFrame column based on a condition

Posted on 2025-01-20 23:34:12

I have a pandas DataFrame with a column that represents dates but is stored as int. Several dates have a 13th or 14th month. I would like to replace these 13th and 14th months with the 12th month, and then convert the column to datetime format.

Original_date
20190101
20191301
20191401

New_date
20190101
20191201
20191201

I tried converting the column to string and then replacing only the month, based on its position in the string ([4:6]), but it didn't work:

df.original_date.astype(str)
for string in df['original_date']:
    if string[4:6]=="13" or string[4:6]=="14":
        string.replace(string, string[:4]+ "12" + string[6:])
print(df['original_date'])
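
For context, the attempt above likely fails for two reasons: astype(str) returns a new Series that is never assigned (so the loop still sees ints, which cannot be sliced), and str.replace returns a new string instead of changing the value stored in the DataFrame. A minimal loop-free sketch, assuming the column is named Original_date as in the sample data (the names as_text, bad_month and fixed are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Original_date": [20190101, 20191301, 20191401]})

# astype returns a new Series, so the result has to be kept
as_text = df["Original_date"].astype(str)

# Boolean mask: True where the month part (characters 4-5) is 13 or 14
bad_month = as_text.str[4:6].isin(["13", "14"])

# Rebuild every value as year + "12" + day, then use it only where the mask is True
fixed = as_text.str[:4] + "12" + as_text.str[6:]
df["New_date"] = as_text.where(~bad_month, fixed)

print(df)
```

Series.where keeps the original value where the condition is True, which is why the mask is negated here.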

Comments (3)

一梦浮鱼 2025-01-27 23:34:12

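You can use .str.replace with a regex:

# Raw strings keep the backslashes intact; \g<1> is needed because \112 would not be read as group 1 followed by "12"
df['New_date'] = df['Original_date'].astype(str).str.replace(r'(\d{4})(13|14)(\d{2})', r'\g<1>12\3', regex=True)
print(df)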

   Original_date  New_date
0       20190101  20190101
1       20191301  20191201
2       20191401  20191201
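
The question also asks for a final conversion to datetime. One possible way to finish, sketched here under the assumption that every cleaned value is a valid YYYYMMDD string, is pd.to_datetime with an explicit format:

```python
import pandas as pd

df = pd.DataFrame({"Original_date": [20190101, 20191301, 20191401]})

# Same replacement as in the answer above
df["New_date"] = df["Original_date"].astype(str).str.replace(
    r"(\d{4})(13|14)(\d{2})", r"\g<1>12\3", regex=True
)

# Parse the cleaned YYYYMMDD strings; an explicit format avoids guessing
df["New_date"] = pd.to_datetime(df["New_date"], format="%Y%m%d")

print(df.dtypes)
```

If other invalid values might remain in the column, passing errors="coerce" to pd.to_datetime would turn them into NaT instead of raising.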

盛夏已如深秋| 2025-01-27 23:34:12

Why not just write a regular expression?

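import pandas as pd

s = pd.Series('''20190101
20191301
20191401'''.split('\n')).astype(str)
# The lookbehind and lookahead leave the year and day untouched; note that (?=01) only matches when the day is 01
s.str.replace(r'(?<=\d{4})(13|14)(?=01)', '12', regex=True)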

Yielding:

0    20190101
1    20191201
2    20191201
dtype: object

(NB: you will need to assign the output back to a column for the change to persist.)
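
Because the lookahead in this answer is tied to day 01, a variant that accepts any two-digit day may be useful. The following is a sketch, not the answerer's code; the third sample value (20191415) is a hypothetical addition to show a non-01 day, and the result is assigned back to a column as the note above suggests:

```python
import pandas as pd

df = pd.DataFrame({"Original_date": [20190101, 20191301, 20191415]})

# (?<=\d{4}) requires four digits (the year) before the match;
# (?=\d{2}$) requires exactly two digits (any day) and then the end of the string
pattern = r"(?<=\d{4})1[34](?=\d{2}$)"

# Assign the result back so the change is kept on the DataFrame
df["New_date"] = df["Original_date"].astype(str).str.replace(pattern, "12", regex=True)

print(df["New_date"].tolist())
```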

半衬遮猫 2025-01-27 23:34:12

You can put the replacement logic in a separate function, which also makes it easy to adapt if you later need to change the year or day as well. apply lets you call that function on each value in the column.

import pandas as pd

def split_and_replace(x):
    # Split the YYYYMMDD string into its parts
    year = x[0:4]
    month = x[4:6]
    day = x[6:8]
    # Replace the invalid months 13 and 14 with December
    if month in ('13', '14'):
        month = '12'
    return year + month + day


df = pd.DataFrame(
    data={
        'Original_date': ['20190101', '20191301', '20191401']
    }
)

# apply calls the function once per value in the column
res = df.Original_date.apply(split_and_replace)

print(res)
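
Since the column in the question is stored as int, while the function above expects strings, an arithmetic alternative that skips the string conversion might also work. This is only a sketch; unlike the answers above, it caps any month greater than 12 (not just 13 and 14) at 12, and the names dates, month and excess are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Original_date": [20190101, 20191301, 20191401]})

dates = df["Original_date"]

# The month is the middle two digits of YYYYMMDD
month = dates // 100 % 100

# How far each month exceeds 12 (0 for valid months)
excess = (month - 12).clip(lower=0)

# Shift the excess to the month position and subtract it
df["New_date"] = dates - excess * 100

print(df)
```

The result stays an integer column, so it would still need astype(str) before being parsed with pd.to_datetime and a "%Y%m%d" format.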