How do I replace the year value in a DateTime column for each row?

Posted on 2025-01-19 13:42:34, 447 characters, 0 views, 0 comments

My dataframe has two columns: 'release_date' and 'release_year'.

I am trying to replace the year in each 'release_date' value with the corresponding value in 'release_year'.

I have tried the following:

df.loc[:, 'release_date'] = df['release_date'].apply(lambda x: x.replace(x.year == df['release_year']))

However, I am getting the error: 'value must be an integer, received <class 'pandas.core.series.Series'> for year'.

Having checked the dtype, the release_date column is stored as datetime64[ns].

Excerpt from dataframe

Comments (2)

骄傲 2025-01-26 13:42:34

You need to use pandas.DataFrame.apply here rather than pandas.Series.apply, because you need data from another column. Consider the following simple example:

import datetime
import pandas as pd
df = pd.DataFrame({
    'release_date': [datetime.date(1901, 1, 1), datetime.date(1902, 1, 1), datetime.date(1903, 1, 1)],
    'release_year': [2001, 2002, 2003],
})
# axis=1 passes each row to the lambda, so both columns are available
df['changed_date'] = df.apply(lambda x: x.release_date.replace(year=x.release_year), axis=1)
print(df)

Output:

  release_date  release_year changed_date
0   1901-01-01          2001   2001-01-01
1   1902-01-01          2002   2002-01-01
2   1903-01-01          2003   2003-01-01

Note axis=1, which means the function is applied to each row, and each row is passed to that function as a pandas.Series.
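
As for the error in the question: x.year == df['release_year'] evaluates to a whole boolean Series, and it is passed positionally as the year argument of replace, which is what the message complains about. Below is a minimal sketch of the row-wise approach on a datetime64[ns] column like the one in the question. The sample dates are made up for illustration, and the int(...) cast is just a precaution, not something the question requires.

```python
import pandas as pd

# Hypothetical sample data; release_date is datetime64[ns] as in the question
df = pd.DataFrame({
    "release_date": pd.to_datetime(["1901-03-15", "1902-07-04", "1903-12-31"]),
    "release_year": [2001, 2002, 2003],
})

# With axis=1 each row carries both columns, so the year can be passed
# by keyword as a single integer instead of a whole Series
df["changed_date"] = df.apply(
    lambda row: row["release_date"].replace(year=int(row["release_year"])),
    axis=1,
)
print(df)
```

One caveat: a 29 February date cannot be moved to a non-leap year this way, since replace raises a ValueError for an invalid day.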

我恋#小黄人 2025-01-26 13:42:34

Casting to string and then parsing to datetime is more efficient here, and also more readable if you ask me. For example:

import datetime
import pandas as pd

N = 100000

df = pd.DataFrame({'release_date':[datetime.date(1901,1,1),datetime.date(1902,1,1),datetime.date(1903,1,1)]*N,
                   'release_year':[2001,2002,2003]*N})

df['changed_date'] = pd.to_datetime(
        df['release_year'].astype(str) + df['release_date'].astype(str).str[5:],
        format="%Y%m-%d"
    )

df['changed_date']
Out[176]: 
0        2001-01-01
1        2002-01-01
2        2003-01-01
3        2001-01-01
4        2002-01-01
   
299995   2002-01-01
299996   2003-01-01
299997   2001-01-01
299998   2002-01-01
299999   2003-01-01
Name: changed_date, Length: 300000, dtype: datetime64[ns]
>>> %timeit df['changed_date'] = df.apply(lambda x:x.release_date.replace(year=x.release_year),axis=1)
6.73 s ± 542 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit df['changed_date'] = pd.to_datetime(df['release_year'].astype(str)+df['release_date'].astype(str).str[5:], format="%Y%m-%d")
651 ms ± 78.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
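
If the column is already datetime64[ns], as in the question, another vectorized option that avoids the string round trip is to let pd.to_datetime assemble the dates from year, month and day columns. This is a different technique from the one timed above and I have not benchmarked it; the sample dates are made up, and parts is just a helper name for illustration.

```python
import pandas as pd

# Hypothetical sample data; release_date is datetime64[ns] as in the question
df = pd.DataFrame({
    "release_date": pd.to_datetime(["1901-03-15", "1902-07-04", "1903-12-31"]),
    "release_year": [2001, 2002, 2003],
})

# pd.to_datetime accepts a DataFrame with 'year', 'month' and 'day' columns
parts = pd.DataFrame({
    "year": df["release_year"],
    "month": df["release_date"].dt.month,
    "day": df["release_date"].dt.day,
})
df["changed_date"] = pd.to_datetime(parts)
print(df)
```

Written like this, any time-of-day component in release_date is dropped, so it only fits plain dates.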