如何根据日期和名称转换熊猫数据框

发布于 2025-02-13 12:43:36 字数 643 浏览 3 评论 0原文

我正在尝试不用而没有达到我所需的输出的数据框架,这将感谢一些帮助:

输入数据:

date         name   value
2022-07-01   Anna    5
2022-07-01   Jim     3
2022-04-29   Anna    4
2022-04-29   Jim     2
2022-04-21   Anna    1
2021-07-29   Jim     5
2021-07-29   Anna    5

所需的输出:

Name         Last_30_days   Last_365_days
Anna            5              15
Jim             3              10

我最接近的尝试:

df_new = (df.set_index('date')
        .groupby('name', sort=False)
        .resample('MS')['value'].sum()
        .groupby(level=0)
        .cumsum()
        .reset_index(name='value'))

I'm trying to transform a dataframe without but haven't achieved my desired output, would appreciate some help:

Input data:

date         name   value
2022-07-01   Anna    5
2022-07-01   Jim     3
2022-04-29   Anna    4
2022-04-29   Jim     2
2022-04-21   Anna    1
2021-07-29   Jim     5
2021-07-29   Anna    5

Desired output:

Name         Last_30_days   Last_365_days
Anna            5              15
Jim             3              10

My closest attempt :

df_new = (df.set_index('date')
        .groupby('name', sort=False)
        .resample('MS')['value'].sum()
        .groupby(level=0)
        .cumsum()
        .reset_index(name='value'))

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

多像笑话 2025-02-20 12:43:36

计算新的二进制列,以检查是否包括当天,然后枢纽,然后con绕所有选定的一天差异。

days_select = [30, 60, 365]
pd.concat([df.assign(in_last_n_days=(pd.Timestamp.today() - df.date) < pd.Timedelta(f"{ndays}d"))
             .query("in_last_n_days")
             .pivot_table(index="name", values="value", aggfunc=sum)
             .rename(columns={"value": f"last_{ndays}_days"})
           for ndays in days_select], axis=1)

      last_30_days  last_60_days  last_365_days
name                                           
Anna             5             5             15
Jim              3             3             10

Calculate new binary column to check whether the day is included, then pivot, then concat all selected day differences.

days_select = [30, 60, 365]
pd.concat([df.assign(in_last_n_days=(pd.Timestamp.today() - df.date) < pd.Timedelta(f"{ndays}d"))
             .query("in_last_n_days")
             .pivot_table(index="name", values="value", aggfunc=sum)
             .rename(columns={"value": f"last_{ndays}_days"})
           for ndays in days_select], axis=1)

      last_30_days  last_60_days  last_365_days
name                                           
Anna             5             5             15
Jim              3             3             10
请持续率性 2025-02-20 12:43:36

尝试以下尝试:

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0, 30, 365, np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
print(df_out)

输出:

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             10                      0
Jim                3              7                      0

并且,您可以添加一个暨以沿行累积总数:

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0,30,365,np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
df_out = df_out.cumsum(axis=1)
print(df_out)

输出:

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             15                     15
Jim                3             10                     10

Try this:

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0, 30, 365, np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
print(df_out)

Output:

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             10                      0
Jim                3              7                      0

And, you can add a cumsum to get cumulative totals along the rows:

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0,30,365,np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
df_out = df_out.cumsum(axis=1)
print(df_out)

Output:

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             15                     15
Jim                3             10                     10
深爱成瘾 2025-02-20 12:43:36
df_pivot = pd.pivot_table(df, values='value',index='name',columns='date')
df_pivot.fillna(0,inplace=True)
expected_df  = pd.DataFrame(df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01'])
expected_df['last_365_days'] = df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01']
del expected_df[0]
expected_df['last_30_days'] = df_pivot['2022-07-01']
df_pivot = pd.pivot_table(df, values='value',index='name',columns='date')
df_pivot.fillna(0,inplace=True)
expected_df  = pd.DataFrame(df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01'])
expected_df['last_365_days'] = df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01']
del expected_df[0]
expected_df['last_30_days'] = df_pivot['2022-07-01']
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文