如何根据日期和名称转换熊猫数据框

发布于 2025-02-13 12:43:36 字数 643 浏览 3 评论 0原文

我正在尝试不用而没有达到我所需的输出的数据框架，这将感谢一些帮助：

输入数据：

date         name   value
2022-07-01   Anna    5
2022-07-01   Jim     3
2022-04-29   Anna    4
2022-04-29   Jim     2
2022-04-21   Anna    1
2021-07-29   Jim     5
2021-07-29   Anna    5

所需的输出：

Name         Last_30_days   Last_365_days
Anna            5              15
Jim             3              10

我最接近的尝试：

df_new = (df.set_index('date')
        .groupby('name', sort=False)
        .resample('MS')['value'].sum()
        .groupby(level=0)
        .cumsum()
        .reset_index(name='value'))

原文

I'm trying to transform a dataframe without but haven't achieved my desired output, would appreciate some help:

Input data:

date         name   value
2022-07-01   Anna    5
2022-07-01   Jim     3
2022-04-29   Anna    4
2022-04-29   Jim     2
2022-04-21   Anna    1
2021-07-29   Jim     5
2021-07-29   Anna    5

Desired output:

Name         Last_30_days   Last_365_days
Anna            5              15
Jim             3              10

My closest attempt :

df_new = (df.set_index('date')
        .groupby('name', sort=False)
        .resample('MS')['value'].sum()
        .groupby(level=0)
        .cumsum()
        .reset_index(name='value'))

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

多像笑话 2025-02-20 12:43:36

计算新的二进制列，以检查是否包括当天，然后枢纽，然后con绕所有选定的一天差异。

days_select = [30, 60, 365]
pd.concat([df.assign(in_last_n_days=(pd.Timestamp.today() - df.date) < pd.Timedelta(f"{ndays}d"))
             .query("in_last_n_days")
             .pivot_table(index="name", values="value", aggfunc=sum)
             .rename(columns={"value": f"last_{ndays}_days"})
           for ndays in days_select], axis=1)

      last_30_days  last_60_days  last_365_days
name                                           
Anna             5             5             15
Jim              3             3             10

Calculate new binary column to check whether the day is included, then pivot, then concat all selected day differences.

days_select = [30, 60, 365]
pd.concat([df.assign(in_last_n_days=(pd.Timestamp.today() - df.date) < pd.Timedelta(f"{ndays}d"))
             .query("in_last_n_days")
             .pivot_table(index="name", values="value", aggfunc=sum)
             .rename(columns={"value": f"last_{ndays}_days"})
           for ndays in days_select], axis=1)

      last_30_days  last_60_days  last_365_days
name                                           
Anna             5             5             15
Jim              3             3             10

回复收藏 0 原文

请持续率性 2025-02-20 12:43:36

尝试以下尝试：

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0, 30, 365, np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
print(df_out)

输出：

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             10                      0
Jim                3              7                      0

并且，您可以添加一个暨以沿行累积总数：

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0,30,365,np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
df_out = df_out.cumsum(axis=1)
print(df_out)

输出：

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             15                     15
Jim                3             10                     10

Try this:

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0, 30, 365, np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
print(df_out)

Output:

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             10                      0
Jim                3              7                      0

And, you can add a cumsum to get cumulative totals along the rows:

daydiff = (pd.Timestamp('now') - df['date']).dt.days
bins = [0,30,365,np.inf]
labels = [f'Last_{i}_days' for i in bins[1:-1]] + ['Greater than 365 days']
df['period'] = pd.cut(daydiff, bins=bins, labels=labels)
df_out = df.pivot_table('value', 'name', 'period', aggfunc='sum')
df_out = df_out.cumsum(axis=1)
print(df_out)

Output:

period  Last_30_days  Last_365_days  Greater than 365 days
name                                                     
Anna               5             15                     15
Jim                3             10                     10

回复收藏 0 原文

深爱成瘾 2025-02-20 12:43:36

df_pivot = pd.pivot_table(df, values='value',index='name',columns='date')
df_pivot.fillna(0,inplace=True)
expected_df  = pd.DataFrame(df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01'])
expected_df['last_365_days'] = df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01']
del expected_df[0]
expected_df['last_30_days'] = df_pivot['2022-07-01']

df_pivot = pd.pivot_table(df, values='value',index='name',columns='date')
df_pivot.fillna(0,inplace=True)
expected_df  = pd.DataFrame(df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01'])
expected_df['last_365_days'] = df_pivot['2021-07-29'] + df_pivot['2022-04-21'] + df_pivot['2022-04-29'] + df_pivot['2022-07-01']
del expected_df[0]
expected_df['last_30_days'] = df_pivot['2022-07-01']

回复收藏 0 原文

~没有更多了~