按月小组，基于列的总和行，并保留其他列

发布于 2025-01-17 21:16:27 字数 1750 浏览 0 评论 0原文

我有一个 DataFrame df 如下：

|size    | date        | name | type     | revenue |
|10      | 13/12/2021  | A    | Standard | 0,2     |
|248743  | 15/12/2021  | A    | Standard | 0,2     |
|234     | 03/12/2022  | A    | Basic    | 0,1     |
|8734684 | 31/03/2022  | B    | Basic    | 0,1     |
|3589749 | 01/04/2021  | C    | Basic    | 0,4     |
|3356943 | 02/04/2021  | A    | Basic    | 0,1     |
|6908746 | 21/04/2021  | A    | Basic    | 0,1     |
|2375940 | 21/02/2022  | D    | Premium  | 0,7     |
|9387295 | 21/02/2022  | D    | Premium  | 0,7     |
|286432  | 21/02/2022  | D    | Premium  | 0,7     |
|192     | 31/03/2022  | D    | Premium  | 0,7     |
|486     | 18/02/2022  | E    | Standard | 0,9     |
|23847   | 24/10/2021  | F    | Basic    | 0,3     |
|82346   | 12/11/2021  | B    | Premium  | 0,5     |
|28352   | 03/01/2022  | A    | Basic    | 0,1     |

我需要按月分组，其中名称和类型相同的行的大小总和：

|size    | date | name | type     | revenue |
|28352   | Jan  | A    | Basic    | 0,1     |
|486     | Feb  | E    | Standard | 0,9     |
|12049667| Feb  | D    | Premium  | 0,7     |
|192     | Mar  | D    | Premium  | 0,7     |
|8734684 | Mar  | B    | Basic    | 0,1     |
|3589749 | Apr  | C    | Basic    | 0,4     |
|10265689| Apr  | A    | Basic    | 0,1     |
|23847   | Oct  | F    | Basic    | 0,3     |
|82346   | Nov  | B    | Premium  | 0,5     |
|248753  | Dec  | A    | Standard | 0,2     |
|234     | Dec  | A    | Basic    | 0,1     |

我尝试了此代码，但它不起作用：

df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.strftime('%B'))['size'].sum()
df2 = df1.groupby(['date', 'name', 'type', 'revenue'],as_index=False).sum()

我该怎么做？

原文

I have a DataFrame df as follows:

|size    | date        | name | type     | revenue |
|10      | 13/12/2021  | A    | Standard | 0,2     |
|248743  | 15/12/2021  | A    | Standard | 0,2     |
|234     | 03/12/2022  | A    | Basic    | 0,1     |
|8734684 | 31/03/2022  | B    | Basic    | 0,1     |
|3589749 | 01/04/2021  | C    | Basic    | 0,4     |
|3356943 | 02/04/2021  | A    | Basic    | 0,1     |
|6908746 | 21/04/2021  | A    | Basic    | 0,1     |
|2375940 | 21/02/2022  | D    | Premium  | 0,7     |
|9387295 | 21/02/2022  | D    | Premium  | 0,7     |
|286432  | 21/02/2022  | D    | Premium  | 0,7     |
|192     | 31/03/2022  | D    | Premium  | 0,7     |
|486     | 18/02/2022  | E    | Standard | 0,9     |
|23847   | 24/10/2021  | F    | Basic    | 0,3     |
|82346   | 12/11/2021  | B    | Premium  | 0,5     |
|28352   | 03/01/2022  | A    | Basic    | 0,1     |

I need to group by month with the size sum for rows which name and type are the same:

|size    | date | name | type     | revenue |
|28352   | Jan  | A    | Basic    | 0,1     |
|486     | Feb  | E    | Standard | 0,9     |
|12049667| Feb  | D    | Premium  | 0,7     |
|192     | Mar  | D    | Premium  | 0,7     |
|8734684 | Mar  | B    | Basic    | 0,1     |
|3589749 | Apr  | C    | Basic    | 0,4     |
|10265689| Apr  | A    | Basic    | 0,1     |
|23847   | Oct  | F    | Basic    | 0,3     |
|82346   | Nov  | B    | Premium  | 0,5     |
|248753  | Dec  | A    | Standard | 0,2     |
|234     | Dec  | A    | Basic    | 0,1     |

I tried this code but it did not work:

df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.strftime('%B'))['size'].sum()
df2 = df1.groupby(['date', 'name', 'type', 'revenue'],as_index=False).sum()

How can I do it?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

彩扇题诗 2025-01-24 21:16:27

IIUC，您需要一个groupby。您需要将“收入”列重新设计为数字。

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

group = df['date'].dt.strftime('%b')

(df.assign(revenue=pd.to_numeric(df['revenue'].str.replace(',', '.')))
   .groupby([group, 'name', 'type'])
   .agg('sum')
   .reset_index()
 )

输出：

   date name      type      size  revenue
0   Apr    A     Basic   6908746      0.1
1   Dec    A  Standard    248753      0.4
2   Dec    B   Premium     82346      0.5
3   Feb    A     Basic   3356943      0.1
4   Feb    D   Premium  12049667      2.1
5   Feb    E  Standard       486      0.9
6   Jan    C     Basic   3589749      0.4
7   Mar    A     Basic     28586      0.2
8   Mar    B     Basic   8734684      0.1
9   Mar    D   Premium       192      0.7
10  Oct    F     Basic     23847      0.3

请注意，上面是将不同年份的月份聚合到同一组中。如果您想将年份分开，请使用句点：

group = df['date'].dt.to_period('M')

输出：

       date name      type      size  revenue
0   2021-01    C     Basic   3589749      0.4
1   2021-02    A     Basic   3356943      0.1
2   2021-04    A     Basic   6908746      0.1
3   2021-10    F     Basic     23847      0.3
4   2021-12    A  Standard    248753      0.4
5   2021-12    B   Premium     82346      0.5
6   2022-02    D   Premium  12049667      2.1
7   2022-02    E  Standard       486      0.9
8   2022-03    A     Basic     28586      0.2
9   2022-03    B     Basic   8734684      0.1
10  2022-03    D   Premium       192      0.7

IIUC, you need a single groupby. You need to rework your "revenue" column as numeric.

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

group = df['date'].dt.strftime('%b')

(df.assign(revenue=pd.to_numeric(df['revenue'].str.replace(',', '.')))
   .groupby([group, 'name', 'type'])
   .agg('sum')
   .reset_index()
 )

Output:

   date name      type      size  revenue
0   Apr    A     Basic   6908746      0.1
1   Dec    A  Standard    248753      0.4
2   Dec    B   Premium     82346      0.5
3   Feb    A     Basic   3356943      0.1
4   Feb    D   Premium  12049667      2.1
5   Feb    E  Standard       486      0.9
6   Jan    C     Basic   3589749      0.4
7   Mar    A     Basic     28586      0.2
8   Mar    B     Basic   8734684      0.1
9   Mar    D   Premium       192      0.7
10  Oct    F     Basic     23847      0.3

Note that the above is aggregating months of different years into the same group. If you want to keep years separate, use a period:

group = df['date'].dt.to_period('M')

Output:

       date name      type      size  revenue
0   2021-01    C     Basic   3589749      0.4
1   2021-02    A     Basic   3356943      0.1
2   2021-04    A     Basic   6908746      0.1
3   2021-10    F     Basic     23847      0.3
4   2021-12    A  Standard    248753      0.4
5   2021-12    B   Premium     82346      0.5
6   2022-02    D   Premium  12049667      2.1
7   2022-02    E  Standard       486      0.9
8   2022-03    A     Basic     28586      0.2
9   2022-03    B     Basic   8734684      0.1
10  2022-03    D   Premium       192      0.7

回复收藏 0 原文

~没有更多了~