Group by month, sum rows based on a column, and keep the other columns
I have a DataFrame df as follows:
|size | date | name | type | revenue |
|10 | 13/12/2021 | A | Standard | 0,2 |
|248743 | 15/12/2021 | A | Standard | 0,2 |
|234 | 03/12/2022 | A | Basic | 0,1 |
|8734684 | 31/03/2022 | B | Basic | 0,1 |
|3589749 | 01/04/2021 | C | Basic | 0,4 |
|3356943 | 02/04/2021 | A | Basic | 0,1 |
|6908746 | 21/04/2021 | A | Basic | 0,1 |
|2375940 | 21/02/2022 | D | Premium | 0,7 |
|9387295 | 21/02/2022 | D | Premium | 0,7 |
|286432 | 21/02/2022 | D | Premium | 0,7 |
|192 | 31/03/2022 | D | Premium | 0,7 |
|486 | 18/02/2022 | E | Standard | 0,9 |
|23847 | 24/10/2021 | F | Basic | 0,3 |
|82346 | 12/11/2021 | B | Premium | 0,5 |
|28352 | 03/01/2022 | A | Basic | 0,1 |
I need to group by month and sum size over the rows whose name and type are the same:
|size | date | name | type | revenue |
|28352 | Jan | A | Basic | 0,1 |
|486 | Feb | E | Standard | 0,9 |
|12049667| Feb | D | Premium | 0,7 |
|192 | Mar | D | Premium | 0,7 |
|8734684 | Mar | B | Basic | 0,1 |
|3589749 | Apr | C | Basic | 0,4 |
|10265689| Apr | A | Basic | 0,1 |
|23847 | Oct | F | Basic | 0,3 |
|82346 | Nov | B | Premium | 0,5 |
|248753 | Dec | A | Standard | 0,2 |
|234 | Dec | A | Basic | 0,1 |
I tried this code but it did not work:
df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.strftime('%B'))['size'].sum()
df2 = df1.groupby(['date', 'name', 'type', 'revenue'],as_index=False).sum()
How can I do it?
IIUC, you need a single groupby. You also need to convert your "revenue" column to numeric.
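A minimal sketch of a single groupby along those lines, not necessarily the exact code the answer had in mind. It assumes the dates are day-first (`%d/%m/%Y`), that "revenue" uses a decimal comma, and that revenue is constant within each month/name/type group, so taking the first value is enough to keep it. The variable names `month` and `out` are only illustrative.

```python
import pandas as pd

# Data from the question; 'revenue' is a string with a decimal comma.
df = pd.DataFrame({
    "size": [10, 248743, 234, 8734684, 3589749, 3356943, 6908746, 2375940,
             9387295, 286432, 192, 486, 23847, 82346, 28352],
    "date": ["13/12/2021", "15/12/2021", "03/12/2022", "31/03/2022",
             "01/04/2021", "02/04/2021", "21/04/2021", "21/02/2022",
             "21/02/2022", "21/02/2022", "31/03/2022", "18/02/2022",
             "24/10/2021", "12/11/2021", "03/01/2022"],
    "name": ["A", "A", "A", "B", "C", "A", "A", "D",
             "D", "D", "D", "E", "F", "B", "A"],
    "type": ["Standard", "Standard", "Basic", "Basic", "Basic", "Basic",
             "Basic", "Premium", "Premium", "Premium", "Premium",
             "Standard", "Basic", "Premium", "Basic"],
    "revenue": ["0,2", "0,2", "0,1", "0,1", "0,4", "0,1", "0,1", "0,7",
                "0,7", "0,7", "0,7", "0,9", "0,3", "0,5", "0,1"],
})

# Convert '0,2' -> 0.2 so revenue is numeric.
df["revenue"] = df["revenue"].str.replace(",", ".", regex=False).astype(float)

# Parse day-first dates explicitly (assumption: format is DD/MM/YYYY).
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Abbreviated month name as the grouping key (depends on the active locale).
month = df["date"].dt.strftime("%b")

# One groupby over month, name and type: sum 'size', keep the first 'revenue'.
out = (
    df.groupby([month, "name", "type"], sort=False)
      .agg(size=("size", "sum"), revenue=("revenue", "first"))
      .reset_index()
)
print(out)
```

With the question's data this yields 11 groups, and the D/Premium rows for February sum to 12049667, which matches the expected table in the question.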
Note that the above is aggregating months of different years into the same group. If you want to keep years separate, use a period:
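A hedged sketch of the period variant, shown on a few rows from the question. It assumes the same day-first date format and decimal-comma revenue; `period` and `out` are illustrative names. Grouping on a monthly period keeps the year in the key, so December 2021 and December 2022 cannot end up in the same group.

```python
import pandas as pd

# A few rows from the question, covering December of two different years.
df = pd.DataFrame({
    "size": [10, 248743, 234, 28352],
    "date": ["13/12/2021", "15/12/2021", "03/12/2022", "03/01/2022"],
    "name": ["A", "A", "A", "A"],
    "type": ["Standard", "Standard", "Basic", "Basic"],
    "revenue": ["0,2", "0,2", "0,1", "0,1"],
})

df["revenue"] = df["revenue"].str.replace(",", ".", regex=False).astype(float)
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Monthly period (e.g. 2021-12) instead of a bare month name.
period = df["date"].dt.to_period("M")

out = (
    df.groupby([period, "name", "type"], sort=False)
      .agg(size=("size", "sum"), revenue=("revenue", "first"))
      .reset_index()
)
print(out)
```

The resulting "date" column has a period dtype; if a string label is preferred, `out["date"].astype(str)` gives values such as "2021-12".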