Pandas: combine rows into a slash-separated string, aggregated by other columns
I have the initial df below and I want to aggregate the 'combo' column into a single string for each name and date, separated by slashes, while respecting the order given by the sort.
The desired_data frame shows my target dataset.
import pandas as pd

raw_data = {'name': ['B', 'B', 'A', 'A', 'A', 'A', 'C'],
            'date': pd.to_datetime(pd.Series(['2017-04-03', '2017-04-03', '2017-03-31', '2017-03-31', '2017-03-31', '2017-04-04', '2017-04-04'])),
            'order': [2, 1, 4, 2, 1, 1, 1],
            'combo': ['x', 'y', 'x', 'y', 'z', 'x', 'x']}
df = pd.DataFrame(raw_data, columns=['name', 'date', 'order', 'combo'])
# Sort so that rows within each (name, date) group follow 'order'
df = df.sort_values(['name', 'date', 'order'])
df
desired_raw = {'name': ['A', 'A', 'B', 'C'],
               'date': pd.to_datetime(pd.Series(['2017-03-31', '2017-04-04', '2017-04-03', '2017-04-04'])),
               'combined_combo': ['z/y/x', 'x', 'y/x', 'x']}
desired_data = pd.DataFrame(desired_raw, columns=['name', 'date', 'combined_combo'])
desired_data
# What I have tried so far: this collects each group into a list, not a string
df1 = df.groupby(['name', 'date'])['combo'].apply(list).reset_index(name='new')
df1
Comments (1)
Here is one way:
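A minimal sketch of one way to do this, assuming the frame is already sorted by name, date and order as in the question: group on name and date, then join each group's combo values with '/'.join. It relies on groupby keeping the rows of each group in their original order. The variable name combined is illustrative and does not come from the answer.

```python
import pandas as pd

# Data from the question
raw_data = {'name': ['B', 'B', 'A', 'A', 'A', 'A', 'C'],
            'date': pd.to_datetime(pd.Series(['2017-04-03', '2017-04-03', '2017-03-31',
                                              '2017-03-31', '2017-03-31', '2017-04-04',
                                              '2017-04-04'])),
            'order': [2, 1, 4, 2, 1, 1, 1],
            'combo': ['x', 'y', 'x', 'y', 'z', 'x', 'x']}
df = pd.DataFrame(raw_data, columns=['name', 'date', 'order', 'combo'])

# Sorting first fixes the order in which values are joined within each group
df = df.sort_values(['name', 'date', 'order'])

# '/'.join receives each group's combo values and returns a single string;
# the result is a Series indexed by (name, date)
combined = df.groupby(['name', 'date'])['combo'].apply('/'.join)
print(combined)
```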
Out:
If you don't want the groups as the index use:
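One possible way to get the group keys back as ordinary columns, sketched here as an assumption rather than the answer's exact code, is to call reset_index on the grouped result. The column name combined_combo is taken from the desired output in the question.

```python
import pandas as pd

# Data from the question
raw_data = {'name': ['B', 'B', 'A', 'A', 'A', 'A', 'C'],
            'date': pd.to_datetime(pd.Series(['2017-04-03', '2017-04-03', '2017-03-31',
                                              '2017-03-31', '2017-03-31', '2017-04-04',
                                              '2017-04-04'])),
            'order': [2, 1, 4, 2, 1, 1, 1],
            'combo': ['x', 'y', 'x', 'y', 'z', 'x', 'x']}
df = pd.DataFrame(raw_data, columns=['name', 'date', 'order', 'combo'])
df = df.sort_values(['name', 'date', 'order'])

# Join within each (name, date) group, then move the group keys
# from the index back into regular columns
result = (df.groupby(['name', 'date'])['combo']
            .apply('/'.join)
            .reset_index(name='combined_combo'))
print(result)
```

The resulting frame has the columns name, date and combined_combo, which matches the layout of desired_data in the question.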