Pandas: combine rows into a slash-separated string, grouped by other columns

Posted on 2025-01-20 22:08:57

I have an initial DataFrame, df, and I want to aggregate the 'combo' column into a single slash-separated string per (name, date) group, keeping the row order produced by the sort.
desired_data below shows my target dataset.


import pandas as pd

raw_data = {
    'name': ['B', 'B', 'A', 'A', 'A', 'A', 'C'],
    'date': pd.to_datetime(pd.Series(['2017-04-03', '2017-04-03', '2017-03-31', '2017-03-31', '2017-03-31', '2017-04-04', '2017-04-04'])),
    'order': [2, 1, 4, 2, 1, 1, 1],
    'combo': ['x', 'y', 'x', 'y', 'z', 'x', 'x'],
}
df = pd.DataFrame(raw_data, columns=['name', 'date', 'order', 'combo'])
# Sort so that the rows inside each (name, date) group follow 'order'
df = df.sort_values(['name', 'date', 'order'])
df


desired_raw = {
    'name': ['A', 'A', 'B', 'C'],
    'date': pd.to_datetime(pd.Series(['2017-03-31', '2017-04-04', '2017-04-03', '2017-04-04'])),
    'combined_combo': ['z/y/x', 'x', 'y/x', 'x'],
}
desired_data = pd.DataFrame(desired_raw, columns=['name', 'date', 'combined_combo'])
desired_data

# What I have tried so far: this collects each group's values into a list,
# not into a single joined string
df1 = df.groupby(['name', 'date'])['combo'].apply(list).reset_index(name='new')
df1

Comments (1)

淡水深流 2025-01-27 22:08:57

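Here is one way. Since df is already sorted by name, date, and order, and groupby keeps the rows of each group in their existing order, joining the strings directly gives the sequence you want: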

combined_combo = df.groupby(['name', 'date'])['combo'].agg('/'.join).rename('combined_combo')
print(combined_combo)

Out:

name  date      
A     2017-03-31    z/y/x
      2017-04-04        x
B     2017-04-03      y/x
C     2017-04-04        x
Name: combined_combo, dtype: object

If you don't want the group keys as the index, use:

desired_data = combined_combo.reset_index()
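
For completeness, the whole thing can be sketched end to end as below. This is only one possible arrangement, not the only way to do it: the helper name combine_combo is made up for illustration, and it uses named aggregation with as_index=False (available in reasonably recent pandas versions) rather than the rename/reset_index calls above. It sorts inside the helper so the result does not depend on the caller having sorted df first.

```python
import pandas as pd

# Same sample data as in the question
raw_data = {
    'name': ['B', 'B', 'A', 'A', 'A', 'A', 'C'],
    'date': pd.to_datetime(['2017-04-03', '2017-04-03', '2017-03-31',
                            '2017-03-31', '2017-03-31', '2017-04-04',
                            '2017-04-04']),
    'order': [2, 1, 4, 2, 1, 1, 1],
    'combo': ['x', 'y', 'x', 'y', 'z', 'x', 'x'],
}
df = pd.DataFrame(raw_data)


def combine_combo(frame):
    """Hypothetical helper: sort by 'order', then join 'combo' per (name, date)."""
    # Sorting first fixes the order in which values are joined within each group
    ordered = frame.sort_values(['name', 'date', 'order'])
    return (
        ordered.groupby(['name', 'date'], as_index=False)
        .agg(combined_combo=('combo', '/'.join))
    )


result = combine_combo(df)
print(result)
```

If the sort is skipped, the values are joined in whatever order the rows happen to have in the frame, so the 'order' column would be ignored.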