Pandas: cumulatively number rows within groups and across another group
Given the following dataframe:
    col_1  col_2  col_3
0       1      A      1
1       1      B      1
2       2      A      3
3       2      A      3
4       2      A      3
5       2      B      3
6       2      B      3
7       2      B      3
8       3      A      2
9       3      A      2
10      3      C      2
11      3      C      2
I need to create a new column that numbers rows cumulatively within each group formed by 'col_1' and 'col_2', but where the numbering for each 'col_1' group continues from the highest number reached in the previous 'col_1' group, like this:
    col_1  col_2  col_3  new
0       1      A      1    1
1       1      B      1    1
2       2      A      3    2
3       2      A      3    3
4       2      A      3    4
5       2      B      3    2
6       2      B      3    3
7       2      B      3    4
8       3      A      2    5
9       3      A      2    6
10      3      C      2    5
11      3      C      2    6
I've tried:
df['new'] = df.groupby(['col_1', 'col_2']).cumcount() + 1
But this restarts the count at 1 in every group instead of continuing from where the previous 'col_1' group left off.
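The sketch below rebuilds the sample frame and checks the attempt. The DataFrame constructor is reconstructed from the printed table and is not taken from the post. The output shows that cumcount restarts at 1 in each (col_1, col_2) group. It also shows that the target differs from the attempt by a constant offset per col_1 block (0, 1 and 4), which is the missing piece.

```python
import pandas as pd

# Sample frame, reconstructed from the printed table in the question
df = pd.DataFrame({
    "col_1": [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    "col_2": ["A", "B", "A", "A", "A", "B", "B", "B", "A", "A", "C", "C"],
    "col_3": [1, 1, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2],
})

# The attempted approach: a 1-based counter inside each (col_1, col_2) group
attempt = df.groupby(["col_1", "col_2"]).cumcount() + 1
print(attempt.tolist())
# [1, 1, 1, 2, 3, 1, 2, 3, 1, 2, 1, 2]

# Desired result from the question; the difference is one offset per col_1 block
target = [1, 1, 2, 3, 4, 2, 3, 4, 5, 6, 5, 6]
offset_needed = [t - a for t, a in zip(target, attempt.tolist())]
print(offset_needed)
# [0, 0, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4]
```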
This is a tricky problem. You want to calculate the cumcount within each group, but for all subsequent groups you need to keep track of how much has already been incremented, so you know the offset to apply. That can be done with a max + cumsum of this cumcount over the previous groups. The only complication is that you need to determine the relationship between previous and subsequent group labels, in case there isn't a simple +1 increment between the labels of subsequent groups.
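Below is a minimal sketch of the max + cumsum idea. It assumes the sample data from the question and assumes the col_1 blocks should be chained in ascending label order, since groupby sorts its keys by default. The names within, block_max and offset are illustrative. The per-block maximum of the cumcount is accumulated with cumsum and then shifted down one block, so each block receives the total of all earlier blocks. Because shift works on positions in the grouped result rather than on the label values, this sidesteps the +1-increment issue mentioned above.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    "col_2": ["A", "B", "A", "A", "A", "B", "B", "B", "A", "A", "C", "C"],
    "col_3": [1, 1, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2],
})

# 1-based position of each row inside its (col_1, col_2) group
within = df.groupby(["col_1", "col_2"]).cumcount() + 1

# Highest position reached in each col_1 block (indexed by col_1, sorted)
block_max = within.groupby(df["col_1"]).max()

# Offset per block = sum of the maxima of all earlier blocks;
# shift moves by position, so col_1 labels need not step by +1
offset = block_max.cumsum().shift(fill_value=0)

# Add each row's block offset to its within-group position
df["new"] = within + df["col_1"].map(offset)
print(df["new"].tolist())
# [1, 1, 2, 3, 4, 2, 3, 4, 5, 6, 5, 6]
```

If the blocks should follow their order of first appearance instead of sorted label order, passing sort=False to the second groupby is one option.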
You can use two consecutive groupby calls: one on the two columns, and a second one on that result, grouping by col_1 only.
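Below is a hedged sketch of one way to read this two-groupby approach. It is written against the sample data and is not the answer's original code. The first groupby counts the rows in each (col_1, col_2) pair with size(). The second groups that result by its col_1 index level and takes the largest pair per block. The running total of those maxima, shifted down one block, becomes the offset added to the per-group cumcount. The names sizes, block_size and offset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    "col_2": ["A", "B", "A", "A", "A", "B", "B", "B", "A", "A", "C", "C"],
    "col_3": [1, 1, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2],
})

# First groupby: number of rows in every (col_1, col_2) pair (MultiIndex result)
sizes = df.groupby(["col_1", "col_2"]).size()

# Second groupby, by the col_1 level only: size of the largest pair per block
block_size = sizes.groupby(level="col_1").max()

# Rows already "used up" by all earlier col_1 blocks
offset = block_size.cumsum().shift(fill_value=0)

# Within-group position plus the offset of the row's col_1 block
df["new"] = (
    df.groupby(["col_1", "col_2"]).cumcount() + 1
    + df["col_1"].map(offset)
)
print(df["new"].tolist())
# [1, 1, 2, 3, 4, 2, 3, 4, 5, 6, 5, 6]
```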