Forming subgroups and incrementing/decrementing a counter based on a value
My initial DataFrame looks as follows (without the `Counter` column):
Index | User | Status | Counter |
---|---|---|---|
1 | John | A | 1 |
2 | Ellen | A | 1 |
3 | John | B | 0 |
4 | Ellen | A | 2 |
5 | John | A | 1 |
6 | John | A | 2 |
7 | John | A | 3 |
8 | John | A | 4 |
9 | Ellen | A | 3 |
10 | John | B | 3 |
11 | Ellen | B | 2 |
12 | Ellen | C | 1 |
13 | Ellen | A | 2 |
14 | Ellen | A | 3 |
In this case I have two users (John and Ellen); in reality there are many more.
The `Counter` column is what I want to compute. If I had only one user, the code would look like this:

    count = 0
    CounterList = []
    for i, row in df.iterrows():
        if row["Status"] == "A":
            count += 1
        elif row["Status"] == "B" or row["Status"] == "C":
            count -= 1
        CounterList.append(count)
    df["Counter"] = CounterList
    df

With status A the counter goes up by 1; with status B or C it goes down by 1.
But how do I handle two or more users? How can I build subgroups and count within each user's subgroup separately?
Comments (2)
Create a mapping from each Status to its corresponding score (+1 for A, -1 for B and C), then use `groupby` + `cumsum`. The result matches the `Counter` column shown in the question.
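A minimal sketch of this mapping approach, not necessarily the answerer's exact code. It assumes the DataFrame is named `df` as in the question and rebuilds the sample data from the question's table; the name `scores` is introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question's table (without the Counter column).
df = pd.DataFrame(
    {
        "User": ["John", "Ellen", "John", "Ellen", "John", "John", "John",
                 "John", "Ellen", "John", "Ellen", "Ellen", "Ellen", "Ellen"],
        "Status": ["A", "A", "B", "A", "A", "A", "A",
                   "A", "A", "B", "B", "C", "A", "A"],
    },
    index=pd.RangeIndex(1, 15, name="Index"),
)

# Assumed scores, taken from the question: A counts up, B and C count down.
scores = {"A": 1, "B": -1, "C": -1}

# Map each Status to its score, then keep a running total within each User.
df["Counter"] = df["Status"].map(scores).groupby(df["User"]).cumsum()
print(df)
```

A Status value that is missing from `scores` maps to NaN, so the dictionary needs an entry for every status that can occur.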
You can check whether the Status equals A and map it to 1, or to -1 otherwise, then perform a `cumsum` per group. The result matches the `Counter` column shown in the question.
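One way this could look, as a sketch under the same assumptions (a DataFrame named `df` with the question's sample data). Using `numpy.where` for the check is a choice made here, and `step` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (without the Counter column).
df = pd.DataFrame(
    {
        "User": ["John", "Ellen", "John", "Ellen", "John", "John", "John",
                 "John", "Ellen", "John", "Ellen", "Ellen", "Ellen", "Ellen"],
        "Status": ["A", "A", "B", "A", "A", "A", "A",
                   "A", "A", "B", "B", "C", "A", "A"],
    },
    index=pd.RangeIndex(1, 15, name="Index"),
)

# +1 where Status is "A", -1 for every other status.
step = pd.Series(np.where(df["Status"].eq("A"), 1, -1), index=df.index)

# Running total of the steps, computed separately for each User.
df["Counter"] = step.groupby(df["User"]).cumsum()
print(df)
```

Unlike an explicit mapping, this treats every status other than A as a decrement, including any status that is not B or C.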