Pandas - group by column values and detect values

Posted on 2025-02-07 19:13:33

I have a DataFrame like this:

import pandas as pd

data = [['123', 'Yes', 'No', 'No'], ['123', 'No', 'Yes', 'No'], ['1234', 'No', 'Yes', 'No']]
df = pd.DataFrame(data, columns=['ID', 'Object_1', 'Object_2', 'Object_3'])

ID	Object_1	Object_2	Object_3
123	Yes	No	No
123	No	Yes	No
1234	No	Yes	No

I want to group by the ID column, even though the values of Object_1, Object_2 and Object_3 may differ between rows with the same ID. If 'Yes' appears in any row of a group, I would like it to remain in the final grouped DataFrame.

The desired output is a DataFrame with the following values:

ID	Object_1	Object_2	Object_3
123	Yes	Yes	No
1234	No	Yes	No

泅人 2025-02-14 19:13:33

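You can take advantage of the fact that 'Yes' sorts lexicographically after 'No', so taking the max of each column per group returns 'Yes' whenever at least one row in the group contains 'Yes':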

df.groupby('ID', as_index=False).max()

Output:

     ID Object_1 Object_2 Object_3
0   123      Yes      Yes       No
1  1234       No      Yes       No
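A minimal sketch of an alternative that encodes the question's rule directly instead of relying on string order: mark each cell that is exactly 'Yes', take any() per ID, then map the booleans back to 'Yes'/'No'. It assumes every cell is either 'Yes' or 'No'; any other value would come out as 'No'. The names obj_cols, has_yes, by_max and by_any are illustrative only.

```python
import numpy as np
import pandas as pd

data = [['123', 'Yes', 'No', 'No'], ['123', 'No', 'Yes', 'No'], ['1234', 'No', 'Yes', 'No']]
df = pd.DataFrame(data, columns=['ID', 'Object_1', 'Object_2', 'Object_3'])

# Answer's approach: relies on 'Yes' > 'No' as strings
by_max = df.groupby('ID', as_index=False).max()

# Rule-based alternative: True where a cell is exactly 'Yes'
# (assumption: anything that is not 'Yes' counts as 'No')
obj_cols = df.columns.drop('ID')
has_yes = df[obj_cols].eq('Yes').groupby(df['ID']).any()

# Map the per-group booleans back to 'Yes'/'No' and restore ID as a column
by_any = pd.DataFrame(
    np.where(has_yes, 'Yes', 'No'),
    index=has_yes.index,
    columns=has_yes.columns,
).reset_index()

print(by_any)
```

Both frames hold the same values for this data; the any() version simply keeps working if the labels are later renamed to something that does not happen to sort 'Yes' last.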

More robust/generic way

You can use an ordered Categorical type to handle any set of values, even more than two (e.g., No/Maybe/Yes), by listing the categories in the order that max should follow:

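# convert the Object_* columns to an ordered Categorical ('No' < 'Yes');
# astype replaces the column dtype, whereas df.update writes the values back
# into the existing object columns, which can drop the Categorical dtype
yes_no = pd.CategoricalDtype(categories=['No', 'Yes'], ordered=True)
cols = df.filter(like='Object').columns
df = df.astype({c: yes_no for c in cols})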

# get max per group
df.groupby('ID', as_index=False).max()
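To see why the ordered Categorical matters once there are more than two levels, here is a small sketch with a made-up third value 'Maybe' (the data is hypothetical, not from the question). As plain strings, 'No' sorts after 'Maybe', so max() picks 'No'; with the ordered dtype 'No' < 'Maybe' < 'Yes' it picks 'Maybe'.

```python
import pandas as pd

# Hypothetical data with a third level, 'Maybe'
df = pd.DataFrame({
    'ID': ['1', '1', '2', '2'],
    'Object_1': ['No', 'Maybe', 'No', 'Yes'],
    'Object_2': ['Maybe', 'No', 'No', 'No'],
})

# Plain strings: 'No' > 'Maybe' lexicographically, so 'No' wins
by_string = df.groupby('ID', as_index=False).max()

# Ordered Categorical: max follows the category order 'No' < 'Maybe' < 'Yes'
levels = pd.CategoricalDtype(categories=['No', 'Maybe', 'Yes'], ordered=True)
cols = df.filter(like='Object').columns
df_cat = df.astype({c: levels for c in cols})
by_category = df_cat.groupby('ID', as_index=False).max()

print(by_string)
print(by_category)
```

Only the category list has to change to support a new level; the groupby call stays the same.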
