DataFrame: select users that meet a condition in all categories
I have the following DataFrame:
   user category  x  y
0    AB        A  1  1
1    EF        A  1  1
2    SG        A  1  0
3    MN        A  1  0
4    AB        B  0  0
5    EF        B  0  1
6    SG        B  0  1
7    MN        B  0  0
8    AB        C  1  1
9    EF        C  1  1
10   SG        C  1  1
11   MN        C  1  1
I want to select the users that have x == y in all categories. I was able to do that using the following code:
import pandas as pd

data = pd.DataFrame({'user': ['AB', 'EF', 'SG', 'MN', 'AB', 'EF',
                              'SG', 'MN', 'AB', 'EF', 'SG', 'MN'],
                     'category': ['A', 'A', 'A', 'A', 'B', 'B',
                                  'B', 'B', 'C', 'C', 'C', 'C'],
                     'x': [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                     'y': [1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]})
# Keep only the rows where x equals y
data = data[data['x'] == data['y']][['user', 'category']]
# Count the matching categories per user
count_users_match = data.groupby('user', as_index=False).count()
# Number of distinct categories (taken from the filtered rows)
count_cat = data['category'].unique().shape[0]
# Keep the users that match in every category
print(count_users_match[count_users_match['category'] == count_cat])
Output:
  user  category
0   AB         3
I feel that this is quite a long solution. Is there a shorter way to achieve this?
Comments (3)
Try this:
Output:
We could use query + groupby + size to find the number of matching categories for each user, then compare it with the number of categories for each user:
Output:
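As a minimal sketch of the query + groupby + size approach described in this answer, assuming the DataFrame from the question: the variable names matches, totals and selected are illustrative, and this is not necessarily the answerer's exact code.

```python
import pandas as pd

# Same data as in the question.
data = pd.DataFrame({
    'user': ['AB', 'EF', 'SG', 'MN'] * 3,
    'category': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
    'x': [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    'y': [1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1],
})

# Number of rows per user where x equals y.
matches = data.query('x == y').groupby('user').size()

# Number of rows per user overall (one row per category in this data).
totals = data.groupby('user').size()

# Align both Series on user; a user with no matching row at all gets 0.
matches = matches.reindex(totals.index, fill_value=0)

# Users whose matching count equals their category count.
selected = matches[matches == totals].index.tolist()
print(selected)
```

Comparing against each user's own row count, rather than a global category count, keeps the check valid even if some users do not appear in every category.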
This is a more compact way to do it, but I don't know if it is also more efficient.
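One compact possibility, offered only as a sketch and not necessarily what this answer originally showed, is to build a boolean flag per row and check it with groupby + all. It assumes every user has exactly one row per category, as in the question's data; the names is_match, all_match and selected are illustrative.

```python
import pandas as pd

# Same data as in the question.
data = pd.DataFrame({
    'user': ['AB', 'EF', 'SG', 'MN'] * 3,
    'category': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
    'x': [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    'y': [1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1],
})

# True for each row where x equals y.
is_match = data['x'].eq(data['y'])

# Per user: True only if every one of that user's rows matches.
all_match = is_match.groupby(data['user']).all()

# Keep the users whose flag is True.
selected = all_match[all_match].index.tolist()
print(selected)
```

Whether this is faster than the counting approach depends on the data size, so it would need to be timed on the real data before drawing conclusions.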