DataFrame: select users that meet a condition in all categories

Posted on 2025-01-13 22:15:58 | Word count: 1153 | Views: 1 | Comments: 0

I have the following DataFrame:

   user category  x  y
0    AB        A  1  1
1    EF        A  1  1
2    SG        A  1  0
3    MN        A  1  0
4    AB        B  0  0
5    EF        B  0  1
6    SG        B  0  1
7    MN        B  0  0
8    AB        C  1  1
9    EF        C  1  1
10   SG        C  1  1
11   MN        C  1  1

I want to select users that have x=y in all categories. I was able to do that using the following code:

import pandas as pd

data = pd.DataFrame({'user': ['AB', 'EF', 'SG', 'MN', 'AB', 'EF',
                              'SG', 'MN', 'AB', 'EF', 'SG', 'MN'],
                     'category': ['A', 'A', 'A', 'A', 'B', 'B', 
                                  'B', 'B', 'C', 'C', 'C', 'C'],
                     'x': [1,1,1,1, 0,0,0,0, 1,1,1,1],
                     'y': [1,1,0,0, 0,1,1,0, 1,1,1,1]})

# Keep the original frame intact so the category count comes from all rows
matched = data[data['x'] == data['y']][['user', 'category']]
count_users_match = matched.groupby('user', as_index=False).count()
count_cat = data['category'].nunique()
print(count_users_match[count_users_match['category'] == count_cat])

Output:

  user  category
0   AB         3

This feels like a rather long solution. Is there a shorter way to achieve this?

Comments (3)

書生途 2025-01-20 22:15:58

Try this:

filtered = data.x.eq(data.y).groupby(data['user']).sum().loc[lambda x: x == data['category'].nunique()].reset_index(name='category')

Output:

>>> filtered
  user  category
0   AB         3
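
A related sketch, not part of the answer above: if each user has exactly one row per category (as in the sample data), the boolean mask can be reduced with `all()` instead of being summed and compared against the category count. The names `all_match` and `users` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'user': ['AB', 'EF', 'SG', 'MN'] * 3,
                     'category': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
                     'x': [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                     'y': [1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]})

# True for a user only if every one of that user's rows has x == y.
# Assumption: a user missing from some category is not detected here.
all_match = data['x'].eq(data['y']).groupby(data['user']).all()
users = all_match[all_match].index.tolist()
print(users)  # ['AB']
```

This returns only the user names; the count column from the original output is dropped because it always equals the number of rows per user.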

南冥有猫 2025-01-20 22:15:58

We could use query + groupby + size to count the rows where x equals y for each user, then compare that count with the total number of distinct categories:

tmp = data.query('x==y').groupby('user').size()
out = tmp[tmp == data['category'].nunique()].reset_index(name='category')

Output:

  user  category
0   AB         3
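
One caveat, sketched here under an assumption the thread does not state: `size()` counts rows, so a user with a repeated (user, category) row would be counted twice. Counting distinct matching categories with `nunique()` avoids that. The frame `dup` with a duplicated first row is made up purely for illustration, and this variant only checks that each category has at least one matching row.

```python
import pandas as pd

data = pd.DataFrame({'user': ['AB', 'EF', 'SG', 'MN'] * 3,
                     'category': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
                     'x': [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                     'y': [1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]})

# Hypothetical input: the first row (AB, A) appears twice.
dup = pd.concat([data, data.iloc[[0]]], ignore_index=True)
n_cat = dup['category'].nunique()

# Row counts: AB now has 4 matching rows, so the equality test misses it.
by_size = dup.query('x == y').groupby('user').size()
by_size = by_size[by_size == n_cat]

# Distinct matching categories per user: AB still has 3.
by_nunique = dup.query('x == y').groupby('user')['category'].nunique()
out = by_nunique[by_nunique == n_cat].reset_index(name='category')
print(out)
```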

红玫瑰 2025-01-20 22:15:58

This is a more compact way to do it, but I don't know if it is also more efficient.

# Compute both counts once instead of on every iteration
matches = data.loc[data['x'] == data['y'], 'user'].value_counts()
totals = data['user'].value_counts()
out = [{'user': user, 'frequency': int(matches[user])}
       for user in data['user'].unique() if matches.get(user, 0) == totals[user]]
>>> out
[{'user': 'AB', 'frequency': 3}]
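
On the efficiency question, one way to check is to time the candidates with the standard-library `timeit` module. The sketch below is only a starting point: the function names `with_groupby` and `with_comprehension` are made up here, the comprehension is a simplified variant that precomputes the counts, and results on a 12-row frame will not necessarily carry over to large data.

```python
import timeit

import pandas as pd

data = pd.DataFrame({'user': ['AB', 'EF', 'SG', 'MN'] * 3,
                     'category': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
                     'x': [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                     'y': [1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]})


def with_groupby():
    # query + groupby + size, compared against the number of categories
    tmp = data.query('x == y').groupby('user').size()
    return tmp[tmp == data['category'].nunique()].index.tolist()


def with_comprehension():
    # value_counts computed once, then a plain Python loop over users
    matches = data.loc[data['x'] == data['y'], 'user'].value_counts()
    totals = data['user'].value_counts()
    return [u for u in data['user'].unique()
            if matches.get(u, 0) == totals[u]]


# Both approaches should agree before their timings are compared.
result_groupby = with_groupby()
result_comprehension = with_comprehension()

for fn in (with_groupby, with_comprehension):
    seconds = timeit.timeit(fn, number=100)
    print(f'{fn.__name__}: {seconds:.4f} s for 100 runs')
```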