Add a boolean column to a pandas dataframe where one true row makes all rows for the same user true

Posted on 2025-01-10 17:35:25

I am having trouble adding a boolean column to a pandas dataframe. The data contains users who have projects that can be opened in several places. I need to identify the users who have worked with the same project in more than one place. If a user has opened the same project in different places even once, shared_projects should be true, and then all rows with that user_id should be true.

Here is an example df:

user_id   project_id_x   project_id_y
   1           1              2 
   1           3              4  
   2           5              6 
   2           7              7 
   2           8              9
   3           10             11                     
   3           12             10

This is a simple example of the result I would like. If the condition is true on one row, it should be true on all rows with that user_id.

user_id   project_id_x   project_id_y   shared_projects
   1           1              2           false
   1           3              4           false 
   2           5              6           true
   2           7              7           true
   2           8              9           true
   3           10             11          true           
   3           12             10          true

I can get boolean values for each row, but I am stuck on how to make the value true for all of a user's rows when it is true on one row.

故人爱我别走 2025-01-17 17:35:25

Assuming you want to match on the same row:

df['shared_projects'] = (df['project_id_x'].eq(df['project_id_y'])
                         .groupby(df['user_id']).transform('any')
                        )

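If you want to match any project ID that appears in both the x and y columns for a given user, even on different rows, you can use a set intersection:

# run this on the original df, as an alternative to the snippet above
s = (df.groupby('user_id')[['project_id_x', 'project_id_y']]
       .apply(lambda g: bool(set(g['project_id_x'])
                             .intersection(g['project_id_y']))))

df = df.merge(s.rename('shared_projects'), left_on='user_id', right_index=True)

Output: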

   user_id  project_id_x  project_id_y  shared_projects
0        1             1             2            False
1        1             3             4            False
2        2             5             6             True
3        2             7             7             True
4        2             8             9             True
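The difference between the two approaches only shows up for user 3 in the question's sample, which the output above does not include. The following is a minimal sketch, assuming the seven-row sample from the question; the names same_row, per_user and any_place are made up for illustration, and the set intersection is written as a plain loop over the groups rather than with apply.

```python
import pandas as pd

# Seven-row sample copied from the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2, 3, 3],
    "project_id_x": [1, 3, 5, 7, 8, 10, 12],
    "project_id_y": [2, 4, 6, 7, 9, 11, 10],
})

# Approach 1: x equals y on the same row, broadcast to every row of the user.
same_row = (
    df["project_id_x"].eq(df["project_id_y"])
    .groupby(df["user_id"]).transform("any")
)

# Approach 2: any project ID found in both columns for the user, on any rows.
per_user = {
    user: bool(set(g["project_id_x"]) & set(g["project_id_y"]))
    for user, g in df.groupby("user_id")
}
any_place = df["user_id"].map(per_user)

print(same_row.tolist())   # [False, False, True, True, True, False, False]
print(any_place.tolist())  # [False, False, True, True, True, True, True]
```

Only the second result matches the expected table in the question, because user 3 has project 10 in project_id_x on one row and in project_id_y on another.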

月寒剑心 2025-01-17 17:35:25

First, compare the two columns row by row to find the rows where the same project appears in both:

df['shared_projects'] = (df['project_id_x'] == df['project_id_y'])

This creates the row-level boolean column you already have. You can then use the index of the True rows to set the flag on every row of the same user, assuming that "user_id" is the index of the dataframe (for example after df = df.set_index('user_id')).

for index in df[df['shared_projects']].index.unique():
    # .loc with a repeated index label assigns to every row of that user
    df.loc[index, 'shared_projects'] = True
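If user_id is a regular column rather than the index, the same idea can be sketched without a loop. This is only an illustration under that assumption, using the first five rows of the question's sample; row_match and flagged_users are hypothetical names. Like the loop, it only detects a match on the same row.

```python
import pandas as pd

# First five rows of the question's sample, with user_id as a normal column.
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2],
    "project_id_x": [1, 3, 5, 7, 8],
    "project_id_y": [2, 4, 6, 7, 9],
})

# Row-level condition: the same project ID in both columns.
row_match = df["project_id_x"] == df["project_id_y"]

# Users with at least one matching row.
flagged_users = df.loc[row_match, "user_id"].unique()

# Flag every row that belongs to one of those users.
df["shared_projects"] = df["user_id"].isin(flagged_users)

print(df["shared_projects"].tolist())  # [False, False, True, True, True]
```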

不再见 2025-01-17 17:35:25

Update

Another approach without apply, using value_counts.

user_id = df.melt('user_id', var_name='project', value_name='project_id') \
            .value_counts(['user_id', 'project_id']) \
            .loc[lambda x: x > 1].index.get_level_values('user_id')
df['shared_projects'] = df['user_id'].isin(user_id)

Input and intermediate result:

>>> df
user_id   project_id_x   project_id_y
   1           1              2 
   1           3              4  
   2           5              6 
   2           7              7 
   2           8              9

# Intermediate result
>>> df.melt('user_id', var_name='project', value_name='project_id') \
            .value_counts(['user_id', 'project_id'])

user_id  project_id
2        7             2  # <- project 7 in multiple places for user 2
1        1             1
         2             1
         3             1
         4             1
2        5             1
         6             1
         8             1
         9             1
dtype: int64

Old answer

You can use melt:

shared_projects = lambda x: len(set(x)) != len(x)
user_id = df.melt('user_id').groupby('user_id')['value'].apply(shared_projects)
df['shared_projects'] = df['user_id'].isin(user_id[user_id].index)

Output:

>>> df
   user_id  project_id_x  project_id_y  shared_projects
0        1             1             2            False
1        1             3             4            False
2        2             5             6             True
3        2             7             7             True
4        2             8             9             True
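One caveat with both melt-based versions: they also flag a user whose project ID is repeated within a single column. If "different places" is taken to mean different columns (an assumption, the question does not say), one possible variation is to count the distinct columns each project appears in. The data below is hypothetical, with user 4 added to show the difference, and the names long_df, n_places and shared_users are made up for the sketch.

```python
import pandas as pd

# Hypothetical data: user 2 has project 7 in both columns,
# user 4 has project 20 twice, but only in project_id_x.
df = pd.DataFrame({
    "user_id":      [2, 2, 4, 4],
    "project_id_x": [5, 7, 20, 20],
    "project_id_y": [6, 7, 21, 22],
})

# Long format: one row per (user, column, project).
long_df = df.melt("user_id", var_name="place", value_name="project_id")

# Number of distinct columns in which each (user, project) pair occurs.
n_places = long_df.groupby(["user_id", "project_id"])["place"].nunique()

# Users with at least one project seen in more than one column.
shared_users = n_places[n_places > 1].index.get_level_values("user_id")
df["shared_projects"] = df["user_id"].isin(shared_users)

print(df["shared_projects"].tolist())  # [True, True, False, False]
```

With the value_counts version above, user 4 would be flagged as well, since project 20 is counted twice for that user.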
