Adding a boolean column to a pandas DataFrame where one true row should make all rows of the same user true
I have a problem adding a boolean column to a pandas DataFrame. The data contains users who have projects they can open in several places. I need to find the users who have worked with the same project in more than one place: if a user has opened the same project in different places even once, shared_projects should be true, and then all rows with that user_id should be true.
Here is an example df:
user_id project_id_x project_id_y
1 1 2
1 3 4
2 5 6
2 7 7
2 8 9
3 10 11
3 12 10
Here is a simple example of the output I want. If the condition is true on any row, it should be true on all rows with that user_id.
user_id project_id_x project_id_y shared_projects
1 1 2 false
1 3 4 false
2 5 6 true
2 7 7 true
2 8 9 true
3 10 11 true
3 12 10 true
I can compute the boolean value for each row, but I am stuck on how to make it true for all rows of a user when it is true on any one of that user's rows.
Assuming you want to match on the same row:
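The answer's original code did not survive, so this is a minimal sketch rather than the author's code. It assumes the row-level test is project_id_x == project_id_y and uses groupby(...).transform("any") to spread a single True to every row of the same user:

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2, 3, 3],
    "project_id_x": [1, 3, 5, 7, 8, 10, 12],
    "project_id_y": [2, 4, 6, 7, 9, 11, 10],
})

# Row-level test: the same project id in both columns of one row.
same_row = df["project_id_x"] == df["project_id_y"]

# transform("any") gives every row True if any row of that user is True.
df["shared_projects"] = same_row.groupby(df["user_id"]).transform("any")
print(df)
```

With this data only user 2 is flagged. User 3's shared project 10 sits on two different rows, so the same-row test does not catch it.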
If you want to match on any x/y value for a given user, you can use a set intersection:
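A sketch of the set-intersection idea (a reconstruction, not the answer's original code). The helper name has_shared_project is hypothetical. For each user it intersects the set of project_id_x values with the set of project_id_y values and maps the per-user result back to every row:

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2, 3, 3],
    "project_id_x": [1, 3, 5, 7, 8, 10, 12],
    "project_id_y": [2, 4, 6, 7, 9, 11, 10],
})

def has_shared_project(group: pd.DataFrame) -> bool:
    # True if any project id appears in both columns, on any rows of this user.
    return len(set(group["project_id_x"]) & set(group["project_id_y"])) > 0

# One boolean per user_id; selecting the two columns keeps user_id out of each group.
per_user = df.groupby("user_id")[["project_id_x", "project_id_y"]].apply(has_shared_project)

# Broadcast the per-user result back to every row of that user.
df["shared_projects"] = df["user_id"].map(per_user)
print(df)
```

Unlike the same-row version, this also flags user 3, which matches the expected output in the question.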
First, you have to do a somewhat complex selection to find the users who have worked on the same project in different columns:
That creates a new boolean column, as you have already done. You can then use the index labels of those True rows to set the value for the user's remaining rows, assuming "user_id" is the DataFrame's index.
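A minimal sketch of the index-based idea, assuming user_id has been made the index and assuming the row-level condition is project_id_x == project_id_y (the answer's exact selection was not preserved). The True rows' index labels are collected and Index.isin marks every row that shares one of those labels:

```python
import pandas as pd

# Example data from the question, with user_id as the (non-unique) index.
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2, 3, 3],
    "project_id_x": [1, 3, 5, 7, 8, 10, 12],
    "project_id_y": [2, 4, 6, 7, 9, 11, 10],
}).set_index("user_id")

# Row-level condition (assumed): same project id in both columns of a row.
row_flag = df["project_id_x"] == df["project_id_y"]

# Index labels (user_ids) that have at least one True row.
flagged_users = df.index[row_flag.to_numpy()].unique()

# Every row whose index label is one of those users becomes True.
df["shared_projects"] = df.index.isin(flagged_users)
print(df)
```

Using isin avoids assigning through .loc on a non-unique index. The row-level condition can be swapped for any other per-row test.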
Update
Another approach, without apply, using value_counts:
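A sketch of a value_counts approach without apply. This is a reconstruction, assuming "shared" means a project id appears in both columns for the same user (not necessarily on the same row). Each column's (user_id, project) pairs are de-duplicated, stacked, and counted, so a count above 1 means the pair occurs in both columns:

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2, 3, 3],
    "project_id_x": [1, 3, 5, 7, 8, 10, 12],
    "project_id_y": [2, 4, 6, 7, 9, 11, 10],
})

# Distinct (user_id, project) pairs from each column, under a common column name.
x = df[["user_id", "project_id_x"]].rename(columns={"project_id_x": "project"}).drop_duplicates()
y = df[["user_id", "project_id_y"]].rename(columns={"project_id_y": "project"}).drop_duplicates()

# After de-duplicating each side, a count of 2 means the pair is in both columns.
counts = pd.concat([x, y]).value_counts()
shared_users = counts[counts > 1].index.get_level_values("user_id").unique()

df["shared_projects"] = df["user_id"].isin(shared_users)
print(df)
```

DataFrame.value_counts requires pandas 1.1 or later.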
Old answer
You can use melt:
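A sketch of a melt-based version (a reconstruction, not the answer's exact code). It assumes the same "project in both columns for the same user" rule. melt reshapes to long format, nunique counts how many source columns each (user, project) pair occurs in, and any() reduces that to one flag per user:

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 2, 3, 3],
    "project_id_x": [1, 3, 5, 7, 8, 10, 12],
    "project_id_y": [2, 4, 6, 7, 9, 11, 10],
})

# Long format: one row per (user_id, source column, project id).
long = df.melt(
    id_vars="user_id",
    value_vars=["project_id_x", "project_id_y"],
    var_name="column",
    value_name="project",
)

# How many distinct source columns each (user, project) pair occurs in.
n_cols = long.groupby(["user_id", "project"])["column"].nunique()

# A user is shared if at least one of their projects occurs in both columns.
per_user = (n_cols == 2).groupby(level="user_id").any()

df["shared_projects"] = df["user_id"].map(per_user)
print(df)
```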