How to fill a column's values by checking another column

Posted 2025-01-18 18:52:54 | Length 501 | Views 0 | Comments 0

This image should make it clearer:

[Screenshot of the data]

The column titled PassengerId encodes a group number and a person number. People in the same group are usually family, so they come from the same planet. There are some NaN rows in the HomePlanet column, and I want to fill them using the group number from the PassengerId column.

So I need some code, or maybe a loop, that fills the NaN values in the HomePlanet column by checking whether the passenger is in a group with someone else (they would then share a home planet, since they are likely family). In short: fill each NaN in HomePlanet by looking up the group number and using the HomePlanet of another group member as the replacement.

I've tried writing for loops, but I didn't even know what parameters to specify. I converted PassengerId into an array, did the same with HomePlanet, and tried to iterate through the members, but I didn't know how to move forward.

Comments (1)

辞取 2025-01-25 18:52:55

If I understand the description correctly, this example data frame would showcase the problem:

import numpy as np
import pandas as pd

df = pd.DataFrame({'passenger_id': ['1', '1', '2', '2'], 'home_planet': ['3', np.nan, '4', np.nan]})
df

   passenger_id | home_planet
0 | 1           |  3
1 | 1           |  NaN
2 | 2           |  4
3 | 2           |  NaN

Here you want the NaN values to become 3 and 4, based on the value in the passenger_id column.

You can do this by merging the DataFrame with a cleaned and deduplicated copy of itself:

pd.merge(df, df.loc[df['home_planet'].notna()].drop_duplicates(), 
on='passenger_id', suffixes=('_x', ''))[['passenger_id', 'home_planet']]


   passenger_id | home_planet
0 | 1           |  3
1 | 1           |  3
2 | 2           |  4
3 | 2           |  4

Update (after the question was edited)

You can extract a GroupId field from PassengerId and do what I originally suggested like this:

df = pd.DataFrame({'PassengerId': ['9280_01', '9280_02', '9279_01', '9279_02'], 
'HomePlanet': ['Europa', np.nan, 'Earth', np.nan]})
df

  PassengerId HomePlanet
0     9280_01     Europa
1     9280_02        NaN
2     9279_01      Earth
3     9279_02        NaN

df['GroupId'] = df['PassengerId'].apply(lambda x: x.split('_')[0])
df

  PassengerId HomePlanet GroupId
0     9280_01     Europa    9280
1     9280_02        NaN    9280
2     9279_01      Earth    9279
3     9279_02        NaN    9279
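
As a variation on the apply call above, the split can also be done with the vectorized .str accessor, which leaves rows with a missing PassengerId as missing instead of raising an error. This is only a sketch: the row with a missing PassengerId and the PersonNo column name are assumptions made for illustration, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({'PassengerId': ['9280_01', '9280_02', None]})

# Split once into two columns; rows with a missing PassengerId stay missing.
parts = df['PassengerId'].str.split('_', expand=True)
df['GroupId'] = parts[0]
df['PersonNo'] = parts[1]  # hypothetical column name, not in the original
print(df)
```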

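# Take only GroupId and HomePlanet from the right side, so each passenger keeps their own PassengerId.
known = df.loc[df['HomePlanet'].notna(), ['GroupId', 'HomePlanet']].drop_duplicates(subset='GroupId')
pd.merge(df[['PassengerId', 'GroupId']], known, on='GroupId', how='left')[['PassengerId', 'HomePlanet']]

  PassengerId HomePlanet
0     9280_01     Europa
1     9280_02     Europa
2     9279_01      Earth
3     9279_02      Earth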
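
The same fill can also be done without a merge, by grouping on the group number and borrowing the first known planet in each group. The following is a minimal sketch rather than a definitive implementation: it assumes every group shares a single home planet, and the fill_home_planet helper name and the extra passenger 0001_01 (a group with no known planet) are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_home_planet(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing HomePlanet values from other members of the same group."""
    out = df.copy()
    # The group number is the part of PassengerId before the underscore.
    group_id = out['PassengerId'].str.split('_').str[0]
    # 'first' picks the first non-null HomePlanet within each group.
    known = out.groupby(group_id)['HomePlanet'].transform('first')
    out['HomePlanet'] = out['HomePlanet'].fillna(known)
    return out


df = pd.DataFrame({
    'PassengerId': ['9280_01', '9280_02', '9279_01', '9279_02', '0001_01'],
    'HomePlanet': ['Europa', np.nan, 'Earth', np.nan, np.nan],
})
filled = fill_home_planet(df)
print(filled)
```

Rows keep their original order and PassengerId, and a passenger whose group has no known planet simply stays NaN.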

If you want to do further checks to determine whether the two passengers are indeed from the same family (for example, by checking their names), you can do that in the apply.
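
One possible shape for such a check is sketched below. It uses a groupby rather than apply, and it rests on assumptions that are not in the original post: a hypothetical Name column in "First Last" format, with the surname taken as the last whitespace-separated token. A planet is only borrowed from group members who share that surname.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'PassengerId': ['9280_01', '9280_02', '9280_03'],
    'Name': ['Ann Smith', 'Bob Smith', 'Cid Jones'],  # hypothetical column
    'HomePlanet': ['Europa', np.nan, np.nan],
})

group_id = df['PassengerId'].str.split('_').str[0]
# Assumption: the surname is the last whitespace-separated token of Name.
surname = df['Name'].str.split().str[-1]

# Only borrow a planet from group members who share the same surname.
known = df.groupby([group_id, surname])['HomePlanet'].transform('first')
df['HomePlanet'] = df['HomePlanet'].fillna(known)
print(df)
```

In this example the second Smith is filled with Europa, while Jones stays NaN because nobody with that surname in the group has a known planet.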
