Replacing the values of a pandas categorical column based on category codes

Posted on 2025-02-06 16:43:45

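I am looking for a more elegant way to replace the values of a categorical column based on its category codes. I cannot use the map method because the original values are not known in advance.

I am currently using the following approach:

df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])

This approach feels inelegant because I convert the categorical column to integers and then convert it back to categorical. The full code is below.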

import pandas as pd

df = pd.DataFrame({    
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})

df['Gender'] = df['Gender'].astype('category')

# don't want to do this as original values may not be known to establish the dict
# df['Gender'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})

# offline, we know 0 = Female, 1 = Male
# is there a more elegant way to do the following?
df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])
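A side note on the snippet above, as a small sketch: `cat.codes` already encodes missing values as -1, so the `fillna(-1)` call appears to be a no-op here. The Series name `gender` and the other variable names are illustrative and not part of the original post.

```python
import pandas as pd

# Same Gender data as in the question, as a standalone categorical Series
gender = pd.Series(['M', 'M', 'F', pd.NA]).astype('category')

# Integer codes: categories are sorted ('F' -> 0, 'M' -> 1), missing -> -1
codes = gender.cat.codes

# The codes contain no missing entries, so fillna(-1) changes nothing
filled = codes.fillna(-1)

# Rebuilding from the codes works without the fillna step
rebuilt = pd.Categorical.from_codes(codes, categories=['Female', 'Male'])
print(codes.tolist())
print(list(rebuilt))
```

This does not remove the integer round trip the question objects to; it only shows that the extra `fillna` is not needed for it to work.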



Comments (2)

土豪 2025-02-13 16:43:45

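Here is one way to do that: create a dictionary of the unique items, using enumerate to assign each one an index.

d = {item: i for i, item in enumerate(df['Gender'].unique())}

Then use map to map the values: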

df['cat'] = df['Gender'].map(d)
df
    Name    Gender  cat
0   Jack    M       0
1   John    M       0
2   Jil     F       1
3   Jax     <NA>    2
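One caveat worth illustrating with a short sketch: the indices assigned by `enumerate` over `unique()` follow order of first appearance, which need not match the category codes pandas stores (those follow the sorted category order here). The names `by_appearance` and `by_code` are made up for this illustration.

```python
import pandas as pd

gender = pd.Series(['M', 'M', 'F', pd.NA]).astype('category')

# Index by order of first appearance, ignoring the missing value
by_appearance = {item: i for i, item in enumerate(gender.dropna().unique())}

# Index by the category code pandas actually stores for each label
by_code = {label: code for code, label in enumerate(gender.cat.categories)}

print(by_appearance)
print(by_code)
```

With this data the two mappings disagree ('M' comes first by appearance, 'F' comes first by code), so the dictionary approach would not line up with the "0 = Female, 1 = Male" convention stated in the question.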

网名女生简单气质 2025-02-13 16:43:45

What about using cat.rename_categories?

df['Gender'] = (df['Gender'].astype('category')
                .cat.rename_categories(['Female', 'Male'])
               )

Output:

   Name  Gender
0  Jack    Male
1  John    Male
2   Jil  Female
3   Jax     NaN
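A minimal sketch checking that `rename_categories` only relabels the categories and leaves the underlying codes untouched, which is what makes it fit the "known codes, unknown labels" case. It assumes the categories were inferred in sorted order ('F' before 'M'), so the new labels are applied positionally; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})
df['Gender'] = df['Gender'].astype('category')

# Codes before renaming: 'F' -> 0, 'M' -> 1, missing -> -1
codes_before = df['Gender'].cat.codes.tolist()

# Assumed offline knowledge: code 0 means Female, code 1 means Male
new_labels = ['Female', 'Male']
renamed = df['Gender'].cat.rename_categories(new_labels)

codes_after = renamed.cat.codes.tolist()
labels_after = renamed.cat.categories.tolist()
print(codes_before, codes_after, labels_after)
```

Because the labels are matched by position, the list passed in has to contain exactly as many entries as the column has categories; pandas raises an error otherwise instead of relabelling silently.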
