Replace the values of a pandas categorical column based on category codes
I am looking for a more elegant approach to replace the values of a categorical column based on its category codes. I am not able to use the map method, as the original values are not known in advance.
I am currently using the following approach:
df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])
This approach feels inelegant because I convert the categorical column to integers and then convert it back to categorical. The full code is below.
import pandas as pd
df = pd.DataFrame({
'Name': ['Jack', 'John', 'Jil', 'Jax'],
'Gender': ['M', 'M', 'F', pd.NA],
})
df['Gender'] = df['Gender'].astype('category')
# don't want to do this as original values may not be known to establish the dict
# df['Gender'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
# offline, we know 0 = Female, 1 = Male
# what is more elegant way to do below?
df['Gender'] = pd.Categorical.from_codes(df['Gender'].cat.codes.fillna(-1), categories=['Female', 'Male'])
Comments (2)
Here is one way to do that:
create a dictionary of the unique items, using enumerate to assign an index to each
use map to map the values
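The code for this answer did not survive on this page, so the following is only a sketch of what the steps above might look like, not the answerer's original code. It assumes the new labels are known by code position (0 = Female, 1 = Male, as stated in the question); the names new_labels and mapping are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})
df['Gender'] = df['Gender'].astype('category')

# Assumption from the question: code 0 = Female, code 1 = Male
new_labels = ['Female', 'Male']

# enumerate yields (code, original category), so the original values
# never have to be known in advance
mapping = {old: new_labels[code]
           for code, old in enumerate(df['Gender'].cat.categories)}

# map translates each value; missing values stay missing
df['Gender'] = df['Gender'].map(mapping)
print(df)
```

Because the dictionary is built from cat.categories, this sidesteps the objection to map in the question, at the cost of an extra intermediate dictionary.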
What about using cat.rename_categories?
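The snippet for this suggestion is missing from the page, so here is a minimal sketch of how cat.rename_categories could be applied to the question's data. It assumes, as the question states, that code 0 should become Female and code 1 should become Male.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'John', 'Jil', 'Jax'],
    'Gender': ['M', 'M', 'F', pd.NA],
})
df['Gender'] = df['Gender'].astype('category')

# A list-like argument renames categories by position:
# the category with code 0 becomes 'Female', code 1 becomes 'Male'
df['Gender'] = df['Gender'].cat.rename_categories(['Female', 'Male'])
print(df)
```

The column stays categorical throughout, so there is no round trip through integer codes. Missing values already carry code -1 and remain missing after the rename, so no fillna step is needed.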