Pandas: imputing with group-wise descriptive statistics chosen by a variable
I have a data frame like this:
input_df = pd.DataFrame({"sex": ["M", "F", "F", "M", "M"], "Class": [1, 2, 2, 1, 1], "Age":[40, 30, 30, 50, NaN]})
What I want to do is to impute the missing value for the age based on the sex and class columns.
I have tried doing it with a function, conditional_impute. The function takes a data frame and a choice of statistic, then uses that statistic to impute the age within each sex and class group. The caveat is that the choice can only be the mean or the median; if it is neither of these, the function has to raise an error.
So I did this:
### START FUNCTION
def conditional_impute(input_df, choice='median'):
    my_df = input_df.copy()
    # if choice is not median or mean, raise ValueError
    if choice == "mean" or choice == "median":
        my_df['Age'] = my_df['Age'].fillna(my_df.groupby(["Sex","Pclass"])['Age'].transform(choice))
    else:
        raise ValueError()
    # round the values in the Age column
    my_df['Age'] = round(my_df['Age'], 1)
    return my_df
### END FUNCTION
But I am getting an error when I call it.
conditional_impute(train_df, choice='mean')
What could I possibly be doing wrong? I really cannot get a handle on this.
If you give the right inputs, it outputs just fine...
Output:
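The error message is not shown in the question, so the following is only a guess at what "the right inputs" means here: the function groups on "Sex" and "Pclass", while the sample frame in the question uses "sex" and "Class", and a frame without the expected columns would raise a KeyError. A minimal sketch, assuming a hypothetical train_df whose column names match what the function expects:

```python
import numpy as np
import pandas as pd


def conditional_impute(input_df, choice="median"):
    # Reject anything other than the two supported statistics.
    if choice not in ("mean", "median"):
        raise ValueError(f"choice must be 'mean' or 'median', got {choice!r}")
    my_df = input_df.copy()
    # Per-row group statistic, aligned with the original index.
    group_stat = my_df.groupby(["Sex", "Pclass"])["Age"].transform(choice)
    # Fill only the missing ages, then round to one decimal place.
    my_df["Age"] = my_df["Age"].fillna(group_stat).round(1)
    return my_df


# Hypothetical stand-in for train_df; the real frame is not shown in the question.
train_df = pd.DataFrame({
    "Sex": ["M", "F", "F", "M", "M"],
    "Pclass": [1, 2, 2, 1, 1],
    "Age": [40, 30, 30, 50, np.nan],
})

result = conditional_impute(train_df, choice="mean")
print(result)
```

With this toy frame, the missing age in the last row is filled with 45.0, the mean of the other two ages in the ("M", 1) group. Passing the frame from the question unchanged, with its "sex" and "Class" columns, would fail at the groupby step instead.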