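Fill NaN using a groupby dict
I'm trying to replace the NaNs in the 'cylinders' column using a dictionary that holds the median cylinder number for each model. I thought it would work easily, but nothing I've tried has worked.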
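# median cylinder count for each model (a one-column DataFrame indexed by model)
cylinders_model_med = df.groupby('model').agg({'cylinders': 'median'})
# to_dict() nests by column name: {'cylinders': {model: median}}
cylinders_model_med = cylinders_model_med.to_dict()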
# output:
{'cylinders': {'bmw x5': 6.0,
'buick enclave': 6.0,
'cadillac escalade': 8.0,
'chevrolet camaro': 6.0,
'chevrolet camaro lt coupe 2d': 6.0,
'chevrolet colorado': 5.0,
'chevrolet corvette': 8.0,
'chevrolet cruze': 4.0,
'chevrolet equinox': 4.0,
'chevrolet impala': 6.0,
'chevrolet malibu': 4.0,
'chevrolet silverado': 8.0,
'chevrolet silverado 1500': 8.0,
'chevrolet silverado 1500 crew': 8.0,
'chevrolet silverado 2500hd': 8.0,
'chevrolet silverado 3500hd': 8.0,
'chevrolet suburban': 8.0,
'chevrolet tahoe': 8.0,
'chevrolet trailblazer': 6.0,
'chevrolet traverse': 6.0,
'chrysler 200': 4.0,
'chrysler 300': 6.0,
'chrysler town & country': 6.0,
'dodge charger': 6.0,
'dodge dakota': 6.0,
'dodge grand caravan': 6.0,
'ford econoline': 8.0,
'ford edge': 6.0,
'ford escape': 4.0,
'ford expedition': 8.0,
'ford explorer': 6.0,
'ford f-150': 8.0,
'ford f-250': 8.0,
'ford f-250 sd': 8.0,
'ford f-250 super duty': 8.0,
'ford f-350 sd': 8.0,
'ford f150': 8.0,
'ford f150 supercrew cab xlt': 6.0,
'ford f250': 8.0,
'ford f250 super duty': 8.0,
'ford f350': 8.0,
'ford f350 super duty': 8.0,
'ford focus': 4.0,
'ford focus se': 4.0,
'ford fusion': 4.0,
'ford fusion se': 4.0,
'ford mustang': 6.0,
'ford mustang gt coupe 2d': 8.0,
'ford ranger': 6.0,
'ford taurus': 6.0,
'gmc acadia': 6.0,
'gmc sierra': 8.0,
'gmc sierra 1500': 8.0,
'gmc sierra 2500hd': 8.0,
'gmc yukon': 8.0,
'honda accord': 4.0,
'honda civic': 4.0,
'honda civic lx': 4.0,
'honda cr-v': 4.0,
'honda odyssey': 6.0,
'honda pilot': 6.0,
'hyundai elantra': 4.0,
'hyundai santa fe': 6.0,
'hyundai sonata': 4.0,
'jeep cherokee': 6.0,
'jeep grand cherokee': 6.0,
'jeep grand cherokee laredo': 6.0,
'jeep liberty': 6.0,
'jeep wrangler': 6.0,
'jeep wrangler unlimited': 6.0,
'kia sorento': 4.0,
'kia soul': 4.0,
'mercedes-benz benze sprinter 2500': 6.0,
'nissan altima': 4.0,
'nissan frontier': 6.0,
'nissan frontier crew cab sv': 6.0,
'nissan maxima': 6.0,
'nissan murano': 6.0,
'nissan rogue': 4.0,
'nissan sentra': 4.0,
'nissan versa': 4.0,
'ram 1500': 8.0,
'ram 2500': 6.0,
'ram 3500': 6.0,
'subaru forester': 4.0,
'subaru impreza': 4.0,
'subaru outback': 4.0,
'toyota 4runner': 6.0,
'toyota camry': 4.0,
'toyota camry le': 4.0,
'toyota corolla': 4.0,
'toyota highlander': 6.0,
'toyota prius': 4.0,
'toyota rav4': 4.0,
'toyota sienna': 6.0,
'toyota tacoma': 6.0,
'toyota tundra': 8.0,
'volkswagen jetta': 4.0,
'volkswagen passat': 4.0}}
# input:
df['cylinders'] = df['cylinders'].fillna(cylinders_model_med)
df['cylinders'].isna().sum()
# output:
5260
This is the number of NaNs I started with. I'm new here, so let me know if you need more (or less) information.
Thank you for your time!
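Why nothing gets filled: when `Series.fillna` receives a dict, the dict's keys are matched against the Series' index labels (here the row numbers), not against the values of another column such as 'model'. The nested dict produced by `to_dict()` has the single key 'cylinders', which matches no row label, so every NaN stays. A minimal sketch on hypothetical toy data (the models, values and variable names `unchanged` and `per_row_median` are made up for illustration), followed by one common fix: look up each row's model with `Series.map`, which yields a Series indexed like `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({
    "model": ["ford f-150", "ford f-150", "honda civic", "honda civic"],
    "cylinders": [8.0, np.nan, 4.0, np.nan],
})

# Same construction as in the question: {'cylinders': {model: median}}
cylinders_model_med = df.groupby("model").agg({"cylinders": "median"}).to_dict()

# fillna matches dict keys to index labels 0..3; the key 'cylinders' matches none,
# so the NaNs survive (this mirrors the unchanged count in the question).
unchanged = df["cylinders"].fillna(cylinders_model_med)
print(unchanged.isna().sum())  # 2

# Map each row's model to its median: the result shares df's index,
# so fillna lines the values up row by row.
per_row_median = df["model"].map(cylinders_model_med["cylinders"])
df["cylinders"] = df["cylinders"].fillna(per_row_median)
print(df["cylinders"].tolist())  # [8.0, 8.0, 4.0, 4.0]
```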
Answers (2)
Filling in NaN values from corresponding values is what `combine_first` does for a living. You could calculate the median cylinder number by model, then fill in the original dataframe's NaN cylinder numbers by model. Assume this starting dataframe:
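As a hypothetical stand-in for that starting dataframe (model names and values are made up for illustration), a small frame in which two rows lack a cylinder count:

```python
import numpy as np
import pandas as pd

# Hypothetical starting data: rows 2 and 4 are missing their cylinder count.
df = pd.DataFrame({
    "model": ["ford f-150", "ford f-150", "ford f-150",
              "honda civic", "honda civic", "toyota camry"],
    "cylinders": [8.0, 8.0, np.nan, 4.0, np.nan, 4.0],
})

print(df)
print(df["cylinders"].isna().sum())  # 2
```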
Calculate the median cylinders by model and fill in the NaNs:
Result
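A minimal sketch of those two steps on hypothetical toy data. It assumes `groupby(...).transform('median')` as the way to compute the per-model median: unlike a plain `.median()`, `transform` broadcasts each model's median back onto every row, so the result has exactly `df`'s index and `combine_first` can match it row by row. The variable name `median_by_row` is illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (model names and values made up for illustration).
df = pd.DataFrame({
    "model": ["ford f-150", "ford f-150", "ford f-150",
              "honda civic", "honda civic", "toyota camry"],
    "cylinders": [8.0, 8.0, np.nan, 4.0, np.nan, 4.0],
})

# Median cylinders per model (NaNs skipped), repeated for every row of that
# model, so median_by_row has one value per original row and the same index.
median_by_row = df.groupby("model")["cylinders"].transform("median")

# combine_first keeps existing values and takes the per-model median only
# where cylinders is NaN, matching rows by index label.
df["cylinders"] = df["cylinders"].combine_first(median_by_row)

print(df)
print(df["cylinders"].tolist())  # [8.0, 8.0, 8.0, 4.0, 4.0, 4.0]
```

Because `median_by_row` already shares `df`'s index, `df["cylinders"].fillna(median_by_row)` would give the same result here; `combine_first` is the tool this answer names.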
Alignment in pandas is based on the index, so you need to create default values that are either explicitly aligned to your dataframe or aligned automatically. The easiest way to do this is to create a default series with the same index as `df` using `replace`.
See the docs for more information: vectorized operations and label alignment.
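A minimal sketch of that default-series approach on hypothetical toy data (models, values and the names `medians` and `defaults` are illustrative). `replace` swaps each model name for its median while keeping `df`'s index, so `fillna` aligns the defaults row by row. One caveat of this design: `replace` leaves any model that is missing from the dict as its original string, and the `astype(float)` used here to pin the dtype would then raise; `Series.map`, which returns NaN for unmapped keys, is a common swap in that case. Depending on the pandas version, `replace` may also emit a FutureWarning about downcasting.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (model names and values made up for illustration).
df = pd.DataFrame({
    "model": ["ford f-150", "ford f-150", "honda civic", "honda civic"],
    "cylinders": [8.0, np.nan, 4.0, np.nan],
})

# Flat {model: median} dict, i.e. the inner dict of the question's to_dict() output.
medians = df.groupby("model")["cylinders"].median().to_dict()

# Default series: each model name replaced by its median, same index as df.
# astype(float) makes the dtype explicit regardless of replace's dtype inference.
defaults = df["model"].replace(medians).astype(float)

# fillna with a Series aligns on the index, so each NaN gets its own row's default.
df["cylinders"] = df["cylinders"].fillna(defaults)
print(df["cylinders"].tolist())  # [8.0, 8.0, 4.0, 4.0]
```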