Fill NaN using a groupby dict

Posted 2025-02-10 19:06:28

I am attempting to replace the NaNs in the 'cylinders' column using a dictionary that maps each model to its median number of cylinders. I think it should be easy, but I get stuck every way I try.

cylinders_model_med = df.groupby('model').agg({'cylinders': 'median'})

cylinders_model_med=cylinders_model_med.to_dict()

#output:

  'bmw x5': 6.0,
  'buick enclave': 6.0,
  'cadillac escalade': 8.0,
  'chevrolet camaro': 6.0,
  'chevrolet camaro lt coupe 2d': 6.0,
  'chevrolet colorado': 5.0,
  'chevrolet corvette': 8.0,
  'chevrolet cruze': 4.0,
  'chevrolet equinox': 4.0,
  'chevrolet impala': 6.0,
  'chevrolet malibu': 4.0,
  'chevrolet silverado': 8.0,
  'chevrolet silverado 1500': 8.0,
  'chevrolet silverado 1500 crew': 8.0,
  'chevrolet silverado 2500hd': 8.0,
  'chevrolet silverado 3500hd': 8.0,
  'chevrolet suburban': 8.0,
  'chevrolet tahoe': 8.0,
  'chevrolet trailblazer': 6.0,
  'chevrolet traverse': 6.0,
  'chrysler 200': 4.0,
  'chrysler 300': 6.0,
  'chrysler town & country': 6.0,
  'dodge charger': 6.0,
  'dodge dakota': 6.0,
  'dodge grand caravan': 6.0,
  'ford econoline': 8.0,
  'ford edge': 6.0,
  'ford escape': 4.0,
  'ford expedition': 8.0,
  'ford explorer': 6.0,
  'ford f-150': 8.0,
  'ford f-250': 8.0,
  'ford f-250 sd': 8.0,
  'ford f-250 super duty': 8.0,
  'ford f-350 sd': 8.0,
  'ford f150': 8.0,
  'ford f150 supercrew cab xlt': 6.0,
  'ford f250': 8.0,
  'ford f250 super duty': 8.0,
  'ford f350': 8.0,
  'ford f350 super duty': 8.0,
  'ford focus': 4.0,
  'ford focus se': 4.0,
  'ford fusion': 4.0,
  'ford fusion se': 4.0,
  'ford mustang': 6.0,
  'ford mustang gt coupe 2d': 8.0,
  'ford ranger': 6.0,
  'ford taurus': 6.0,
  'gmc acadia': 6.0,
  'gmc sierra': 8.0,
  'gmc sierra 1500': 8.0,
  'gmc sierra 2500hd': 8.0,
  'gmc yukon': 8.0,
  'honda accord': 4.0,
  'honda civic': 4.0,
  'honda civic lx': 4.0,
  'honda cr-v': 4.0,
  'honda odyssey': 6.0,
  'honda pilot': 6.0,
  'hyundai elantra': 4.0,
  'hyundai santa fe': 6.0,
  'hyundai sonata': 4.0,
  'jeep cherokee': 6.0,
  'jeep grand cherokee': 6.0,
  'jeep grand cherokee laredo': 6.0,
  'jeep liberty': 6.0,
  'jeep wrangler': 6.0,
  'jeep wrangler unlimited': 6.0,
  'kia sorento': 4.0,
  'kia soul': 4.0,
  'mercedes-benz benze sprinter 2500': 6.0,
  'nissan altima': 4.0,
  'nissan frontier': 6.0,
  'nissan frontier crew cab sv': 6.0,
  'nissan maxima': 6.0,
  'nissan murano': 6.0,
  'nissan rogue': 4.0,
  'nissan sentra': 4.0,
  'nissan versa': 4.0,
  'ram 1500': 8.0,
  'ram 2500': 6.0,
  'ram 3500': 6.0,
  'subaru forester': 4.0,
  'subaru impreza': 4.0,
  'subaru outback': 4.0,
  'toyota 4runner': 6.0,
  'toyota camry': 4.0,
  'toyota camry le': 4.0,
  'toyota corolla': 4.0,
  'toyota highlander': 6.0,
  'toyota prius': 4.0,
  'toyota rav4': 4.0,
  'toyota sienna': 6.0,
  'toyota tacoma': 6.0,
  'toyota tundra': 8.0,
  'volkswagen jetta': 4.0,
  'volkswagen passat': 4.0}} 
#input:
df['cylinders']=df['cylinders'].fillna(cylinders_model_med)
df['cylinders'].isna().sum()
#output:
5260

This is the same number of NaNs I started with.
I am new here, so let me know if you need more (or less) information.

Thank you for your time!
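Why the NaN count does not change can be reproduced on a small DataFrame (the toy data below is hypothetical). DataFrame.to_dict() returns a nested {column: {index_label: value}} dict, which is where the outer 'cylinders' key and the closing }} come from. Series.fillna with a dict matches the dict keys against the Series index labels (0, 1, 2, ... here), not against the values of the model column, and keys that are not index labels are ignored. This is a sketch for illustration, not code from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's df
df = pd.DataFrame({
    "model": ["nissan rogue", "nissan rogue", "nissan versa", "nissan versa"],
    "cylinders": [4.0, np.nan, 6.0, np.nan],
})

# Same construction as in the question; the result is nested:
# {'cylinders': {'nissan rogue': 4.0, 'nissan versa': 6.0}}
med = df.groupby("model").agg({"cylinders": "median"}).to_dict()

# fillna(dict) looks the keys up in the Series index (0..3).
# Neither 'cylinders' (nested dict) nor the model names (inner dict)
# are index labels, so nothing is filled in either case.
missing_nested = df["cylinders"].fillna(med).isna().sum()
missing_inner = df["cylinders"].fillna(med["cylinders"]).isna().sum()

# A dict keyed by index labels does fill: only row 1 gets a value here
missing_by_label = df["cylinders"].fillna({1: 4.0}).isna().sum()

print(missing_nested, missing_inner, missing_by_label)
```

In other words, the lookup has to go from each row's model to a median first, producing something indexed like df; both answers below do that.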


Answers (2)

菊凝晚露 2025-02-17 19:06:28

Filling NaN values with the corresponding values from another object is what combine_first does for a living. You can calculate the median number of cylinders by model, then use it to fill the NaN cylinder counts in the original DataFrame, model by model.
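As a minimal illustration of the combine_first behaviour described above (the two Series are hypothetical): the caller's values are kept wherever they are present, and only its NaN positions are taken from the other object, matched by index label rather than by position.

```python
import numpy as np
import pandas as pd

s = pd.Series([6.0, np.nan, 4.0])   # index 0, 1, 2; label 1 is missing
other = pd.Series({1: 6.0})         # provides a value only for label 1

# s keeps 6.0 and 4.0; the NaN at label 1 is taken from other
combined = s.combine_first(other)
print(combined.to_dict())
```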

Assume this starting DataFrame:

            model  cylinders
0   nissan maxima        6.0
1   nissan maxima        6.0
2   nissan maxima        4.0
3   nissan murano        6.0
4   nissan murano        NaN
5   nissan murano        4.0
6   nissan murano        6.0
7    nissan rogue        4.0
8    nissan rogue        4.0
9    nissan rogue        NaN
10   nissan rogue        6.0
11  nissan sentra        6.0
12  nissan sentra        4.0
13  nissan sentra        4.0
14   nissan versa        4.0
15   nissan versa        4.0
16   nissan versa        NaN
17   nissan versa        4.0

Calculate the median number of cylinders by model and fill in the NaNs:

df.assign(cylinders=df['cylinders'].combine_first(df[['model','cylinders']].groupby('model').transform('median').squeeze()))

Result:

            model  cylinders
0   nissan maxima        6.0
1   nissan maxima        6.0
2   nissan maxima        4.0
3   nissan murano        6.0
4   nissan murano        6.0
5   nissan murano        4.0
6   nissan murano        6.0
7    nissan rogue        4.0
8    nissan rogue        4.0
9    nissan rogue        4.0
10   nissan rogue        6.0
11  nissan sentra        6.0
12  nissan sentra        4.0
13  nissan sentra        4.0
14   nissan versa        4.0
15   nissan versa        4.0
16   nissan versa        4.0
17   nissan versa        4.0
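The answer's example can be reproduced end to end as a sketch. The data below is the starting table above; the second variant, which selects the column before transforming and uses fillna, is an equivalent alternative added here for comparison, not part of the answer. Both rely on groupby(...).transform('median') returning one median per row, indexed like df, so the fill aligns row by row.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The starting DataFrame from the answer (18 rows, five Nissan models)
df = pd.DataFrame({
    "model": ["nissan maxima"] * 3 + ["nissan murano"] * 4 + ["nissan rogue"] * 4
    + ["nissan sentra"] * 3 + ["nissan versa"] * 4,
    "cylinders": [6, 6, 4, 6, nan, 4, 6, 4, 4, nan, 6, 6, 4, 4, 4, 4, nan, 4],
})

# The answer's expression: per-row group medians, then combine_first
medians = df[["model", "cylinders"]].groupby("model").transform("median").squeeze()
result = df.assign(cylinders=df["cylinders"].combine_first(medians))

# Equivalent variant: transform a single column and pass it to fillna
alt = df["cylinders"].fillna(df.groupby("model")["cylinders"].transform("median"))

print(result["cylinders"].tolist())
```

Note that assign returns a new DataFrame and leaves df untouched, so write df = df.assign(...) if you want to keep the filled column.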

孤千羽 2025-02-17 19:06:28

Alignment in pandas is based on the index, so you need to build the default values so that they are either explicitly aligned with your DataFrame or will be aligned automatically. The easiest way to do this is to create a Series of defaults with the same index as df, using replace:

defaults = df['model'].replace(cylinders_model_med)
df['cylinders'] = df['cylinders'].fillna(defaults)

See the docs for more info: vectorized operations and label alignment
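A sketch of the same idea on hypothetical toy data. Two assumptions differ from the answer's code: the inner dict cylinders_model_med['cylinders'] is used, because the question's to_dict() output is nested under the 'cylinders' key, and Series.map is swapped in for replace. map returns NaN for a model missing from the dict, whereas replace leaves non-matching values (here, model names) unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "model": ["nissan rogue", "nissan rogue", "nissan rogue", "nissan versa", "nissan versa"],
    "cylinders": [4.0, 6.0, np.nan, 4.0, np.nan],
})

# Same construction as in the question: {'cylinders': {model: median}}
cylinders_model_med = df.groupby("model").agg({"cylinders": "median"}).to_dict()

# Look up each row's model in the inner dict; the result has df's index,
# so fillna aligns it row by row
defaults = df["model"].map(cylinders_model_med["cylinders"])
df["cylinders"] = df["cylinders"].fillna(defaults)

print(df["cylinders"].tolist())
```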
