Imputing NaNs with pandas fillna() changes dtype from float to object

Posted on 2025-01-31 04:01:51 · word count 532 · 4 views · 0 comments

I am imputing missing values in some of my columns. The columns have numeric dtypes (float and integer).
As soon as I impute the missing values using fillna() with the mean etc., the column's dtype changes from float to object.
I want it to remain float, and I find it a little inefficient to redo all the dtypes afterwards.
Kindly help me with this.

Here is an example.

import numpy as np
import pandas as pd

ser_original = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], dtype=float)

ser_imputed = ser_original.fillna(np.mean)
print('After imputation, the dtype is {}'.format(ser_imputed.dtype))

After imputation, the dtype is object

Please note that this is just a small example I created here. I am working with a large dataset and plan to impute multiple columns with different imputation methods, so please suggest a solution that handles multiple columns at once.

P.S. I find using for loops a little naive. Do comment if I am wrong about this.

情话已封尘 2025-02-07 04:01:51

That's because you're passing a function (np.mean itself) rather than a value, so fillna() inserts the function object and the Series becomes object dtype:

import numpy as np
import pandas as pd

ser_original = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], dtype=float)
ser_imputed = ser_original.fillna(np.mean)
print(ser_imputed)
0                                      1.0
1                                      2.0
2    <function mean at 0x000002BCA05020D0>
3                                      4.0
4                                      5.0
dtype: object

Pass the computed mean (a number) instead and the dtype stays float64:

ser_imputed = ser_original.fillna(ser_original.mean())
print(ser_imputed)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
print(ser_imputed.dtype)
# float64
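
Since the question asks about several columns with different imputations, here is a minimal sketch of one way to do that without an explicit loop. It assumes a hypothetical frame whose column names (`height`, `weight`) are made up for illustration, and it relies on fillna() accepting a dict that maps column names to fill values.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the column names are invented for illustration.
df = pd.DataFrame({
    "height": [1.0, 2.0, np.nan, 4.0, 5.0],
    "weight": [10.0, np.nan, 30.0, 40.0, 100.0],
})

# Map each column to an already-computed scalar. Note the parentheses:
# mean() and median() are called, so numbers are passed, not functions.
fill_values = {
    "height": df["height"].mean(),
    "weight": df["weight"].median(),
}

# fillna() with a dict fills each listed column with its own value.
df_imputed = df.fillna(fill_values)
print(df_imputed.dtypes)
```

Both columns should stay float64 here, because every fill value is a plain number rather than a function object.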

If you have a dataframe, you can fill in NaNs in it by using fillna() as