pandas to_numeric with integer downcast produces floats, not integers
With this sample dataframe:
>>> d = pd.DataFrame({'si': ['1', '2', 'NA'], 's': ['a', 'b', 'c']})
>>> d.dtypes
#
si object
s object
dtype: object
My first attempt was to use astype with the NA-aware 'Int64' integer type, but I got a traceback:
>>> d.si.astype('Int64')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-144-ed289e0c95aa> in <module>
----> 1 d.si.astype('Int64')
...
Then I tried the to_numeric method:
In [112]: d.loc[:, 'ii'] = pd.to_numeric(d.si, errors='coerce', downcast='integer')
In [113]: d.dtypes
Out[113]:
si object
s object
ii float64
dtype: object
In [114]: d
Out[114]:
   si  s   ii
0   1  a  1.0
1   2  b  2.0
2  NA  c  NaN
In the above, I expected the ii column to hold integers, with an integer NaN for the missing value.
The documentation says:
downcast : {'integer', 'signed', 'unsigned', 'float'}, default None
If not None, and if the data has been successfully cast to a
numerical dtype (or if the data was numeric to begin with),
downcast that resulting data to the smallest numerical dtype
possible according to the following rules:
- 'integer' or 'signed': smallest signed int dtype (min.: np.int8)
- 'unsigned': smallest unsigned int dtype (min.: np.uint8)
- 'float': smallest float dtype (min.: np.float32)
Comments (2)
Unfortunately, pandas is still adapting/transitioning to fully supporting integer NaN. For that, you have to explicitly convert the result to Int64 after your pd.to_numeric operation. No need to downcast.
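A minimal sketch of that two-step conversion, using the sample dataframe from the question. It assumes a pandas version that ships the nullable 'Int64' dtype (0.24 or later); the intermediate variable name `parsed` is only for illustration.

```python
import pandas as pd

d = pd.DataFrame({'si': ['1', '2', 'NA'], 's': ['a', 'b', 'c']})

# Step 1: parse the strings. 'NA' cannot be parsed, so errors='coerce'
# turns it into NaN, which makes the result a float64 Series.
parsed = pd.to_numeric(d['si'], errors='coerce')

# Step 2: convert explicitly to the nullable integer dtype (capital "I").
# The NaN becomes a missing value in the integer column.
d['ii'] = parsed.astype('Int64')

print(d.dtypes)
print(d)
```

Note that downcast is not passed at all here; the explicit astype('Int64') does the work.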
You have errors='coerce' set, and the documentation for that option says that invalid parsing will be set as NaN. Since your si column contains NaNs, it cannot be converted to an integer column: NaN is a float, so all other values in the column are upcast to the float64 dtype.
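To see the upcast in isolation, here is a small comparison. It is only a sketch; the Series names `with_nan` and `without_nan` are made up for the example.

```python
import numpy as np
import pandas as pd

# NaN is a floating-point value, so a NumPy integer column cannot hold it.
print(type(np.nan))

# One unparseable value: coerced to NaN, so the result stays float64
# even though downcast='integer' was requested.
with_nan = pd.to_numeric(pd.Series(['1', '2', 'NA']),
                         errors='coerce', downcast='integer')

# No unparseable values: the integer downcast applies as documented.
without_nan = pd.to_numeric(pd.Series(['1', '2']), downcast='integer')

print(with_nan.dtype)     # float64
print(without_nan.dtype)  # int8
```

So downcast='integer' only has an effect when every value can be represented as an integer.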