pandas `to_numeric` integer downcast casts to floats, not integers

Posted on 2025-02-05 11:53:04 · 1517 characters · 5 views · 0 comments

With this sample dataframe:

>>> d = pd.DataFrame({'si': ['1', '2', 'NA'], 's': ['a', 'b', 'c']})

>>> d.dtypes
si    object
s     object
dtype: object

My first attempt was to use astype with the NA-aware 'Int64' integer type, but I got a traceback:

>>> d.si.astype('Int64')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-144-ed289e0c95aa> in <module>
----> 1 d.si.astype('Int64')
...

Then I tried the to_numeric method:

In [112]: d.loc[:, 'ii'] = pd.to_numeric(d.si, errors='coerce', downcast='integer')

In [113]: d.dtypes
Out[113]: 
si     object
s      object
ii    float64
dtype: object

In [114]: d
Out[114]: 
    si  s   ii
0    1  a  1.0
1    2  b  2.0
2   NA  c  NaN

In the output above, I expected the ii column to hold integers, with an integer NA for the missing value.

The documentation says:

downcast : {'integer', 'signed', 'unsigned', 'float'}, default None
    If not None, and if the data has been successfully cast to a
    numerical dtype (or if the data was numeric to begin with),
    downcast that resulting data to the smallest numerical dtype
    possible according to the following rules:

    - 'integer' or 'signed': smallest signed int dtype (min.: np.int8)
    - 'unsigned': smallest unsigned int dtype (min.: np.uint8)
    - 'float': smallest float dtype (min.: np.float32)
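
To make the quoted rule concrete, here is a minimal sketch comparing a column that parses cleanly with one that contains an unparseable value. The series names `clean` and `with_missing` are made up for illustration, and `dtype=object` is set explicitly to match the object columns in the question.

```python
import pandas as pd

# Hypothetical sample data; dtype=object mirrors the 'si' column in the question.
clean = pd.Series(["1", "2", "3"], dtype=object)
with_missing = pd.Series(["1", "2", "NA"], dtype=object)

# Every value parses as an integer, so the result can shrink to the smallest signed int.
clean_result = pd.to_numeric(clean, downcast="integer")

# 'NA' cannot be parsed, so errors='coerce' turns it into NaN.
# NaN has no integer representation, so the downcast is skipped.
missing_result = pd.to_numeric(with_missing, errors="coerce", downcast="integer")

print(clean_result.dtype)    # int8
print(missing_result.dtype)  # float64
```

The downcast only happens when every value fits the target type, which is why the ii column in the question stays float64.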

Comments (2)

薄凉少年不暖心 2025-02-12 11:53:04

Unfortunately, pandas is still transitioning to full support for missing values in integer columns. For now, you have to explicitly convert the result to Int64 after your pd.to_numeric operation.

No need to downcast.

# Can also use 'Int64' as the dtype, as in the second form below.
>>> pd.to_numeric(df['col'], errors='coerce').astype(pd.Int64Dtype())

# or

>>> pd.to_numeric(df['col'], errors='coerce').astype('Int64')

0       1
1       2
2       3
3    <NA>
Name: col, dtype: Int64
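
Applied to the dataframe from the question, the same approach might look like the sketch below. It reuses the question's `d`, `si` and `ii` names and assumes the parsed values are all whole numbers.

```python
import pandas as pd

# Sample dataframe from the question.
d = pd.DataFrame({"si": ["1", "2", "NA"], "s": ["a", "b", "c"]})

# Parse the strings first (unparseable entries become NaN),
# then convert the numeric result to the nullable Int64 dtype.
d["ii"] = pd.to_numeric(d["si"], errors="coerce").astype("Int64")

print(d.dtypes)
print(d)
```

The ii column then holds 1, 2 and `<NA>` with dtype Int64 instead of float64.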

你列表最软的妹 2025-02-12 11:53:04

You have errors='coerce' set, and the documentation for that option says (emphasis mine):

errors : {'ignore', 'raise', 'coerce'}, default 'raise'

  • If 'raise', then invalid parsing will raise an exception.
  • If 'coerce', then invalid parsing will be set as NaN.
  • If 'ignore', then invalid parsing will return the input.

Since your si column contains values that get coerced to NaN, you can't convert it to a plain NumPy integer column: NaN is a float, so all other values in the column are upcast to the float64 dtype.
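
A small sketch of that upcasting, next to the nullable alternative. The variable names `floats` and `nullable` are illustrative only.

```python
import numpy as np
import pandas as pd

# NaN is an ordinary Python/NumPy float value.
nan_is_float = isinstance(np.nan, float)

# Mixing integers with NaN forces the whole series to float64.
floats = pd.Series([1, 2, np.nan])

# The nullable Int64 extension dtype stores a separate missing-value mask,
# so the integers stay integers and the gap becomes <NA>.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(nan_is_float)    # True
print(floats.dtype)    # float64
print(nullable.dtype)  # Int64
```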
