Changing column format when reading a csv file

Posted on 2025-01-19 03:15:42

I have this csv file (called df.csv):

[screenshot of df.csv, not reproduced here]

I read it in using this code:

import pandas as pd
df = pd.read_csv('df.csv')

and I print it out using this code:

print(df)

and the output of the print looks like this:

  employment_type    ltv
0                       
1                       
2        Salaried  77.13
3        Salaried   77.4
4        Salaried  76.42
5        Salaried  71.89

As you can see, the first two records are empty.
I check the dataframe info with this code:

print(df.info())

and the output looks like this:

 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   employment_type  6 non-null      object
 1   ltv              6 non-null      object

Now, I would expect that:

  • employment_type would have been read in as object (and that meets my expectations)
  • ltv would have been read in as float

I guess that the reason why both fields have been read in as objects is because of the first empty record, correct?

Whilst I am happy for employment_type to be read in as an object, how can I read in the ltv field as numeric?
I don't want to modify the format after I have read the file in. I need to find a way to automatically assign the correct format whilst reading in the file: I will have to read in some similar files with hundreds of columns and I can't manually assign the correct format to each column.

Comments (1)

冷了相思 2025-01-26 03:15:42

I guess that the reason why both fields have been read in as objects is because of the first empty record, correct?
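Yes, but those records are not truly empty. pandas is pretty good at inferring data types, and a genuinely empty cell would be read as NaN, which still lets ltv be a float column. Here info() reports 6 non-null values in each column, so the first two rows most likely contain whitespace, which pandas keeps as strings and which makes the whole column object.
To fix your issue, turn these blank strings into NaN (dropna on its own will not remove them, because they are not null) and you can then write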

df['ltv'] = df['ltv'].astype(float)
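
For files with hundreds of columns, the same idea can be wrapped in one function so that no column has to be named by hand. The following is only a minimal sketch, assuming the "empty" cells contain whitespace (which the 6 non-null count in the question suggests). The helper name read_csv_blank_as_nan and the sample data are made up for illustration; the conversion happens right after pd.read_csv, not inside it.

```python
import io

import pandas as pd


def read_csv_blank_as_nan(source, **kwargs):
    """Hypothetical helper: read a CSV, then treat whitespace-only cells as missing."""
    df = pd.read_csv(source, **kwargs)
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            continue  # pandas already parsed this column as numbers
        # Mark cells that are empty once surrounding whitespace is stripped.
        blank = s.astype(str).str.strip() == ""
        cleaned = s.mask(blank)  # blank cells become NaN
        converted = pd.to_numeric(cleaned, errors="coerce")
        # Keep the numeric version only if no real value was lost in conversion.
        if converted.notna().sum() == cleaned.notna().sum():
            df[col] = converted
        else:
            df[col] = cleaned
    return df


# Assumed reconstruction of df.csv: the first two records hold a single space.
SAMPLE_CSV = (
    "employment_type,ltv\n"
    " , \n"
    " , \n"
    "Salaried,77.13\n"
    "Salaried,77.4\n"
    "Salaried,76.42\n"
    "Salaried,71.89\n"
)

raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
fixed = read_csv_blank_as_nan(io.StringIO(SAMPLE_CSV))

print(raw.dtypes)    # ltv is not numeric here
print(fixed.dtypes)  # ltv is now a float column
```

If the blank cells always hold exactly one space, passing na_values=[' '] to pd.read_csv may be enough on its own; the helper above is meant for the case where the amount of whitespace varies.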