ValueError: could not convert string to float: id
I'm running the following Python script:
```python
#!/usr/bin/python
import os,sys
from scipy import stats
import numpy as np

f = open('data2.txt', 'r').readlines()
for i in range(0, len(f)-1):
    l1 = f[i].split()
    list1 = [float(x) for x in l1]
```
But I got the error below:
ValueError: could not convert string to float: id
I'm confused by this, because when I try it on just one line in an interactive session, instead of running the for loop in the script, it works well:
```python
from scipy import stats
import numpy as np
f = open('data2.txt','r').readlines()
l1 = f[1].split()
list1 = [float(x) for x in l1]
list1
# [5.3209183842, 4.6422726719, 4.3788135547]
```
Can anyone explain what is going on here?
Obviously some of your lines don't contain valid float data; specifically, some line contains the text `id`, which can't be converted to float. When you try it at the interactive prompt you are only trying a single line (`f[1]`, which is actually the second line of the file), so the best approach is to print the line on which the error occurs; then you will know which line is wrong.
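A minimal Python 3 sketch of that idea applied to the question's loop: wrap the conversion in `try`/`except`, print the 1-based line number and the raw line (via `repr()`, so stray characters are visible), then re-raise the original error. The function name `parse_lines` and the sample lines are hypothetical.

```python
def parse_lines(lines):
    """Convert each whitespace-separated line into a list of floats.

    On the first line that fails, print its 1-based number and raw
    contents, then re-raise the original ValueError.
    """
    rows = []
    for lineno, line in enumerate(lines, start=1):
        try:
            rows.append([float(x) for x in line.split()])
        except ValueError:
            # repr() shows quotes, '\n' and other normally invisible characters.
            print(f"line {lineno}: {line!r}")
            raise
    return rows


# Hypothetical file contents: a header line followed by numeric rows.
sample = [
    "id x y z\n",
    "5.3209183842 4.6422726719 4.3788135547\n",
]
try:
    parse_lines(sample)
except ValueError as exc:
    print("conversion failed:", exc)
```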
My error was very simple: the text file containing the data had a whitespace-like (and therefore invisible) character on the last line.
As an output of grep, I had 45 followed by that invisible character, instead of just 45.
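When a value looks correct on screen but still fails to convert, listing the names of its characters exposes the hidden culprit. `float()` ignores ordinary leading and trailing whitespace such as spaces and newlines, but a zero-width space (U+200B) is not whitespace to Python, so `float('45\u200b')` fails. A minimal sketch (Python 3.7+ for `str.isascii`); the helper name `describe_hidden_chars` is hypothetical, and the zero-width space is only an assumed example of an invisible character.

```python
import unicodedata


def describe_hidden_chars(text):
    """Return (position, name) for every character that is whitespace or
    not a printable ASCII character, i.e. anything that may be invisible
    or look deceptively normal on screen."""
    found = []
    for pos, ch in enumerate(text):
        if ch.isspace() or not (ch.isascii() and ch.isprintable()):
            # Control characters such as '\n' have no Unicode name,
            # so fall back to their code point.
            found.append((pos, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found


# Hypothetical grep output: '45' followed by a zero-width space.
print(describe_hidden_chars("45\u200b"))  # [(2, 'ZERO WIDTH SPACE')]
print(repr("45\u200b"))                   # repr() escapes it as well
```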
This error is pretty self-explanatory:
Somewhere in your text file, a line has the word `id` in it, which can't be converted to a number. Your test code works because the word `id` isn't present in line 2 (`f[1]`).
If you want to catch that line, try this code. I cleaned your code up a tad:
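The answerer's cleaned-up code was not preserved. As a hedged sketch (not their original), here is a tidier version of the question's script: it reads every line (the question's `range(0, len(f)-1)` never reaches the last one), skips blank lines, and records the lines it cannot convert instead of crashing. `load_float_rows` is a hypothetical name; it accepts any iterable of lines, including an open file object.

```python
import io


def load_float_rows(lines):
    """Parse whitespace-separated floats from an iterable of lines.

    Returns (rows, bad): rows is a list of float lists, and bad is a list
    of (line_number, stripped_line) pairs that contained a non-numeric
    token such as 'id'.
    """
    rows, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:  # blank or whitespace-only line
            continue
        try:
            rows.append([float(x) for x in tokens])
        except ValueError:
            bad.append((lineno, line.strip()))
    return rows, bad


# With the real file this would be:
#     with open('data2.txt') as fh:
#         rows, bad = load_float_rows(fh)
# Here an in-memory file with hypothetical contents stands in for it.
demo = io.StringIO("id x y z\n5.32 4.64 4.38\n\n")
rows, bad = load_float_rows(demo)
print(rows)  # [[5.32, 4.64, 4.38]]
print(bad)   # [(1, 'id x y z')]
```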
对于包含一列带逗号的数字的 Pandas 数据框,请使用:
因此像
4,200.42
这样的值将被转换为4200.42
作为浮点数。奖励 1:这很快。
奖励 2:如果将该数据帧保存在类似 Apache Parquet 格式。
For a Pandas dataframe with a column of numbers with commas, use this:
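The snippet from this answer was not preserved. A minimal sketch of the usual approach, assuming a hypothetical column named `amount`: strip the commas literally, then cast to float. (If the data comes from a file, `pd.read_csv(..., thousands=',')` can handle the separators while parsing.)

```python
import pandas as pd

# Hypothetical data: a string column that uses ',' as a thousands separator.
df = pd.DataFrame({"amount": ["4,200.42", "1,000", "12.5"]})

# Remove the commas as plain text (regex=False), then convert to float.
df["amount"] = df["amount"].str.replace(",", "", regex=False).astype(float)

print(df["amount"].tolist())
```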
So values like 4,200.42 would be converted to 4200.42 as a float.
Bonus 1: This is fast.
Bonus 2: It is more space-efficient if you save that DataFrame in a format such as Apache Parquet.
Perhaps your numbers aren't actually numbers, but letters masquerading as numbers?
In my case, the font I was using meant that "l" and "1" looked very similar. I had a string like 'l1919' which I thought was '11919' and that messed things up.
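One way to catch look-alike characters regardless of the font is to list every character that cannot occur in a plain decimal number. The helper name `suspicious_chars` is hypothetical, and the check is deliberately rough: it also flags valid special strings such as `'inf'` and `'nan'`.

```python
def suspicious_chars(token):
    """Return (index, char) for characters other than ASCII digits,
    '.', '+', '-', 'e' and 'E' (a rough check for plain decimal numbers)."""
    allowed = set("0123456789.+-eE")
    return [(i, ch) for i, ch in enumerate(token) if ch not in allowed]


print(suspicious_chars("l1919"))  # [(0, 'l')]
print(suspicious_chars("11919"))  # []
```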
Your data may not be what you expect -- it seems you're expecting, but not getting, floats.
A simple solution to figuring out where this occurs would be to add a try/except to the for-loop:
Shortest way:
```python
# if ',' is the problem
df["id"] = df['id'].str.replace(',', '').astype(float)
# if blank space is the problem
df["id"] = df['id'].str.replace(' ', '').astype(float)
```
In pandas
This error (or a very similar one) commonly appears when changing the dtype of a pandas column from `object` to `float` using `astype()` or `apply()`. The cause is that there are non-numeric strings that cannot be converted into floats. One solution is to use `pd.to_numeric()` instead, passing `errors='coerce'`. This replaces non-numeric values, such as the literal string `'id'`, with NaN. `pd.to_numeric()` works only on individual columns, so if you need to change the dtype of multiple columns in one go (similar to how `.astype(float)` may be used), passing it to `apply()` should do the job.
Sometimes there are thousands-separator commas, which throw a similar error; in that case, removing them first, before the `pd.to_numeric()` call, solves the issue.
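The code for this answer was not preserved. A minimal sketch of both cases it describes, using a hypothetical frame with columns `a` and `b`: `pd.to_numeric` on one column, and `DataFrame.apply` to run it on every column, with `errors="coerce"` forwarded as a keyword argument.

```python
import pandas as pd

# Hypothetical object (string) columns containing stray non-numeric text.
df = pd.DataFrame({
    "a": ["5.32", "id", "4.38"],
    "b": ["1", "2", "x"],
})

# df["a"].astype(float) would raise ValueError because of 'id';
# errors="coerce" turns unparseable strings into NaN instead.
a_numeric = pd.to_numeric(df["a"], errors="coerce")

# to_numeric handles one column at a time; apply() calls it per column.
df_numeric = df.apply(pd.to_numeric, errors="coerce")

print(a_numeric.tolist())
print(df_numeric.dtypes)
```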
In scikit-learn
This error is also raised when you fit data containing strings to models that expect numeric data. One example is the various scalers, e.g. `StandardScaler()`. In that case, the solution is to preprocess the data by one-hot or label encoding the text input into numeric input. Below is an example where a string input is one-hot encoded first and then fed into a scaler model.
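The example itself was not preserved. A minimal sketch under assumed data (one hypothetical categorical feature): calling `StandardScaler().fit(X)` directly would fail with a `ValueError` because the values are strings, so the strings are one-hot encoded first. `OneHotEncoder` returns a sparse matrix by default, and `StandardScaler` can only scale sparse input without centering, hence `with_mean=False`. (In scikit-learn 1.2+ you could instead request dense output with `OneHotEncoder(sparse_output=False)`.)

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical single categorical feature (strings, not numbers).
X = [["red"], ["green"], ["red"], ["blue"]]

# One-hot encode the strings, then scale the resulting 0/1 columns.
model = make_pipeline(OneHotEncoder(), StandardScaler(with_mean=False))
X_scaled = model.fit_transform(X)

print(X_scaled.shape)  # one column per category: blue, green, red
```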
Update empty string values with 0.0:
If you know which non-float values can occur, replace them before converting.
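A minimal sketch of that idea in plain Python. The set of placeholder values and the helper name `to_float_or_default` are hypothetical; unknown non-numeric strings still raise `ValueError`, so unexpected data is not silently turned into zeros.

```python
# Hypothetical placeholder values known to appear in the data.
KNOWN_NON_FLOATS = frozenset({"", "id"})


def to_float_or_default(token, default=0.0, known=KNOWN_NON_FLOATS):
    """Convert token to float, mapping known placeholders to default."""
    if token.strip() in known:
        return default
    return float(token)  # anything else that is not numeric still raises


print([to_float_or_default(t) for t in ["5.3", "", "id", "4.6"]])
```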
I solved a similar situation with a basic technique using pandas. First, load the CSV or text file with pandas; it's pretty simple.
Then set the index of the data to the relevant column that needs to be changed. For example, if your data has ID as an attribute or column, set the index to ID.
Then delete all the rows that have "id" as the value instead of a number, using the following command.
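The commands from this answer were not preserved. A minimal sketch of the three steps, assuming hypothetical column names `ID` and `value` and an in-memory CSV in which a second header line ended up inside the data:

```python
import io

import pandas as pd

# Hypothetical file: a repeated header line means the 'ID' column
# contains the literal string 'id' on one row.
raw = io.StringIO(
    "ID,value\n"
    "1,5.32\n"
    "id,value\n"
    "2,4.64\n"
)
df = pd.read_csv(raw)

# Use the ID column as the index, then drop every row labelled 'id'.
df = df.set_index("ID")
df = df.drop(index="id")

# With the bad row gone, the remaining strings convert cleanly.
df["value"] = df["value"].astype(float)
print(df)
```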
For a pandas DataFrame or Series, when you get this error, do this:
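The code from this answer was not preserved. Before choosing a fix, it can help to see exactly which values trigger the error. A minimal diagnostic sketch on a hypothetical Series: coerce to numbers, then keep the entries that were present originally but became NaN.

```python
import pandas as pd

# Hypothetical Series with one value that cannot be parsed as a number.
s = pd.Series(["5.32", "id", "4.38", None])

# Values that were not missing before coercion but are NaN afterwards
# are exactly the ones that astype(float) would choke on.
coerced = pd.to_numeric(s, errors="coerce")
offenders = s[coerced.isna() & s.notna()]

print(offenders)
```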
A good option for handling these kinds of erroneous values is to remove them at the `read_csv` step by specifying `na_values`, which lists additional strings to recognize as NA/NaN.
By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'None', 'n/a', 'nan', 'null'. In your case, since it's complaining about the string 'id' in the data, you could do the following:
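The original call was not preserved. A minimal sketch on a hypothetical in-memory CSV with a column named `score`; `na_values` adds to the default NA strings rather than replacing them (as long as `keep_default_na` stays `True`).

```python
import io

import pandas as pd

# Hypothetical CSV where a stray 'id' string sits inside a numeric column.
raw = io.StringIO("score\n1.5\nid\n2.0\n")

# Treat 'id' as missing, so the column parses as float64 directly.
df = pd.read_csv(raw, na_values=["id"])
print(df["score"].dtype)
```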
This marks the 'id' values in those columns as null and resolves the ValueError when running analysis on the column of interest.