ValueError: could not convert string to float: id

Posted on 2024-12-20 05:08:31

I'm running the following Python script:

#!/usr/bin/python

import os,sys
from scipy import stats
import numpy as np

f = open('data2.txt', 'r').readlines()
for i in range(0, len(f)-1):
    l1 = f[i].split()
    list1 = [float(x) for x in l1]

But I got the error below:

ValueError: could not convert string to float: id

This confuses me, because when I try the same thing on a single line in an interactive session, instead of in a for loop in a script, it works fine:

from scipy import stats
import numpy as np

f = open('data2.txt','r').readlines()
l1 = f[1].split()
list1 = [float(x) for x in l1]
list1
# [5.3209183842, 4.6422726719, 4.3788135547]

Can anyone explain what is going on here?

Comments (12)

讽刺将军 2024-12-27 05:08:31

Obviously some of your lines don't contain valid float data; specifically, some line contains the text id, which can't be converted to a float.

In the interactive prompt you are trying only a single line (f[1]), so the best approach is to print the line on which the error occurs, and then you will know which line is wrong, e.g.

#!/usr/bin/env python3

import os, sys
from scipy import stats
import numpy as np

f = open('data2.txt', 'r').readlines()
N = len(f) - 1
for i in range(0, N):
    w = f[i].split()
    l1 = w[1:8]
    l2 = w[8:15]
    try:
        list1 = [float(x) for x in l1]
        list2 = [float(x) for x in l2]
    except ValueError as e:
        # Report the offending line, then skip it so the t-test never runs on bad data
        print("error", e, "on line", i)
        continue
    result = stats.ttest_ind(list1, list2)
    print(result[1])
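
If several lines are bad, it can help to collect every offending token instead of stopping at the first one. The following is a minimal sketch, assuming whitespace-separated fields; the helper name find_bad_tokens and the sample lines are made up for illustration and do not come from the question's file.

```python
def find_bad_tokens(lines):
    """Return (line_index, token) pairs for every token float() rejects."""
    bad = []
    for index, line in enumerate(lines):
        for token in line.split():
            try:
                float(token)
            except ValueError:
                bad.append((index, token))
    return bad


# Hypothetical sample: a header line followed by numeric rows.
sample = ["id a b\n", "1 5.32 4.64\n", "2 4.37 oops\n"]
print(find_bad_tokens(sample))
```

With a real file, the same function could be called on open('data2.txt').readlines().
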
强辩 2024-12-27 05:08:31

My error was very simple: the text file containing the data had a space character (and therefore an invisible one) on the last line.

In the output of grep, I had "45 " (with a trailing space) instead of just "45".
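
A quick way to see how whitespace interacts with float(), offered as a sketch rather than a diagnosis of the file above: float() tolerates whitespace around a number, but a string that is empty or contains only whitespace is rejected. The helper name parse_line is hypothetical.

```python
def parse_line(line):
    """Parse whitespace-separated floats; return None for blank lines."""
    tokens = line.split()  # split() with no argument discards all whitespace
    if not tokens:
        return None
    return [float(t) for t in tokens]


print(float("45 "))         # whitespace around a number is tolerated
print(parse_line("45 \n"))  # [45.0]
print(parse_line("   \n"))  # None: a whitespace-only line is skipped

try:
    float(" ")              # a whitespace-only string is rejected
except ValueError as e:
    print("error:", e)
```

Skipping lines that produce no tokens avoids the error when a file ends with a line of stray spaces.
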

一片旧的回忆 2024-12-27 05:08:31

This error is pretty descriptive:

ValueError: could not convert string to float: id

Somewhere in your text file, a line has the word id in it, which can't really be converted to a number.

Your test code works because the word id isn't present in line 2.
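
The difference between the script and the interactive test can be reproduced with a small in-memory sample. This is only a sketch: the header and numbers below are invented to mimic a file whose first line contains the word id, which is an assumption about data2.txt.

```python
# Hypothetical stand-in for open('data2.txt').readlines()
f = ["id v1 v2 v3\n", "5.32 4.64 4.38\n", "1.0 2.0 3.0\n"]

# The interactive test only touched f[1], which is purely numeric.
print([float(x) for x in f[1].split()])

# The loop starts at f[0], the header, so float('id') raises.
try:
    for i in range(0, len(f) - 1):
        values = [float(x) for x in f[i].split()]
except ValueError as e:
    failed_index = i
    print("line", i, "failed:", e)
```

If the first line really is a header, iterating over f[1:] instead of f would skip it.
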


If you want to catch that line, try this code. I cleaned your code up a tad:

#!/usr/bin/env python3

import os, sys
from scipy import stats
import numpy as np

for index, line in enumerate(open('data2.txt', 'r').readlines()):
    w = line.split()
    l1 = w[1:8]
    l2 = w[8:15]

    try:
        # list() forces the conversion here, so a bad token raises inside the try
        list1 = list(map(float, l1))
        list2 = list(map(float, l2))
    except ValueError:
        print('Line {i} is corrupt!'.format(i=index))
        break

    result = stats.ttest_ind(list1, list2)
    print(result[1])
給妳壹絲溫柔 2024-12-27 05:08:31

For a Pandas dataframe with a column of numbers with commas, use this:

df["Numbers"] = [float(str(i).replace(",", "")) for i in df["Numbers"]]

So values like 4,200.42 would be converted to 4200.42 as a float.

Bonus 1: This is fast.

Bonus 2: More space efficient if saving that dataframe in something like Apache Parquet format.

提笔书几行 2024-12-27 05:08:31

Perhaps your numbers aren't actually numbers, but letters masquerading as numbers?

In my case, the font I was using meant that "l" and "1" looked very similar. I had a string like 'l1919' which I thought was '11919' and that messed things up.
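
When a token looks numeric but float() rejects it, printing its repr() and flagging unexpected characters can expose lookalikes. A small sketch follows; suspicious_chars is a hypothetical helper, and its allowed set is deliberately simplistic (it would also flag the e in an exponent such as 1e5).

```python
def suspicious_chars(token):
    """Return the characters that are not ASCII digits, a sign, or a decimal point."""
    allowed = set("0123456789+-.")
    return [ch for ch in token if ch not in allowed]


token = "l1919"  # lowercase L followed by digits
print(repr(token), suspicious_chars(token))
```
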

思念满溢 2024-12-27 05:08:31

Your data may not be what you expect -- it seems you're expecting, but not getting, floats.

A simple way to figure out where this occurs is to add a try/except to the for loop:

# f and N come from the rest of the script
for i in range(0, N):
    w = f[i].split()
    l1 = w[1:8]
    l2 = w[8:15]
    try:
        list1 = [float(x) for x in l1]
        list2 = [float(x) for x in l2]
    except ValueError as e:
        # report the error in some way that is helpful -- maybe print out i
        print("could not parse line", i, ":", e)
        continue
    result = stats.ttest_ind(list1, list2)
    print(result[1])
隔岸观火 2024-12-27 05:08:31

Shortest way:

df["id"] = df['id'].str.replace(',', '').astype(float)  # if ',' is the problem

df["id"] = df['id'].str.replace(' ', '').astype(float)  # if blank space is the problem
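
A runnable version of the comma case on a made-up frame, as a sketch. The values are invented for illustration; regex=False is passed so that the comma is treated as a literal string rather than a pattern.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings with thousands separators
df = pd.DataFrame({"id": ["4,200.42", "17", "1,000"]})

df["id"] = df["id"].str.replace(",", "", regex=False).astype(float)
print(df["id"].tolist())
```
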

伏妖词 2024-12-27 05:08:31

In pandas

This error (or a very similar one) commonly appears when changing the dtype of a pandas column from object to float using astype() or apply(). The cause is that the column contains non-numeric strings that cannot be converted to floats. One solution is to use pd.to_numeric() instead, passing errors='coerce'. This replaces non-numeric values, such as the literal string 'id', with NaN.

import pandas as pd

df = pd.DataFrame({'col': ['id', '1.5', '2.4']})

df['col'] = df['col'].astype(float)                     # <---- ValueError: could not convert string to float: 'id'
df['col'] = df['col'].apply(lambda x: float(x))         # <---- ValueError

df['col'] = pd.to_numeric(df['col'], errors='coerce')   # <---- OK
#                                    ^^^^^^^^^^^^^^^ <--- converts non-numbers to NaN


0    NaN
1    1.5
2    2.4
Name: col, dtype: float64

pd.to_numeric() works only on individual columns, so if you need to change the dtype of multiple columns in one go (similar to how .astype(float) may be used), then passing it to apply() should do the job.

df = pd.DataFrame({'col1': ['id', '1.5', '2.4'], 'col2': ['10.2', '21.3', '20.6']})
df[['col1', 'col2']] = df.apply(pd.to_numeric, errors='coerce')


   col1  col2
0   NaN  10.2
1   1.5  21.3
2   2.4  20.6

Sometimes the values contain thousands-separator commas, which raise a similar error:

ValueError: could not convert string to float: '2,000.4'

In that case, removing the commas before the pd.to_numeric() call solves the issue.

df = pd.DataFrame({'col': ['id', '1.5', '2,000.4']})
df['col'] = df['col'].replace(regex=',', value='')
#                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^  <--- remove commas
df['col'] = pd.to_numeric(df['col'], errors='coerce')


0       NaN
1       1.5
2    2000.4
Name: col, dtype: float64

In scikit-learn

This error is also raised when you fit data containing strings to a model that expects numeric data. The various scalers, e.g. StandardScaler(), are one example. In that case, the solution is to preprocess the data by one-hot or label encoding the text input into numeric input. Below is an example where a string input is first one-hot encoded and then fed into a scaler.

from sklearn.preprocessing import StandardScaler, OneHotEncoder
data = [['a'], ['b'], ['c']]
sc = StandardScaler().fit(data)  # <--- ValueError: could not convert string to float: 'a'


data = OneHotEncoder().fit_transform(data).toarray()
sc = StandardScaler().fit(data)  # <--- OK
诗笺 2024-12-27 05:08:31

Replace empty strings with 0.0 before converting.
If you know which other non-float values can occur, replace them in the same way.

df.loc[df['score'] == '', 'score'] = 0.0


df['score']=df['score'].astype(float)
悲欢浪云 2024-12-27 05:08:31

I solved a similar situation with a basic technique using pandas. First load the file with pandas. The example below reads an Excel file; for a CSV or plain-text file, pd.read_csv would be the call to use. It's pretty simple:

data = pd.read_excel('link to the file')

Then set the index of the data to the column that needs to be changed. For example, if your data has ID as one attribute or column, set the index to ID.

data = data.set_index("ID")

Then delete all rows whose value is "id" instead of a number, using the following command.

data = data.drop("id", axis=0)
或十年 2024-12-27 05:08:31

For a pandas DataFrame or Series, when you get this error, do this:

import pandas as pd

df["column1"] = pd.to_numeric(df["column1"], errors='coerce')
Spring初心 2024-12-27 05:08:31

A good option for handling these kinds of erroneous values is to deal with them at the read_csv step by specifying na_values. This parameter lists additional strings to recognize as NA/NaN.

By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'None', 'n/a', 'nan', 'null'. In your case, since the complaint is about the string 'id' in the data, you could do the following:

df = pd.read_csv('file.csv', na_values = ['id'])

This reads every cell equal to 'id' as NaN, which resolves the ValueError when running an analysis on the column of interest.
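
The effect of na_values can be checked without a file on disk. The following sketch uses io.StringIO and an invented one-column CSV; the column name score and its contents are assumptions made for the example.

```python
import io
import pandas as pd

# Hypothetical CSV in which a stray 'id' appears among numeric values
csv_text = "score\n1.5\nid\n2.4\n"

df = pd.read_csv(io.StringIO(csv_text), na_values=["id"])
print(df["score"].dtype)            # the column is parsed as a float dtype
print(df["score"].isna().tolist())  # the 'id' cell became NaN
```

Note that the NaN values still need handling afterwards, for example with dropna() or fillna(), depending on the analysis.
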
