ValueError: Found unknown categories [nan] in column 2 during fit

Posted 2025-01-20 11:45:59

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

path = r"C:\Users\thund\Downloads\Boat.csv"
data = pd.read_csv(path)

print(data.shape)

print(data.columns)

print(data.isnull().sum())
print(data.dropna(axis=0))  # dropping rows that have missing values

print(data['Class'].value_counts())

data['Class'].value_counts().plot(kind='bar')
#plt.show()

data['safety'].value_counts().plot(kind='bar')
#plt.show()


import seaborn as sns
sns.countplot(x=data['demand'], hue=data['Class'])
#plt.show()

X = data.drop(['Class'], axis=1)
y = data['Class']

from sklearn.preprocessing import OrdinalEncoder
demand_category = ['low', 'med', 'high', 'vhigh']
maint_category = ['low', 'med', 'high', 'vhigh']
seats_category = ['2', '3', '4', '5more']
passenger_category = ['2', '4', 'more']
storage_category = ['Nostorage', 'small', 'med']
safety_category = ['poor', 'good', 'vgood']
all_categories = [demand_category, maint_category, seats_category, passenger_category, storage_category, safety_category]


oe = OrdinalEncoder(categories=all_categories)
X = oe.fit_transform(data[['demand', 'maint', 'seats', 'passenger', 'storage', 'safety']])



Dataset: https://drive.google.com/file/d/1O0sYZGJep4JkrSgGeJc5e_Nlao2bmegV/view?usp=sharing

For the code above I keep getting 'ValueError: Found unknown categories [nan] in column 2 during fit'. I have tried dropping all missing values. While searching for a fix I found a suggestion to use handle_unknown="ignore", but I don't think that works for ordinal encoding.
I am fairly new to Python, so I would deeply appreciate it if someone could give me an in-depth analysis of why this is happening and how I can fix it.

PS: This is for preprocessing the data.


似狗非友 2025-01-27 11:45:59


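To explain the error: you did call `dropna`, but you only printed its result. `dropna` returns a new DataFrame and leaves `data` unchanged, so the missing values are still there when you build `X` and fit the encoder.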
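A small sketch of why printing `data.dropna()` is not enough, using a made-up two-column frame in place of Boat.csv: `dropna()` hands back a cleaned copy, and the original only changes once you keep that copy.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for Boat.csv with one missing 'seats' value
data = pd.DataFrame({
    "seats": ["2", "3", np.nan, "5more"],
    "Class": ["unacc", "acc", "acc", "good"],
})

cleaned = data.dropna(axis=0)        # returns a new DataFrame
print(len(data), len(cleaned))       # 4 3
print(data["seats"].isna().sum())    # 1: `data` itself still has the NaN
```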

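According to your dataset (and the error message), the column "seats" contains NaN values. "Column 2" in the message is a zero-based position among the columns you pass to the encoder: `demand` is 0, `maint` is 1 and `seats` is 2.

When you print out `data['seats'].unique()`, you get something like this: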

['2' '3' '4' '5more' nan]

There are two solutions; either one must run before you build `X` and `y` from `data`:

  1. Using `inplace`:

    `data.dropna(inplace=True)`
    

    This removes the rows with missing values from the original DataFrame itself, instead of returning a new one.

  2. Manually assigning:

    `data = data.dropna()`
    

    This has the same effect as `inplace=True` and is easier to follow; any efficiency difference between the two is usually negligible.

    Hope this answers your question.
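Putting the fix into the order of the original script gives something like the sketch below. It runs on made-up rows with only two feature columns (an assumption for brevity, not Boat.csv). The points that matter are that the cleaned DataFrame is kept, that `X` and `y` are built from it afterwards so they stay aligned, and that the encoder is fitted on `X` rather than on the uncleaned `data`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up rows standing in for Boat.csv (only two feature columns)
data = pd.DataFrame({
    "demand": ["low", "med", "vhigh", "high"],
    "seats": ["2", np.nan, "5more", "4"],
    "Class": ["unacc", "acc", "good", "acc"],
})

data = data.dropna(axis=0)         # 1. keep the cleaned result

X = data.drop(["Class"], axis=1)   # 2. build features and labels
y = data["Class"]                  #    from the cleaned frame

oe = OrdinalEncoder(categories=[
    ["low", "med", "high", "vhigh"],
    ["2", "3", "4", "5more"],
])
X_encoded = oe.fit_transform(X[["demand", "seats"]])  # 3. encode X
print(X_encoded.tolist())  # [[0.0, 0.0], [3.0, 3.0], [2.0, 2.0]]
print(y.tolist())          # ['unacc', 'good', 'acc']
```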
