Logistic regression can't use categorical variables to train my model
I want to train my model using these categorical variables, with lifequality as my target variable:
SelectedColumns=['workOrganiz' , 'education', 'maritalSt','jobType','ageGroup','workHoursPeriod','sex','lifequality']
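I try to run a logistic regression like this:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dfML = df[SelectedColumns]
list_of_results = []
# stratified train and test sets
X = dfML.iloc[:, :-1]  # all features except the last column
y = dfML.iloc[:, -1]   # target is the last column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=15, stratify=y)
clf = LogisticRegression()
lrm = clf.fit(X_train, y_train)  # this is the call that raises the error below
y_pred = lrm.predict(X_test)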
But I get the following error:
ValueError: could not convert string to float: 'Private'
What am I doing wrong?
Using dummies gives my model a precision and accuracy of 100%:
dfML=df[SelectedColumns]
dfML=pd.get_dummies(dfML)
If I remove the line dfML=df[SelectedColumns], the 100% doesn't happen.
Comments (1)
Regression algorithms can only work with numbers when computing a prediction, which is why fitting on string columns fails. There is a workaround that lets you keep categorical variables as predictors. There are different ways to do it, but a simple one is called 'dummy coding'. You can use get_dummies() to turn each categorical column into multiple 0 and 1 columns. See https://www.geeksforgeeks.org/how-to-create-dummy-variables-in-python-with-pandas/amp/
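As for the 100% score, one likely cause (assuming lifequality is a string column with two categories) is that pd.get_dummies(dfML) also encodes the target. That yields two lifequality_* columns, and iloc[:, :-1] then keeps one of them in X, so the model sees the answer. A minimal sketch that encodes only the predictors is below. The toy DataFrame and its values are made up for illustration; only the column names come from the question.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real df (values are invented).
df = pd.DataFrame({
    "workOrganiz": ["Private", "Public", "Private", "Self",
                    "Public", "Private", "Self", "Public"] * 5,
    "sex": ["M", "F", "F", "M", "M", "F", "M", "F"] * 5,
    "lifequality": ["good", "bad", "good", "bad",
                    "bad", "good", "good", "bad"] * 5,
})

target = "lifequality"

# Dummy-encode only the predictors; the target never enters X.
X = pd.get_dummies(df.drop(columns=[target]), dtype=int)
y = df[target]  # scikit-learn classifiers accept string class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=15, stratify=y
)

clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Sanity check: no feature column should be derived from the target.
leaked = [c for c in X.columns if c.startswith(target)]
print("leaked target columns:", leaked)
```

If the leaked list is empty and the score is still perfect, the cause is somewhere else in the data, so treat this as a first check rather than a full diagnosis.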