Kaggle competition: categorical variables

Posted on 2025-02-12 17:57:58

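The last part of the categorical variables exercise asks you to generate test predictions. I wrote the code below, but it raises an error. I don't understand the error, or why it says X has 148 features while the random forest expects 155 features.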

My code:

ohencoder=OneHotEncoder(handle_unknown='ignore', sparse=False)

# X_test.dropna(axis=0, inplace=True)
h_cols_test = pd.DataFrame(ohencoder.fit_transform(X_test[low_cardinality_cols])) # Your code here

h_cols_test.index=X_test.index

num_X_test= X_test.drop(object_cols, axis=1)

OH_X_test=pd.concat([num_X_test, h_cols_test], axis=1)
#randomforest mode-----------------------------

model=RandomForestRegressor(n_estimators=100,  random_state=0)
model.fit(OH_X_train, y_train)

preds_test= model.predict(OH_X_test)
#output---------------

output=pd.DataFrame({'Id': X_test.index,
               'SalePrice': preds_test})
output.to_csv('submission.csv', index=False)

Error message:

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:1692: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  FutureWarning,
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py:1692: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['int', 'str']. An error will be raised in 1.2.
  FutureWarning,
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_33/1524045498.py in <module>
     12 model.fit(OH_X_train, y_train)
     13 
---> 14 preds_test= model.predict(OH_X_test)
     15 
     16 output=pd.DataFrame({'Id': X_test.index,

/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in predict(self, X)
    969         check_is_fitted(self)
    970         # Check data
--> 971         X = self._validate_X_predict(X)
    972 
    973         # Assign chunk of trees to jobs

/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in _validate_X_predict(self, X)
    577         Validate X whenever one tries to predict, apply, predict_proba."""
    578         check_is_fitted(self)
--> 579         X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
    580         if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
    581             raise ValueError("No support for np.int64 index based sparse matrices")

/opt/conda/lib/python3.7/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    583 
    584         if not no_val_X and check_params.get("ensure_2d", True):
--> 585             self._check_n_features(X, reset=reset)
    586 
    587         return out

/opt/conda/lib/python3.7/site-packages/sklearn/base.py in _check_n_features(self, X, reset)
    399         if n_features != self.n_features_in_:
    400             raise ValueError(
--> 401                 f"X has {n_features} features, but {self.__class__.__name__} "
    402                 f"is expecting {self.n_features_in_} features as input."
    403             )

ValueError: X has 148 features, but RandomForestRegressor is expecting 155 features as input.
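
One plausible reading of the mismatch, assuming `OH_X_train` was built with an encoder fitted on the training data (that step is not shown in the post): calling `fit_transform` on `X_test` learns the categories present in the test set only, so any category missing there produces no column and the encoded width shrinks. The sketch below reproduces the effect on toy data and reuses the training encoder through `transform` instead. The toy frames, the `cat_cols` list, and the `one_hot` helper are hypothetical names for illustration. The sparse result is densified with `.toarray()` to sidestep the `sparse` versus `sparse_output` argument rename across scikit-learn versions, and the column names are cast to strings, which addresses the mixed `int`/`str` feature-name warning.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): the test set never contains category "C".
X_train = pd.DataFrame({"num": [1.0, 2.0, 3.0], "cat": ["A", "B", "C"]})
X_test = pd.DataFrame({"num": [4.0, 5.0], "cat": ["A", "B"]})
cat_cols = ["cat"]


def one_hot(encoder, X, cols):
    """Encode `cols` with an already-fitted encoder and rejoin the other columns."""
    encoded = pd.DataFrame(encoder.transform(X[cols]).toarray(), index=X.index)
    out = pd.concat([X.drop(cols, axis=1), encoded], axis=1)
    out.columns = out.columns.astype(str)  # avoid mixed int/str column names
    return out


# Fit once, on the training data only.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train[cat_cols])

# Apply the same fitted encoder to both sets so the columns line up.
OH_X_train = one_hot(encoder, X_train, cat_cols)
OH_X_test = one_hot(encoder, X_test, cat_cols)

# Refitting on the test set only sees two categories, so it yields fewer columns.
refit_width = (
    OneHotEncoder(handle_unknown="ignore").fit_transform(X_test[cat_cols]).shape[1]
)

print(OH_X_train.shape[1], OH_X_test.shape[1], refit_width)  # 4 4 2
```

In the posted code, the equivalent change would be to call `transform` on the encoder that was fitted on the training data rather than `ohencoder.fit_transform(X_test[...])`, provided that fitted encoder is still in scope.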

