How can I make sure make_column_transformer labels objects correctly?

Posted on 2025-02-06 12:08:47

I built an XGBoost model, made predictions, and evaluated the model's accuracy; however, I'm running into issues with using the model on a new DataFrame.

New DataFrame code:

new_data = [['Academic', 'A', 'Male', 'Less Interested', 'Urban', 56, 6950000, 83.0, 84.09, False]]

new = pd.DataFrame(data=new_data, columns=['type_school', 'school_accreditation', 'gender',
                                           'interest', 'residence', 'parent_age', 'parent_salary',
                                           'house_area', 'average_grades', 'parent_was_in_college'])

column_trans = make_column_transformer(
    (OneHotEncoder(), ['type_school', 'school_accreditation',
                       'gender', 'interest', 'residence', 'parent_was_in_college']),
    remainder='passthrough')

X_new = column_trans.fit_transform(new)

preds = optimal_params.predict(X_new)

After running the above code, I get the following error:

"ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 
'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18'] ['f0', 'f1', 'f2', 'f3', 
'f4', 'f5', 'f6', 'f7', 'f8', 'f9']
expected f17, f13, f18, f15, f10, f12, f16, f14, f11 in input data"

However, column_trans is exactly the same one I used on the training DataFrame, so I'm not sure what's going on. Is there something off about my column_trans?

Comments (2)

春夜浅 2025-02-13 12:08:48

As I understand it, you are not saving the column_trans that was fit on your training data.

The mechanism here is:

    1. Fit the preprocessor on the training dataset.
    2. Save the fitted preprocessor (here, column_trans).
    3. At inference time (predicting on new data), load the preprocessor and call transform.

You can find more information about this at this link.
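The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the tiny training frame, the file name column_trans.joblib, and the use of joblib for persistence are not from the question, and handle_unknown="ignore" is an optional extra so that unseen categories do not raise at inference time.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the real training DataFrame.
train = pd.DataFrame({
    "type_school": ["Academic", "Vocational", "Academic"],
    "residence": ["Urban", "Rural", "Urban"],
    "average_grades": [84.09, 79.5, 88.2],
})

# 1. Fit the preprocessor on the training dataset.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["type_school", "residence"]),
    remainder="passthrough",
)
X_train = column_trans.fit_transform(train)

# 2. Save the fitted preprocessor (file name is an assumption).
path = os.path.join(tempfile.mkdtemp(), "column_trans.joblib")
joblib.dump(column_trans, path)

# 3. At inference time, load it and call transform only (no refit).
loaded = joblib.load(path)
new = pd.DataFrame({
    "type_school": ["Academic"],
    "residence": ["Urban"],
    "average_grades": [84.09],
})
X_new = loaded.transform(new)

# The new row has the same number of columns the model was trained on.
print(X_train.shape[1], X_new.shape[1])
```

If training and prediction happen in the same session, saving is unnecessary; reusing the already fitted column_trans object has the same effect.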

于我来说 2025-02-13 12:08:47

When running prediction, the new data should be transformed with .transform only (not .fit_transform). Here's pseudocode:

transformer = ...  # some specification, e.g. column_trans
transformer.fit(old_data)  # learns the parameters (such as the categories) from the training data
transformed_new_data = transformer.transform(new_data)  # reuses what was learned; no refit
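To make the difference concrete, here is a small runnable sketch. The data is invented for illustration and much smaller than the asker's, and build_transformer is a hypothetical helper. It shows why refitting on a single new row produces fewer columns than the model expects: the encoder only sees one category per column.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: two categorical columns, one numeric column.
train = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "residence": ["Urban", "Rural", "Urban", "Rural"],
    "parent_age": [56, 48, 51, 60],
})
new = pd.DataFrame({"gender": ["Male"], "residence": ["Urban"], "parent_age": [56]})

categorical = ["gender", "residence"]


def build_transformer():
    """Return an unfitted transformer with the same specification each time."""
    return make_column_transformer(
        (OneHotEncoder(), categorical),
        remainder="passthrough",
    )


# Correct: fit once on the training data, then only transform the new data.
column_trans = build_transformer()
X_train = column_trans.fit_transform(train)
X_new = column_trans.transform(new)

# Incorrect: refitting on the single new row learns one category per column.
X_refit = build_transformer().fit_transform(new)

print(X_train.shape, X_new.shape, X_refit.shape)
```

Here X_train and X_new both have 5 columns (2 + 2 one-hot columns plus the numeric one), while X_refit has only 3. That is the same kind of mismatch as the 19 versus 10 features in the error message from the question.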