Does K-fold cross-validation iteratively train the model?

Posted on 2025-02-01 21:51:01

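If you run cross_val_score() or cross_validate() on a dataset, has the estimator been trained on all of the folds by the time the run finishes?

I read somewhere that cross_val_score works on a copy of the estimator, whereas I had assumed this was how you train a model using k-fold.

Or, at the end of cross_validate() or cross_val_score(), do you have a single estimator that you can then use for predict()?

Is my thinking correct?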

瘫痪情歌 2025-02-08 21:51:01

You can refer to the scikit-learn documentation on cross-validation.

If you do 3-fold cross-validation:

scikit-learn splits your dataset into 3 parts. (For example, the 1st part contains rows 1-3, the 2nd part contains rows 4-6, and so on.)
It then iterates 3 times, training a new model each time with a different training set and validation set:
- In one round, it combines the 1st and 2nd parts as the training set and tests the model on the 3rd part.
- In another round, it combines the 1st and 3rd parts as the training set and tests the model on the 2nd part.
- And so on, until each part has been used once for testing.
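To make the rounds concrete, here is a minimal sketch that prints which rows land in the training and test sets in each round. It assumes a toy array of nine rows and an unshuffled KFold splitter, which, as far as I know, is what an integer cv resolves to for a regressor; the variable names are only illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

# Nine toy rows, indexed 0..8 (illustrative data, not from the question).
X = np.arange(9).reshape(9, 1)

# Without shuffling, each fold is a contiguous block of rows.
kf = KFold(n_splits=3)
splits = [(train_idx.tolist(), test_idx.tolist()) for train_idx, test_idx in kf.split(X)]

for round_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"round {round_no}: train rows {train_idx}, test rows {test_idx}")
```

Every row is used for testing exactly once and for training in the other two rounds. Which part is held out first is a detail of the splitter, not something to rely on.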

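So, after calling cross_validate, three models have been trained. Each one is a fitted copy (clone) of the estimator you passed in; the original estimator object itself is left unfitted. If you want the model object from each round, add the parameter return_estimator=True. The returned dictionary will then have an extra key named estimator, containing the list of fitted estimators, one per round. (cross_val_score returns only the scores.)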

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

# Use the first 150 samples of the diabetes regression dataset.
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

lasso = linear_model.Lasso()

# return_estimator=True keeps the fitted model from each of the 3 rounds.
cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)
print(sorted(cv_results.keys()))
# Output: ['estimator', 'fit_time', 'score_time', 'test_score']
print(cv_results['estimator'])
# Output: [Lasso(), Lasso(), Lasso()]
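As a quick check of the "copy of the estimator" point from the question, the sketch below reuses the same setup and inspects the objects afterwards. The helper variable names (original_is_fitted, predict_failed, fold_models) are mine, and this is just one way to observe the behaviour, not the only one.

```python
from sklearn import datasets, linear_model
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_validate

X, y = datasets.load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)

# The estimator passed in was cloned for each round, so it has no learned coefficients.
original_is_fitted = hasattr(lasso, "coef_")

# Calling predict() on the original should therefore fail.
try:
    lasso.predict(X[:5])
    predict_failed = False
except NotFittedError:
    predict_failed = True

# The per-round clones are fitted and can predict, but each saw only 2/3 of the data.
fold_models = cv_results["estimator"]
fold_predictions = [model.predict(X[:5]) for model in fold_models]

print("original fitted:", original_is_fitted)
print("predict on original failed:", predict_failed)
print("fitted fold models:", len(fold_models))
```

So there is no single estimator waiting at the end of cross_validate() that was trained on all folds. You get either scores only, or scores plus one fitted model per round.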

However, in practice, cross-validation is used only for evaluating the model. Once you have found a model and parameter setting that give a high cross-validation score, it is better to fit the model again on the whole training set and then test it on the test set.
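A rough sketch of that workflow, under some assumptions of mine: a 75/25 train/test split, random_state=0, and alpha=0.1 standing in for "the parameter setting you settled on". None of these values come from the question.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score, train_test_split

X, y = datasets.load_diabetes(return_X_y=True)

# Hold out a test set that cross-validation never sees (split size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# alpha=0.1 is a placeholder for whatever setting scored well in cross-validation.
model = linear_model.Lasso(alpha=0.1)

# Step 1: estimate performance with 3-fold CV on the training set only.
cv_scores = cross_val_score(model, X_train, y_train, cv=3)
print("mean CV R^2:", cv_scores.mean())

# Step 2: fit once on the whole training set; this is the model you keep.
model.fit(X_train, y_train)

# Step 3: evaluate the final model on the held-out test set.
test_score = model.score(X_test, y_test)
print("test R^2:", test_score)
```

After step 2, model is a single fitted estimator, and that is the one to call predict() on.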