Does K-fold iteratively train the model?

Posted on 2025-02-01 21:51:01

If you run cross_val_score() or cross_validate() on a dataset, is the estimator trained on all the folds by the end of the run?

I read somewhere that cross_val_score works on a copy of the estimator, whereas I thought this was how you train a model using k-fold.

Or do you have a single fitted estimator at the end of cross_validate() or cross_val_score() that you can then use for predict()?

Is my thinking correct?

Comments (1)

瘫痪情歌 2025-02-08 21:51:01

You can refer to the scikit-learn documentation on cross-validation.

If you do 3-fold cross-validation:

  • sklearn splits your dataset into 3 parts. (For example, the 1st part contains the 1st-3rd rows, the 2nd part contains the 4th-6th rows, and so on.)
  • sklearn then iterates 3 times, training a new model each time with a different training set and validation set:
    • In one round, it combines the 1st and 2nd parts into the training set and tests the model on the 3rd part.
    • In another round, it combines the 1st and 3rd parts into the training set and tests the model on the 2nd part.
    • And so on, until each part has been used once as the test set.
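The splitting described above can be sketched with KFold, which is what scikit-learn uses for an integer cv with a regressor (classifiers get StratifiedKFold instead). The nine-row toy array is an assumption made purely for illustration. Note that in this sketch the first round holds out the first block of rows.

```python
import numpy as np
from sklearn.model_selection import KFold

# Nine toy rows (hypothetical data), so each of the 3 folds holds 3 consecutive rows.
X = np.arange(18).reshape(9, 2)

# KFold does not shuffle by default, so the folds are contiguous blocks of rows.
kf = KFold(n_splits=3)
splits = [(train_idx.tolist(), test_idx.tolist()) for train_idx, test_idx in kf.split(X)]

for round_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"round {round_no}: train rows {train_idx}, test rows {test_idx}")
```

Each row index appears in exactly one test set, and in the training set of the other two rounds.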

So, after calling cross_validate, three models will have been trained, one per round, each on a copy of the estimator you passed in. If you want the model object from each round, add the parameter return_estimator=True. The returned dictionary will then have an additional key named estimator, containing the list of estimators fitted in each round.

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

# Use the first 150 samples of the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

# Run 3-fold cross-validation and keep the estimator fitted in each round
lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)

print(sorted(cv_results.keys()))
# Output: ['estimator', 'fit_time', 'score_time', 'test_score']
print(cv_results['estimator'])
# Output: [Lasso(), Lasso(), Lasso()]
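To connect this back to the question, here is a small sketch (same data as above) that checks whether the estimator you pass in ends up fitted. It relies on the fact that Lasso only gains a coef_ attribute once fit has been called; the variable names original_is_fitted and fold_estimators are just illustrative.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)

# cross_validate fits clones, so the object passed in should still be unfitted.
original_is_fitted = hasattr(lasso, "coef_")

# The returned per-round clones are fitted, each on two of the three parts.
fold_estimators = cv_results["estimator"]
folds_are_fitted = [hasattr(est, "coef_") for est in fold_estimators]

print("original fitted:", original_is_fitted)
print("per-round estimators fitted:", folds_are_fitted)
```

So there is no single estimator trained on all folds at the end of the run: you get per-round models, and only if you ask for them.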

However, in practice, cross-validation is used only to evaluate the model. Once you have found a model and parameter settings that give a high cross-validation score, it is better to fit the model again on the whole training set and then test it on the test set.
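A minimal sketch of that workflow might look like the following. The 75/25 split, random_state=0 and alpha=0.1 are assumptions chosen for illustration, not values from the answer.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score, train_test_split

X, y = datasets.load_diabetes(return_X_y=True)

# Hold out a test set that cross-validation never sees (split size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: evaluate a candidate setting with 3-fold CV on the training set only.
model = linear_model.Lasso(alpha=0.1)  # alpha=0.1 is an illustrative choice
cv_scores = cross_val_score(model, X_train, y_train, cv=3)
print("mean CV R^2:", cv_scores.mean())

# Step 2: refit the chosen setting on the whole training set.
final_model = linear_model.Lasso(alpha=0.1).fit(X_train, y_train)

# Step 3: test once on the held-out set and use the fitted model for predict().
test_r2 = final_model.score(X_test, y_test)
predictions = final_model.predict(X_test)
print("test R^2:", test_r2)
```

Here final_model is the single estimator you would keep and call predict() on; the models trained inside cross_val_score are discarded.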
