How does GridSearchCV choose best_score_ in skorch?

Posted on 2025-02-07 02:55:48

I have a Trainer class in which I define my own score method:

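import skorch

class Trainer(skorch.NeuralNet):
    """
    Other methods such as train_step_single, validation_step, infer, predict, etc.
    """

    def score(self, X, Y):
        # Collect reference and predicted sentences, then compute corpus-level BLEU.
        # `bleu` is defined elsewhere (not shown here).
        true_text_list, pred_text_list = [], []
        for src, tgt in zip(X, Y):
            # The custom predict() returns (reference text, predicted text).
            true_text, pred_text = self.predict(src, tgt)
            true_text_list.append(true_text)
            pred_text_list.append(pred_text)
        result = bleu.corpus_score(pred_text_list, [true_text_list])
        score = result.score
        print(score)
        return score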

I initialized the trainer class with

epoch_bleu = EpochScoring(scoring=None, lower_is_better=False)

trainer = Trainer(
    module=Seq2SeqTransformer,
    module__num_encoder_layers=1,
    module__num_decoder_layers=1,
    module__emb_size=8,
    module__nhead=2,
    module__src_vocab_size=SRC_VOCAB_SIZE,
    module__tgt_vocab_size=TGT_VOCAB_SIZE,
    module__dim_feedforward=8,
    module__dropout=0.2,
    criterion=criterion,
    optimizer=torch.optim.Adam,
    lr=0.001,
    batch_size=5,
    dataset=translate_dataset,
    optimizer__betas=(0.9, 0.98),
    optimizer__eps=1e-9,
    train_split=split,
    max_epochs=3,
    callbacks=[epoch_bleu],
    device=DEVICE,
)

Then I call fit on the trainer and run the grid search:

trainer.fit(translate_dataset)
gs = GridSearchCV(trainer, params, scoring=None)
gs.fit(train_X, train_y)

My result is

 0.026192733573541495
  epoch    score    train_loss    valid_loss     dur
-------  -------  ------------  ------------  ------
      1   0.0262       10.7645       10.7903  1.1093
0.026192733573541495
      2   0.0262       10.7644       10.7906  1.0927
0.026192733573541495
      3   0.0262       10.7645       10.7903  1.0986
0.026192733573541495

print(gs.best_score_, gs.best_params_)
0.010477093429416598 {'lr': 0.003, 'module__dropout': 0.2}

This is my gs.cv_results_

{'mean_fit_time': array([11.94453502, 13.77196174, 14.05028057, 14.40144386, 14.16046715,
        12.6896481 , 11.84360743, 15.2292047 , 12.22255855]),
 'std_fit_time': array([0.2618251 , 1.07753732, 2.36808454, 1.23219222, 1.86023803,
        1.01203765, 0.28616016, 0.74719985, 0.61968229]),
 'mean_score_time': array([2.27710428, 2.91913524, 2.32345171, 2.53260927, 2.43208547,
        2.31732111, 2.22918859, 2.57366171, 2.31969309]),
 'std_score_time': array([0.1341744 , 0.34804386, 0.1126    , 0.28583804, 0.15907173,
        0.25955979, 0.0675016 , 0.30772863, 0.16703222]),
 'param_lr': masked_array(data=[0.001, 0.001, 0.001, 0.002, 0.002, 0.002, 0.003, 0.003,
                    0.003],
              mask=[False, False, False, False, False, False, False, False,
                    False],
        fill_value='?',
             dtype=object),
 'param_module__dropout': masked_array(data=[0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
              mask=[False, False, False, False, False, False, False, False,
                    False],
        fill_value='?',
             dtype=object),
 'params': [{'lr': 0.001, 'module__dropout': 0.1},
  {'lr': 0.001, 'module__dropout': 0.2},
  {'lr': 0.001, 'module__dropout': 0.3},
  {'lr': 0.002, 'module__dropout': 0.1},
  {'lr': 0.002, 'module__dropout': 0.2},
  {'lr': 0.002, 'module__dropout': 0.3},
  {'lr': 0.003, 'module__dropout': 0.1},
  {'lr': 0.003, 'module__dropout': 0.2},
  {'lr': 0.003, 'module__dropout': 0.3}],
 'split0_test_score': array([0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 'split1_test_score': array([0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 'split2_test_score': array([0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.03447158, 0.        ]),
 'split3_test_score': array([0.02619273, 0.        , 0.02619273, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        ]),
 'split4_test_score': array([0.02619273, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        ]),
 'mean_test_score': array([0.01047709, 0.        , 0.00523855, 0.        , 0.        ,
        0.        , 0.        , 0.00689432, 0.        ]),
 'std_test_score': array([0.01283177, 0.        , 0.01047709, 0.        , 0.        ,
        0.        , 0.        , 0.01378863, 0.        ]),
 'rank_test_score': array([1, 4, 3, 4, 4, 4, 4, 2, 4], dtype=int32)}
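
For reference, the numbers in this table can be checked by hand. As far as I understand scikit-learn, `mean_test_score` is the average of the per-fold `splitN_test_score` values, `std_test_score` is their population standard deviation, and `best_score_` is the `mean_test_score` entry of the rank-1 candidate. A minimal sketch using only values copied from the output above (the variable names are my own):

```python
from statistics import fmean, pstdev

# Per-fold scores of the first candidate ({'lr': 0.001, 'module__dropout': 0.1}),
# copied from split0_test_score .. split4_test_score above.
fold_scores = [0.0, 0.0, 0.0, 0.02619273, 0.02619273]

mean_score = fmean(fold_scores)   # compare with mean_test_score[0]
std_score = pstdev(fold_scores)   # population std, compare with std_test_score[0]

# mean_test_score as printed above; the rank-1 candidate has the largest mean.
mean_test_score = [0.01047709, 0.0, 0.00523855, 0.0, 0.0, 0.0, 0.0, 0.00689432, 0.0]
best_index = max(range(len(mean_test_score)), key=mean_test_score.__getitem__)

print(mean_score, std_score, best_index)
```

The mean comes out at about 0.0104771, which matches the printed `best_score_`, and the per-fold value 0.02619273 is the same number that `score()` prints during training. By this rule the table would select index 0 (`lr=0.001`, `module__dropout=0.1`), so the `best_params_` line printed earlier may come from a different run.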

It looks like best_score_ is not computed with my score method, and I am confused about what I am doing wrong.

My questions are:

  1. Which function does GridSearchCV use to select the best score? According to the skorch documentation, if I pass scoring=None, it should use the score() method defined in my Trainer class.
  2. Is there a way to make GridSearchCV use my own method to choose the best score?
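
To make the two questions concrete, here is a small sketch of how scikit-learn picks the scoring function, independent of skorch. With `scoring=None`, `GridSearchCV` calls the estimator's own `score(X, y)` on each held-out fold; alternatively, `scoring` accepts a callable with the signature `(estimator, X, y)`. The `ConstantScorer` class and the `doubled_score` function are hypothetical toy names used only for this illustration, not part of any library.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV


class ConstantScorer(BaseEstimator):
    """Toy estimator (hypothetical): score() ignores the data and returns `value`."""

    def __init__(self, value=0.0):
        self.value = value

    def fit(self, X, y=None):
        self.fitted_ = True
        return self

    def score(self, X, y=None):
        return float(self.value)


def doubled_score(estimator, X, y):
    # A scorer callable: GridSearchCV calls it as scorer(estimator, X_test, y_test).
    return 2.0 * estimator.score(X, y)


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.zeros(10)
params = {"value": [0.1, 0.5, 0.3]}

# scoring=None: the estimator's own score() method is used on every fold.
gs_default = GridSearchCV(ConstantScorer(), params, scoring=None, cv=5)
gs_default.fit(X, y)

# scoring=<callable>: the callable replaces the estimator's score() method.
gs_custom = GridSearchCV(ConstantScorer(), params, scoring=doubled_score, cv=5)
gs_custom.fit(X, y)

print(gs_default.best_score_, gs_default.best_params_)
print(gs_custom.best_score_, gs_custom.best_params_)
```

In both cases `best_score_` is the mean over the folds for the best candidate, not the score of a single fold or of the final refit. For the Trainer above, the same pattern would presumably mean either relying on `scoring=None` (so the overridden `score()` is called per fold) or moving the BLEU computation into a function with the `(estimator, X, y)` signature and passing it as `scoring`.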
