Difference between from_config and from_pretrained in Hugging Face

Posted 2025-02-09 03:13:06


from transformers import AutoModelForSequenceClassification, DistilBertConfig

task = "sst2"                                  # GLUE task name (assumed from the question)
model_checkpoint = "distilbert-base-uncased"   # checkpoint name (assumed)

num_labels = 3 if task.startswith("mnli") else 1 if task == "stsb" else 2
preconfig = DistilBertConfig(n_layers=6)

model1 = AutoModelForSequenceClassification.from_config(preconfig)
model2 = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

I am modifying this code (the modified code is provided above) to test the DistilBERT transformer layer depth via from_config, since to my knowledge from_pretrained uses 6 layers, because in section 3 of the paper they say:

we initialize the student from the teacher by taking one layer out of two

While what I want to test is various layer depths. To check whether both behave the same, I tried running from_config with n_layers=6, because according to the DistilBertConfig documentation, n_layers determines the transformer block depth. However, when I ran model1 and model2 on the SST-2 dataset, I found the following accuracies:

  • model1 achieved only 0.8073
  • model2 achieved 0.901

If they both behaved the same, I would expect the results to be somewhat similar, but a 10% drop is significant, so I believe there has to be a difference between the functions. Is there a reason behind the difference between the approaches (for example, that model1 has not yet had a hyperparameter search applied), and is there a way to make both functions behave the same? Thank you!
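One quick check of this suspicion is to compare a weight tensor of the two models before any fine-tuning. A minimal sketch, continuing from the snippet above (the tensor inspected here is an arbitrary choice of mine):

import torch

# If from_config loaded checkpoint weights, these tensors would match.
w1 = model1.distilbert.embeddings.word_embeddings.weight
w2 = model2.distilbert.embeddings.word_embeddings.weight
print(torch.allclose(w1, w2))  # False: model1 starts from random weights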


Comments (1)

江城子 2025-02-16 03:13:06

您描述的两个函数,from_configfrom_pretrentaining,行为不相同。对于模型m,具有参考r:

  • from_config允许您实例化a blank 模型,该模型具有与您选择的模型相同的配置(相同的形状):M在训练
  • from_pretretaining允许您加载A 预读模型之前,是 ,该模型已经在特定数据集上进行了培训时期:M训练后的r

引用文档,注意:从其配置文件中加载模型不会加载模型权重。它仅影响模型的配置。使用from_pretrated()加载模型权重。

The two functions you described, from_config and from_pretrained, do not behave the same. For a model M, with a reference R:

  • from_config allows you to instantiate a blank model, which has the same configuration (the same shape) as your model of choice: M is as R was before training.
  • from_pretrained allows you to load a pretrained model, which has already been trained on a specific dataset for a given number of epochs: M is as R after training.

To cite the documentation: "Note: Loading a model from its configuration file does not load the model weights. It only affects the model's configuration. Use from_pretrained() to load the model weights."
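If the goal is to vary the depth while keeping pretrained weights, one option is to override config fields directly in from_pretrained, which forwards unrecognized keyword arguments to the model config before loading the checkpoint. A minimal sketch, assuming the distilbert-base-uncased checkpoint:

from transformers import AutoModelForSequenceClassification

# Unrecognized kwargs (n_layers, num_labels) override the checkpoint's
# config; weights that still match the architecture are then loaded.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",  # assumed checkpoint
    n_layers=4,                 # e.g. a shallower student to test
    num_labels=2,
)

With n_layers=4, only the first four transformer blocks receive checkpoint weights, and transformers warns that the remaining layer weights are unused; a model built with from_config instead starts every weight from random initialization and would need its own (pre)training to catch up.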
