Difference between from_config and from_pretrained in Hugging Face
num_labels = 3 if task.startswith("mnli") else 1 if task=="stsb" else 2
preconfig = DistilBertConfig(n_layers=6)
model1 = AutoModelForSequenceClassification.from_config(preconfig)
model2 = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)
I am modifying this code (the modified code is provided above) to test DistilBERT transformer layer depth via from_config, since to my knowledge from_pretrained uses 6 layers, because in section 3 of the paper they said:

"we initialize the student from the teacher by taking one layer out of two"

What I want to test is various layer depths. To check whether both functions are the same, I tried running from_config with n_layers=6, because based on the DistilBertConfig documentation, n_layers determines the transformer block depth. However, when I ran model1 and model2 on the SST-2 dataset, the accuracies were:

model1 achieved only 0.8073
model2 achieved 0.901

If they both behaved the same I would expect the results to be somewhat similar, but a 10% drop is significant, therefore I believe there has to be a difference between the functions. Is there a reason behind the difference between the approaches (for example, model1 has not yet had hyperparameter search applied), and is there a way to make both functions behave the same? Thank you!
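As a sanity check on the depth claim in the question, the instantiated model can be inspected directly. A minimal sketch (the attribute path model.distilbert.transformer.layer assumes the DistilBERT implementation in recent transformers versions):

```python
from transformers import AutoModelForSequenceClassification, DistilBertConfig

# Build a 4-layer DistilBERT classifier purely from a config (no weights downloaded).
config = DistilBertConfig(n_layers=4)
model = AutoModelForSequenceClassification.from_config(config)

# The number of transformer blocks should match n_layers.
print(config.n_layers)                          # 4
print(len(model.distilbert.transformer.layer))  # 4
```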
Answer:
The two functions you described, from_config and from_pretrained, do not behave the same. For a model M, with a reference R:

from_config allows you to instantiate a blank model, which has the same configuration (the same shape) as your model of choice: M is as R was before training.
from_pretrained allows you to load a pretrained model, which has already been trained on a specific dataset for a given number of epochs: M is as R after training.

To cite the documentation:

Note: Loading a model from its configuration file does not load the model weights. It only affects the model's configuration. Use from_pretrained() to load the model weights.
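The weight difference is easy to observe: two models built with from_config from the same configuration start from independent random initializations, whereas from_pretrained fills in trained weights. A minimal sketch (no checkpoint download needed; the attribute path into the embeddings assumes the DistilBERT implementation in transformers):

```python
import torch
from transformers import AutoModelForSequenceClassification, DistilBertConfig

config = DistilBertConfig(n_layers=6)

# Two "blank" models from the same config: same shapes, independent random init.
m1 = AutoModelForSequenceClassification.from_config(config)
m2 = AutoModelForSequenceClassification.from_config(config)

p1 = m1.distilbert.embeddings.word_embeddings.weight
p2 = m2.distilbert.embeddings.word_embeddings.weight
same_before = torch.equal(p1, p2)
print(same_before)  # False: from_config loaded no weights, each model is random

# Copying a state_dict is, in essence, what from_pretrained adds on top.
m2.load_state_dict(m1.state_dict())
same_after = torch.equal(p1, p2)  # load_state_dict copies in place, so p2 is updated
print(same_after)   # True: the weights now match
```

If the goal is to benchmark smaller depths while still starting from trained weights, one option worth trying is passing a reduced depth to from_pretrained (e.g. from_pretrained(model_checkpoint, n_layers=4, num_labels=num_labels)); extra config kwargs override the checkpoint's config, and transformers should then warn about the checkpoint layers that go unused.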