Difference between from_config and from_pretrained in Hugging Face

Posted 2025-02-09 03:13:06


from transformers import AutoModelForSequenceClassification, DistilBertConfig

task = "sst2"                                  # GLUE task name (assumed from the question)
model_checkpoint = "distilbert-base-uncased"   # checkpoint name (assumed)

num_labels = 3 if task.startswith("mnli") else 1 if task == "stsb" else 2
preconfig = DistilBertConfig(n_layers=6)

model1 = AutoModelForSequenceClassification.from_config(preconfig)
model2 = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

I am modifying this code (the modified code is provided above) to test the DistilBERT transformer layer depth via from_config, since to my knowledge from_pretrained uses 6 layers, because in section 3 of the paper they say:

we initialize the student from the teacher by taking one layer out of two

While what I want to test is various layer depths. To check whether both behave the same, I tried running from_config with n_layers=6, because according to the DistilBertConfig documentation, n_layers determines the transformer block depth. However, when I ran model1 and model2 on the SST-2 dataset, I found the following accuracies:

  • model1 achieved only 0.8073
  • model2 achieved 0.901

If they both behaved the same, I would expect the results to be somewhat similar, but a 10% drop is significant, so I believe there has to be a difference between the functions. Is there a reason behind the difference between the approaches (for example, that model1 has not yet had a hyperparameter search applied), and is there a way to make both functions behave the same? Thank you!
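One quick check of this suspicion is to compare a weight tensor of the two models before any fine-tuning. A minimal sketch, continuing from the snippet above (the tensor inspected here is an arbitrary choice of mine):

import torch

# If from_config loaded checkpoint weights, these tensors would match.
w1 = model1.distilbert.embeddings.word_embeddings.weight
w2 = model2.distilbert.embeddings.word_embeddings.weight
print(torch.allclose(w1, w2))  # False: model1 starts from random weights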


Comments (1)

江城子 2025-02-16 03:13:06

您描述的两个函数,from_configfrom_pretrentaining,行为不相同。对于模型m,具有参考r:

  • from_config允许您实例化a blank 模型,该模型具有与您选择的模型相同的配置(相同的形状):M在训练
  • from_pretretaining允许您加载A 预读模型之前,是 ,该模型已经在特定数据集上进行了培训时期:M训练后的r

引用文档,注意:从其配置文件中加载模型不会加载模型权重。它仅影响模型的配置。使用from_pretrated()加载模型权重。

The two functions you described, from_config and from_pretrained, do not behave the same. For a model M, with a reference R:

  • from_config allows you to instantiate a blank model, which has the same configuration (the same shape) as your model of choice: M is as R was before training.
  • from_pretrained allows you to load a pretrained model, which has already been trained on a specific dataset for a given number of epochs: M is as R after training.

To cite the documentation: "Note: Loading a model from its configuration file does not load the model weights. It only affects the model's configuration. Use from_pretrained() to load the model weights."
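If the goal is to vary the depth while keeping pretrained weights, one option is to override config fields directly in from_pretrained, which forwards unrecognized keyword arguments to the model config before loading the checkpoint. A minimal sketch, assuming the distilbert-base-uncased checkpoint:

from transformers import AutoModelForSequenceClassification

# Unrecognized kwargs (n_layers, num_labels) override the checkpoint's
# config; weights that still match the architecture are then loaded.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",  # assumed checkpoint
    n_layers=4,                 # e.g. a shallower student to test
    num_labels=2,
)

With n_layers=4, only the first four transformer blocks receive checkpoint weights, and transformers warns that the remaining layer weights are unused; a model built with from_config instead starts every weight from random initialization and would need its own (pre)training to catch up.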
