How to build a Distiller class for BERT and its variants using TF models from HF

Posted on 2025-02-11 01:15:59


I'm trying to build a "Distiller" class for knowledge distillation using TF models from Hugging Face. I started with this and tried to modify it.

I built a dataset that looks like this:

DatasetDict({
    train: Dataset({
        features: ['text', 'labels', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 512
    })
    validation: Dataset({
        features: ['text', 'labels', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 128
    })
    test: Dataset({
        features: ['text', 'labels', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 160
    })
})

Calling features on the "train" split returns:

{'labels': ClassLabel(num_classes=4, names=['A', 'B', 'C', 'D'], id=None), 'text': Value(dtype='string', id=None)}

From this I created the training set (and, in the same way, the validation and test sets) as follows:

tf_train_dataset = my_tokenized_dataset["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["labels"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=8,
)

Assuming this is correct (I followed the HF tutorials), I then called

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# checkpoint (the pretrained model name) and opt (Adam with a lr_scheduler) are defined elsewhere
teacher = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4)
student = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
teacher.compile(optimizer=opt, loss=loss, metrics=["accuracy"])

and fitted the teacher model for 3 epochs, reaching about 95% validation accuracy.

Leaving the Distiller class from the TF tutorial unchanged, the problem arises when student_loss has to be computed. In detail, I checked that

x = {'input_ids': <tf.Tensor 'IteratorGetNext:1' shape=(None, None) dtype=int64>, 'token_type_ids': <tf.Tensor 'IteratorGetNext:2' shape=(None, None) dtype=int64>, 'attention_mask': <tf.Tensor 'IteratorGetNext:0' shape=(None, None) dtype=int64>}

and

y = Tensor("IteratorGetNext:3", shape=(None,), dtype=int64)

while

Teacher predictions: TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor 'tf_bert_for_sequence_classification_1/classifier/BiasAdd:0' shape=(None, 4) dtype=float32>, hidden_states=None, attentions=None)
Student predictions: TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor 'tf_bert_for_sequence_classification_2/classifier/BiasAdd:0' shape=(None, 4) dtype=float32>, hidden_states=None, attentions=None)
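
Both printed predictions are TFSequenceClassifierOutput wrapper objects rather than bare tensors; the (None, 4) logits tensor sits in their logits field. The following is a minimal, dependency-free sketch of pulling that field out before handing predictions to a loss. FakeSequenceClassifierOutput and extract_logits are hypothetical names used only for illustration; they are not part of transformers or of the tutorial's Distiller class.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeSequenceClassifierOutput:
    """Hypothetical stand-in that mirrors the fields printed above."""
    loss: Optional[Any] = None
    logits: Optional[Any] = None
    hidden_states: Optional[Any] = None
    attentions: Optional[Any] = None


def extract_logits(model_output):
    """Return the .logits field of a wrapper object, or the value itself
    when it has no such field (for example, an already-bare list or tensor)."""
    return getattr(model_output, "logits", model_output)


# A wrapper like the one the student model returns, holding one row of 4 logits.
student_predictions = FakeSequenceClassifierOutput(logits=[[2.0, 0.5, 0.1, -1.0]])
print(extract_logits(student_predictions))
```

With the real models, the equivalent step inside train_step would presumably be reading the logits attribute of the model output, for example self.student(x, training=True).logits, and passing that to the loss functions.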

The error is the following:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-62-a15be00a8e5e> in <module>()
11 
12 # Distill teacher to student
---> 13 distiller.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
14 
15 # Evaluate student on test dataset

1 frames
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65     except Exception as e:  # pylint: disable=broad-except
66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
68     finally:
69       del filtered_tb

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1145           except Exception as e:  # pylint:disable=broad-except
1146             if hasattr(e, "ag_error_metadata"):
-> 1147               raise e.ag_error_metadata.to_exception(e)
1148             else:
1149               raise

TypeError: in user code:

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function  *
    return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step  **
    outputs = model.train_step(data)
File "<ipython-input-61-219884772d38>", line 52, in train_step
    student_loss = self.student_loss_fn(y, student_predictions)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 141, in __call__
    losses = call_fn(y_true, y_pred)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 245, in call  **
    return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 1860, in sparse_categorical_crossentropy
    y_pred = tf.convert_to_tensor(y_pred)

TypeError: Expected any non-tensor type, but got a tensor instead.
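
The traceback points at self.student_loss_fn(y, student_predictions): the loss receives the whole output object, and tf.convert_to_tensor cannot convert it. Once plain logits are passed instead, a distillation step usually combines a hard-label loss with a temperature-softened teacher/student term. The sketch below works through that arithmetic for a single example in pure Python. The alpha and temperature defaults, the function names, and the example logits are assumptions for illustration, not values taken from the question, and the exact weighting in the tutorial's Distiller may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(label, logits):
    """Cross-entropy between an integer class label and raw logits."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_step_loss(label, teacher_logits, student_logits,
                           alpha=0.1, temperature=3.0):
    """Weighted sum of the hard-label loss and the soft teacher/student loss."""
    student_loss = sparse_cross_entropy(label, student_logits)
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill_loss = kl_divergence(soft_teacher, soft_student)
    return alpha * student_loss + (1.0 - alpha) * distill_loss


# Hypothetical logits for one example with 4 classes and true label 0.
teacher_logits = [4.0, 1.0, 0.5, -2.0]
student_logits = [1.5, 1.0, 0.2, -0.5]
print(distillation_step_loss(0, teacher_logits, student_logits))
```

Every function here takes plain lists of numbers, which mirrors the requirement in the TF version: both loss functions need the logits themselves, not the object that wraps them.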

I apologize if the question and my mistakes are trivial; I still have a lot to learn. A step-by-step answer that takes nothing for granted would be the easiest for me to follow.

Thanks in advance to everyone.
