How to build a Distiller class for BERT and its variants using TF models from HF
I'm trying to build a "Distillation" class for Knowledge Distillation using TF models from Hugging Face. I started with this and tried to modify it.
I built a dataset that looks like this:
DatasetDict({
    train: Dataset({
        features: ['text', 'labels', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 512
    })
    validation: Dataset({
        features: ['text', 'labels', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 128
    })
    test: Dataset({
        features: ['text', 'labels', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 160
    })
})
Calling features on the "train" split returns:
{'labels': ClassLabel(num_classes=4, names=['A', 'B', 'C', 'D'], id=None), 'text': Value(dtype='string', id=None)}
From this I created the training set (and likewise the validation and test sets) as follows:
tf_train_dataset = my_tokenized_dataset["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["labels"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=8,
)
Assuming this is correct (I followed the HF tutorials), I called
teacher = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4)
student = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
teacher.compile(optimizer=opt, loss=loss, metrics=["accuracy"]) # where opt is Adam with a lr_scheduler
and fitted the teacher model for 3 epochs, reaching about 95% validation accuracy.
Leaving the Distiller class from the TF tutorial unchanged, the problem arises when student_loss is computed. In detail, I checked that
x = {'input_ids': <tf.Tensor 'IteratorGetNext:1' shape=(None, None) dtype=int64>, 'token_type_ids': <tf.Tensor 'IteratorGetNext:2' shape=(None, None) dtype=int64>, 'attention_mask': <tf.Tensor 'IteratorGetNext:0' shape=(None, None) dtype=int64>}
and
y = Tensor("IteratorGetNext:3", shape=(None,), dtype=int64)
while
Teacher predictions: TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor 'tf_bert_for_sequence_classification_1/classifier/BiasAdd:0' shape=(None, 4) dtype=float32>, hidden_states=None, attentions=None)
Student predictions: TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor 'tf_bert_for_sequence_classification_2/classifier/BiasAdd:0' shape=(None, 4) dtype=float32>, hidden_states=None, attentions=None)
The error is the following:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-a15be00a8e5e> in <module>()
11
12 # Distill teacher to student
---> 13 distiller.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
14
15 # Evaluate student on test dataset
1 frames
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
TypeError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "<ipython-input-61-219884772d38>", line 52, in train_step
student_loss = self.student_loss_fn(y, student_predictions)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 141, in __call__
losses = call_fn(y_true, y_pred)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 245, in call **
return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 1860, in sparse_categorical_crossentropy
y_pred = tf.convert_to_tensor(y_pred)
TypeError: Expected any non-tensor type, but got a tensor instead.
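A possible reading of this traceback, offered as a guess rather than a confirmed diagnosis: the loss function receives student_predictions as a whole TFSequenceClassifierOutput object (see the printed predictions above), while sparse_categorical_crossentropy expects a plain tensor of logits. The sketch below illustrates the idea of unwrapping the logits before calling the loss. It uses only the standard library so it can run anywhere; FakeClassifierOutput and extract_logits are hypothetical names made up for this illustration and are not part of transformers or Keras.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class FakeClassifierOutput:
    """Hypothetical stand-in for a model output object that carries a .logits field.

    This is NOT the real TFSequenceClassifierOutput; it only mimics its shape.
    """
    logits: List[List[float]]
    loss: Optional[float] = None


def extract_logits(model_output: Any) -> Any:
    """Return the logits held by a model output, or the value itself if it has none.

    Hypothetical helper: handles dict-like outputs, attribute-style outputs,
    and values that are already plain logits.
    """
    if isinstance(model_output, dict):
        return model_output.get("logits", model_output)
    return getattr(model_output, "logits", model_output)


# Illustration: the wrapper object is not what a loss function wants,
# but the logits inside it are.
student_predictions = FakeClassifierOutput(logits=[[0.1, 2.0, -1.0, 0.3]])
student_logits = extract_logits(student_predictions)
print(student_logits)
```

Under that assumption, the corresponding change inside train_step would be to pass student_predictions.logits (and likewise teacher_predictions.logits) wherever the tutorial code passes the raw predictions, for example self.student_loss_fn(y, student_predictions.logits). The same would apply in test_step if it calls the loss in the same way.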
I apologize if the question and my mistakes are trivial; unfortunately, I still have a lot to learn. If you can give me a step-by-step answer that takes nothing for granted, it will probably be easier for me to understand.
Thanks in advance to everyone.