Cannot load Longformer model in TensorFlow after fine-tuning
I fine-tuned a Longformer model in TensorFlow 2.8.2 on Google Colab, starting from a pretrained Longformer from Hugging Face [1]. I am trying to do binary classification on texts: I have a dataset with texts labeled as relevant (1) and irrelevant (0). This is my code:
import numpy as np
import tensorflow as tf
from transformers import AutoConfig, AutoTokenizer, TFAutoModel
from official.nlp import optimization  # AdamW + warmup helper (tf-models-official)

tokenizer_path = "allenai/longformer-base-4096"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

model_path = "allenai/longformer-base-4096"
config = AutoConfig.from_pretrained(model_path)
config.attention_window = 128
config.num_labels = 2
config.id2label = {0: "irrelevant", 1: "relevant"}
longformer = TFAutoModel.from_pretrained(model_path, config=config)

def tokenize(texts):
    # padding=True pads to the longest text in the call, so each split can end up with a different length
    tokenized_texts = tokenizer(texts, truncation=True, padding=True, return_tensors="np")
    return tokenized_texts

def format_input(texts, labels):
    inputs = tokenize(texts).data  # plain dict with 'input_ids' and 'attention_mask'
    labels = np.asarray(labels).astype('float16').reshape((-1, 1))
    return inputs, labels
train_inputs, train_labels = format_input(train_texts, train_labels) # train_texts is an array of strings and train_labels is an array containing 0 and 1, same for val and test sets
val_inputs, val_labels = format_input(val_texts, val_labels)
test_inputs, test_labels = format_input(test_texts, test_labels)
# Model layers
input_ids = tf.keras.layers.Input((None,), dtype=np.int32, name="input_ids")
attention_mask = tf.keras.layers.Input((None,), dtype=np.int32, name="attention_mask")
longformer_output = longformer.longformer(input_ids=input_ids, attention_mask=attention_mask)
cls_output = longformer_output["last_hidden_state"][:,0,:]
hidden = tf.keras.layers.Dense(32, activation="tanh")(cls_output)
output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=[output])
loss = tf.keras.losses.BinaryCrossentropy()
metrics = [
    tf.metrics.BinaryAccuracy(),
]
# Check if layer 2 is the Longformer layer.
print(model.layers[2])
# Freeze the Longformer layer
model.layers[2].trainable = False
epochs = 100
steps_per_epoch = len(train_labels)
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1*num_train_steps)
init_lr = 3e-5
optimizer = optimization.create_optimizer(init_lr=init_lr,
                                          num_train_steps=num_train_steps,
                                          num_warmup_steps=num_warmup_steps,
                                          optimizer_type='adamw')
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=0)
history = model.fit(
    x=train_inputs,
    y=train_labels,
    batch_size=1,
    validation_batch_size=1,
    validation_data=(val_inputs, val_labels),
    epochs=epochs,
    class_weight=class_weight,  # class weights computed earlier (not shown)
    steps_per_epoch=steps_per_epoch,
    callbacks=[
        early_stopping_callback,
    ],
)
trained_model_save_path = "tf_longformer_cls"
model.save(trained_model_save_path, include_optimizer=False)
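For reference, a minimal sketch (with hypothetical example texts) of the structure that format_input() feeds into model.fit: the tokenizer output is a plain dict keyed by the model's input names, and the labels are a column vector.
example_inputs, example_labels = format_input(
    ["some relevant document ...", "some irrelevant document ..."],  # hypothetical texts
    [1, 0],
)
print(example_inputs.keys())              # expected: dict with 'input_ids' and 'attention_mask'
print(example_inputs["input_ids"].shape)  # (2, padded_sequence_length)
print(example_labels.shape)               # (2, 1)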
This is the model summary:
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                                  Output Shape     Param #      Connected to
==================================================================================================
 input_ids (InputLayer)                        [(None, 4096)]   0            []
 attention_mask (InputLayer)                   [(None, 4096)]   0            []
 longformer (TFLongformerMainLayer)            multiple         148659456    ['input_ids[0][0]',
                                                                              'attention_mask[0][0]']
 tf.__operators__.getitem_1 (SlicingOpLambda)  (None, 768)      0            ['longformer[1][0]']
 dense_2 (Dense)                               (None, 32)       24608        ['tf.__operators__.getitem_1[0][0]']
 dense_3 (Dense)                               (None, 1)        33           ['dense_2[0][0]']
==================================================================================================
Total params: 148,684,097
Trainable params: 148,684,097
Non-trainable params: 0
After fine-tuning it, I saved it to Google Drive, but when loading it like this:
new_model = tf.keras.models.load_model(trained_model_save_path)
I get this error:
ValueError: The two structures don't have the same nested structure.
First structure: type=tuple str=(({'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids'), 'global_attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None), 'attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None)}, None, None, None, None, None, None, None, None, None, False), {})
Second structure: type=tuple str=((TensorSpec(shape=(None, None), dtype=tf.int32, name='input_ids'), TensorSpec(shape=(None, None), dtype=tf.int32, name='attention_mask'), None, None, None, None, None, None, None, None, False), {})
More specifically: Substructure "type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids'), 'global_attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None), 'attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None)}" is a sequence, while substructure "type=TensorSpec str=TensorSpec(shape=(None, None), dtype=tf.int32, name='input_ids')" is not
Entire first structure:
(({'input_ids': ., 'global_attention_mask': ., 'attention_mask': .}, ., ., ., ., ., ., ., ., ., .), {})
Entire second structure:
((., ., ., ., ., ., ., ., ., ., .), {})
So, from my understanding, there is something wrong with the shapes of the inputs. I googled around a bit and found a similar issue for BERT here [2], where they solved it by specifying the model layer like this:
base_output = base_model.bert([ids, mask, token_type_ids])
instead of:
base_output = base_model([ids, mask, token_type_ids])
So I tried something similar and changed the line:
longformer_output = longformer.longformer(input_ids=input_ids, attention_mask=attention_mask)
to:
longformer_output = longformer.longformer([input_ids, attention_mask])
But now I am getting this error when trying to save the model:
OperatorNotAllowedInGraphError: Exception encountered when calling layer "longformer" (type TFLongformerMainLayer).
using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
Call arguments received:
• args=(['tf.Tensor(shape=(None, None), dtype=int32)', 'tf.Tensor(shape=(None, None), dtype=int32)'],)
• kwargs={'training': 'False'}
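(For reference, a third wiring that Hugging Face TF layers generally accept is a single dict of named tensors instead of keyword arguments or a list; this is only an untested sketch, I have not checked whether it avoids either of the errors above.)
longformer_output = longformer.longformer(
    {"input_ids": input_ids, "attention_mask": attention_mask}  # dict-style call, untested
)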
Do you have any idea why these errors happen? I want to save the model before evaluating it or doing anything else with it, because it occupies the whole GPU RAM during training; I cannot use a batch size bigger than 1 when training, otherwise I get an OOM error.
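For context, once loading works the plan would be roughly the following (a sketch using the test split prepared above, with batch_size=1 to stay within GPU memory; the loaded model is re-compiled because the optimizer was not saved):
new_model = tf.keras.models.load_model(trained_model_save_path)
new_model.compile(loss=loss, metrics=metrics)  # optimizer state not saved (include_optimizer=False)
results = new_model.evaluate(test_inputs, test_labels, batch_size=1)
print(results)  # [loss, binary_accuracy]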
Thanks!