Unable to load Longformer model in TensorFlow after fine-tuning

Posted on 2025-02-07 15:13:08

I fine-tuned a Longformer model with TensorFlow 2.8.2 on Google Colab, starting from the pretrained Longformer from Hugging Face [1]. I am trying to do binary classification of texts; I have a dataset with texts labeled as relevant (1) and irrelevant (0). This is my code for doing this:

tokenizer_path = "allenai/longformer-base-4096"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

model_path = "allenai/longformer-base-4096"
config = AutoConfig.from_pretrained(model_path)

config.attention_window = 128  # shrink the local attention window (this checkpoint's default is 512)
config.num_labels = 2
config.id2label = {0: "irrelevant", 1: "relevant"}

longformer = TFAutoModel.from_pretrained(model_path, config=config)

def tokenize(texts):
    tokenized_texts = tokenizer(texts, truncation=True, padding=True, return_tensors="np")
    return tokenized_texts

def format_input(texts, labels):
    inputs = tokenize(texts).data
    labels = np.asarray(labels).astype('float16').reshape((-1,1))

    return inputs, labels

train_inputs, train_labels = format_input(train_texts, train_labels) # train_texts is an array of strings and train_labels is an array containing 0 and 1, same for val and test sets
val_inputs, val_labels = format_input(val_texts, val_labels)
test_inputs, test_labels = format_input(test_texts, test_labels)
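# For illustration only (hypothetical values, not my real data):
#   train_texts  = ["a long document ...", "another long document ..."]  # list of strings
#   train_labels = [1, 0]                                                # 1 = relevant, 0 = irrelevant
# tokenize(...).data is a dict of NumPy arrays keyed by 'input_ids' and 'attention_mask'.
# The class_weight passed to model.fit below is defined outside this snippet; it is a dict
# mapping class index to weight, e.g. {0: 1.0, 1: 2.0} (values here are made up).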

# Model layers
input_ids = tf.keras.layers.Input((None,), dtype=np.int32, name="input_ids")
attention_mask = tf.keras.layers.Input((None,), dtype=np.int32, name="attention_mask")

longformer_output = longformer.longformer(input_ids=input_ids, attention_mask=attention_mask)
cls_output = longformer_output["last_hidden_state"][:,0,:]

hidden = tf.keras.layers.Dense(32, activation="tanh")(cls_output)
output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=[output])

loss = tf.keras.losses.BinaryCrossentropy()

metrics = [
    tf.metrics.BinaryAccuracy(),
]

# Check if layer 2 is the Longformer layer.
print(model.layers[2])

# Freeze the Longformer layer
model.layers[2].trainable = False

epochs = 100

steps_per_epoch = len(train_labels)
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1*num_train_steps)

init_lr = 3e-5
optimizer = optimization.create_optimizer(init_lr=init_lr,
                                          num_train_steps=num_train_steps,
                                          num_warmup_steps=num_warmup_steps,
                                          optimizer_type='adamw')

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=0)

history = model.fit(
    x=train_inputs,
    y=train_labels,
    batch_size=1,
    validation_batch_size=1,
    validation_data=(val_inputs, val_labels),
    epochs=epochs,
    class_weight=class_weight,
    steps_per_epoch=steps_per_epoch,
    callbacks=[
        early_stopping_callback,
    ]
)

trained_model_save_path = "tf_longformer_cls"
model.save(trained_model_save_path, include_optimizer=False)

This is the model summary:

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_ids (InputLayer)         [(None, 4096)]       0           []                               
                                                                                                  
 attention_mask (InputLayer)    [(None, 4096)]       0           []                               
                                                                                                  
 longformer (TFLongformerMainLa  multiple            148659456   ['input_ids[0][0]',              
 yer)                                                             'attention_mask[0][0]']         
                                                                                                  
 tf.__operators__.getitem_1 (Sl  (None, 768)         0           ['longformer[1][0]']             
 icingOpLambda)                                                                                   
                                                                                                  
 dense_2 (Dense)                (None, 32)           24608       ['tf.__operators__.getitem_1[0][0
                                                                 ]']                              
                                                                                                  
 dense_3 (Dense)                (None, 1)            33          ['dense_2[0][0]']                
                                                                                                  
==================================================================================================
Total params: 148,684,097
Trainable params: 148,684,097
Non-trainable params: 0

After fine-tuning it, I saved it to Google Drive, but when loading it like this:

new_model = tf.keras.models.load_model(trained_model_save_path)

I get this error:

ValueError: The two structures don't have the same nested structure.

First structure: type=tuple str=(({'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids'), 'global_attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None), 'attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None)}, None, None, None, None, None, None, None, None, None, False), {})

Second structure: type=tuple str=((TensorSpec(shape=(None, None), dtype=tf.int32, name='input_ids'), TensorSpec(shape=(None, None), dtype=tf.int32, name='attention_mask'), None, None, None, None, None, None, None, None, False), {})

More specifically: Substructure "type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids'), 'global_attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None), 'attention_mask': TensorSpec(shape=(None, 5), dtype=tf.int32, name=None)}" is a sequence, while substructure "type=TensorSpec str=TensorSpec(shape=(None, None), dtype=tf.int32, name='input_ids')" is not
Entire first structure:
(({'input_ids': ., 'global_attention_mask': ., 'attention_mask': .}, ., ., ., ., ., ., ., ., ., .), {})
Entire second structure:
((., ., ., ., ., ., ., ., ., ., .), {})

So, from my understanding, there is something wrong with the shapes of the inputs. I googled around a bit and found a similar issue for BERT here [2], where they solved it by calling the model layer like this: base_output = base_model.bert([ids, mask, token_type_ids]) instead of base_output = base_model([ids, mask, token_type_ids]). So I tried something similar and changed how the Longformer layer is called, as shown in the sketch below. But with that change I now get the error shown right after the sketch when trying to save the model.
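Only the relevant lines are shown here (the rest of the model-building code is unchanged; the variable names are the ones from the full listing above):

# Original call, using keyword arguments:
longformer_output = longformer.longformer(input_ids=input_ids,
                                          attention_mask=attention_mask)

# Changed call, passing a list, mirroring the BERT workaround from [2]:
longformer_output = longformer.longformer([input_ids, attention_mask])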

OperatorNotAllowedInGraphError: Exception encountered when calling layer "longformer" (type TFLongformerMainLayer).

using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.

Call arguments received:
  • args=(['tf.Tensor(shape=(None, None), dtype=int32)', 'tf.Tensor(shape=(None, None), dtype=int32)'],)
  • kwargs={'training': 'False'}

Do you have any idea why this happens? I want to save the model before evaluating it or doing anything else with it, because it occupies the whole GPU RAM during the training phase; I cannot use a batch size bigger than 1 when training, otherwise I get an OOM error.

Thanks!
