The forward output on multiple sequences is wrong


I am using T5 to summarize multiple sequences as a batch. Here I want to reproduce the output of model.generate(input_ids) by calling the forward function (model(**inputs)). I know that forward() and generate() work completely differently (see this). To make them work the same way, I take some sequences and call model.generate() on them to produce the corresponding outputs, giving me (text, summary) pairs. Calling the forward function on these pairs one at a time generates the same outputs. However, when calling the forward function on a batch of sequences, the output is not the same. What did I miss?
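For context, here is a minimal sketch of the teacher forcing described above: when labels are passed to forward(), the decoder inputs are the labels shifted right by one. The prepare_decoder_input_ids_from_labels call reflects my understanding of the transformers T5 implementation, so treat this as an assumption to verify:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Tokenize a toy target; forward() would feed the decoder these labels
# shifted right by one position (teacher forcing).
labels = tokenizer("a short target sequence", return_tensors="pt").input_ids
decoder_input_ids = model.prepare_decoder_input_ids_from_labels(labels)
print(labels)             # gold tokens the decoder is trained to predict
print(decoder_input_ids)  # same tokens shifted right, starting from the pad id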

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
model.resize_token_embeddings(len(tokenizer))
model.to("cuda")
model.eval()

# sequences
seq1 = "summarize: Calling the model (which means the forward method) uses the labels for teacher forcing. This means inputs to the decoder are the labels shifted by one"
output1 = "calling the model uses the labels for teacher forcing. inputs to the decoder"

seq2 = "summarize: When you call the generate method, the model is used in the autoregressive fashion"
output2 = "the model is used in the auto-aggressive fashion."

seq3 = "summarize: However, selecting the token is a hard decision, and the gradient cannot be propagated through this decision"
output3 = "the token is a hard decision, and the gradient cannot be propagated through this decision"

input_sequences = [seq1, seq2, seq3]
output_seq = [output1, output2, output3]
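# The (text, summary) pairs above were produced with generate(); a sketch of
# that call, reusing the model and tokenizer loaded above (max_length=128 is
# an assumption, not the exact setting used):
enc = tokenizer(input_sequences, padding="longest", return_tensors="pt").to("cuda")
gen_ids = model.generate(**enc, max_length=128)
print(tokenizer.batch_decode(gen_ids, skip_special_tokens=True))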

# encoding input and attention mask
encoding = tokenizer(
    input_sequences,
    padding="longest",
    max_length=128,
    truncation=True,
    return_tensors="pt",
)

input_ids, attention_mask = encoding.input_ids.to("cuda"), encoding.attention_mask.to("cuda")

# labels
target_encoding = tokenizer(
    output_seq, padding="longest", max_length=128, truncation=True
)
labels = target_encoding.input_ids
labels = torch.tensor(labels).to("cuda")
# replace padding token ids with -100 so they are ignored by the loss
labels[labels == tokenizer.pad_token_id] = -100

# Call the model (forward pass, teacher-forced with the labels)
logits = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).logits

# Apply softmax() and batch_decode()

X = logits
X = F.softmax(X, dim=-1)  # softmax is monotonic, so argmax over raw logits gives the same ids
ids = X.argmax(dim=-1)
y = tokenizer.batch_decode(sequences=ids, skip_special_tokens=True)

# results: batch_size=3

['call the model uses the labels for teacher forcing  inputs to the decoder are',
 'the model is used in the auto-aggressive fashion  the the the',
 'the token is a hard decision, and the gradient cannot be propagated through this decision ']

# results: batch_size=1, i.e. one sequence at a time

['call the model uses the labels for teacher forcing  inputs to the decoder are']

['the model is used in the auto-aggressive fashion ']

['the token is a hard decision, and the gradient cannot be propagated through this decision ']
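What I suspect (an assumption I have not verified): in the batched case the labels are padded to the longest target in the batch, so the decoder also produces logits at the padded positions, and argmax emits stray tokens there (the trailing "the the the" in the second result). A minimal sketch, reusing logits, labels and tokenizer from the code above, that drops positions whose label is -100 before decoding:

ids = logits.argmax(dim=-1)
for row_ids, row_labels in zip(ids, labels):
    kept = row_ids[row_labels != -100]  # keep only real (non-padded) target positions
    print(tokenizer.decode(kept, skip_special_tokens=True))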
