My from-scratch beam search produces a different output every time, but adding a delay fixes it. Why?
I am trying to implement beam search for a T5 model that runs with ONNX I/O bindings. When I run the beam search, it produces a different output every time. When I add a delay in the token-generation part, it produces the same output every time. But a delay is not an acceptable solution; avoiding that kind of overhead is the reason I am implementing beam search from scratch. Below is the code snippet.
import torch

batch_size = 4
num_beams = 4
max_length = 15

# Encoder prediction.
enc_out = t5_enc(input_ids=input_ids)
# Decoder ids for the first prediction step.
gen_dec_first = torch.zeros((batch_size, 1), device="cuda", dtype=torch.long)
# Decoder ids for subsequent steps, one row per beam.
generated_dec = torch.zeros((batch_size * num_beams, 1), device="cuda", dtype=torch.long)
# Scores tensor to accommodate the log probabilities of the chosen outputs
# (float, since log probabilities are not integers).
scores_tensor = torch.zeros((batch_size * num_beams, 1), device="cuda", dtype=torch.float)

# Pre-loop prediction: predict for input_ids of batch_size.
dec_outs = t5_dec(gen_dec_first, enc_out)
# Select the top num_beams tokens from each prediction.
top_k_ele = torch.topk(dec_outs[:, -1, :], k=num_beams, dim=-1)
# Append them as the first prediction.
first_token = top_k_ele.indices.flatten().unsqueeze(0)
first_token = torch.transpose(first_token, 0, 1)
generated_dec = torch.cat((generated_dec, first_token), dim=1)
# Log probabilities from the previous time step.
prev_prob = top_k_ele.values.flatten().unsqueeze(0)
prev_prob = torch.transpose(prev_prob, 0, 1)
# Append the first score.
scores_tensor = torch.cat((scores_tensor, prev_prob), dim=1)
prev_prob = torch.tile(prev_prob, (1, num_beams))
# Re-run the encoder with the inputs repeated num_beams times.
# enc_copy = enc_out.detach().clone()
# enc_out = torch.repeat_interleave(enc_out, torch.tensor([4, 4, 4, 4], device="cuda"), dim=0)
enc_out = t5_enc(input_ids=input_ids.repeat_interleave(num_beams, dim=0))

for i in range(max_length):
    dec_outs = t5_dec(generated_dec, enc_out)
    top_k_ele = torch.topk(dec_outs[:, -1, :], k=num_beams, dim=-1)
    c_prob = torch.squeeze(top_k_ele.values, dim=1)
    if i == 0:
        f_prob = torch.add(prev_prob, c_prob)
    else:
        prev_prob_t = torch.transpose(prev_prob.unsqueeze(0), 0, 1)
        prev_prob_t = torch.tile(prev_prob_t, (1, num_beams))
        f_prob = torch.add(prev_prob_t, c_prob)
    f_probs_obj = f_prob.max(dim=1)
    f_probs = f_probs_obj.values
    f_indices = top_k_ele.indices[:, f_probs_obj.indices]
    prev_prob = f_probs
    # Append the score.
    s_prob = prev_prob.unsqueeze(0)
    s_prob = torch.transpose(s_prob, 0, 1)
    scores_tensor = torch.cat((scores_tensor, s_prob), dim=1)
    cur_tokens = top_k_ele.indices[:, f_probs_obj.indices]
    cur_tokens = cur_tokens[:, 0]
    cur_tokens = cur_tokens.unsqueeze(0)
    cur_tokens = torch.transpose(cur_tokens, 0, 1)
    generated_dec = torch.cat((generated_dec, cur_tokens), dim=1)
    # time.sleep(1)  # Having a 1-second delay "solves" the issue.
The output is fine in the case of greedy search. I also tried setting the torch seed, but since there are no random variables involved, it had no effect.
The issue was that CUDA kernels launched by my code were still running asynchronously in the background while the host code continued to execute, so host-side reads could observe partially computed results. Adding torch.cuda.synchronize() before consuming the outputs solved the issue. (The sleep only "worked" because it gave the pending kernels enough wall-clock time to finish.)
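The failure mode can be illustrated without a GPU at all. The sketch below is a hypothetical, torch-free analogue: a plain Python thread stands in for an asynchronous CUDA kernel launch, and the `FakeDevice` class, its `launch_kernel`, and `synchronize` methods are all invented for illustration. Reading the output buffer before the "kernel" finishes yields stale data; an explicit `synchronize()` (the analogue of `torch.cuda.synchronize()`) makes the read deterministic.

```python
import threading
import time

class FakeDevice:
    """Toy model of a device that runs launched work asynchronously."""

    def __init__(self):
        self.buffer = [0] * 4   # "device memory" for the result
        self._thread = None

    def launch_kernel(self):
        # Returns immediately, like a CUDA kernel launch; the actual
        # work completes later on a background "stream".
        def kernel():
            time.sleep(0.2)               # simulated kernel latency
            self.buffer = [1, 2, 3, 4]    # result becomes visible only now
        self._thread = threading.Thread(target=kernel)
        self._thread.start()

    def synchronize(self):
        # Analogue of torch.cuda.synchronize(): block the host until
        # all previously launched work has finished.
        if self._thread is not None:
            self._thread.join()

dev = FakeDevice()
dev.launch_kernel()
stale = list(dev.buffer)   # read too early: still the initial zeros
dev.synchronize()
fresh = list(dev.buffer)   # read after synchronizing: the real result
print(stale, fresh)
```

The `time.sleep(1)` in the original loop played exactly the role of `synchronize()` here, except by luck rather than by contract, which is why the outputs only became stable with the delay in place.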