My own beam search implementation produces different output on every run, but adding a delay in the code fixes it. Why?

Posted 2025-01-13 11:49:33 · 2585 characters · 1 view · 0 comments

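I am trying to implement beam search for a T5 model that runs with ONNX IO bindings. Every run of the beam search produces different output. If I add a delay in the token-generation loop, the output is the same every time. A delay is not a real solution, though, since avoiding added latency is why I am implementing beam search from scratch. The code snippet is below.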

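import torch

#t5_enc, t5_dec and input_ids are defined elsewhere (the ONNX encoder/decoder wrappers and the tokenized batch).
batch_size=4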
num_beams=4
max_length=15
#Encoder prediction.
enc_out=t5_enc(input_ids=input_ids)
#First time prediction dec ids.
gen_dec_first=torch.zeros((batch_size,1),device="cuda",dtype=torch.long)
#Next sequence decoder ids representing the number of beams.
generated_dec = torch.zeros((batch_size*num_beams,1),device="cuda",dtype=torch.long)

#Scores tensor to accommodate the log probabilities of the chosen outputs.
scores_tensor= torch.zeros((batch_size*num_beams,1),device="cuda",dtype=torch.long)


#Preloop prediction.

#Predict for input_ids of batch_size.
dec_outs=t5_dec(gen_dec_first,enc_out)
#Select the top num_beams size tokens from each prediction.
top_k_ele=torch.topk(dec_outs[:,-1,:],k=num_beams,dim=-1)

#Append them as first prediction.
first_token=top_k_ele.indices.flatten().unsqueeze(0)
first_token=torch.transpose(first_token, 0, 1)
generated_dec=torch.cat((generated_dec,first_token),dim=1)

#Previous time step log probabilities.
prev_prob=top_k_ele.values.flatten().unsqueeze(0)
prev_prob=torch.transpose(prev_prob, 0, 1)
#Append the score first.
scores_tensor=torch.cat((scores_tensor,prev_prob),dim=1)

prev_prob=torch.tile(prev_prob,(1,num_beams))

#Repeat the encoder outputs for num_beams.

# enc_copy=enc_out.detach().clone()
# enc_out=torch.repeat_interleave(enc_out,torch.tensor([4,4,4,4],device="cuda"),dim=0)
enc_out=t5_enc(input_ids=input_ids.repeat_interleave(4, dim=0))


for i in range(max_length):
  dec_outs=t5_dec(generated_dec,enc_out)
  top_k_ele=torch.topk(dec_outs[:,-1,:],k=num_beams,dim=-1)
  c_prob=torch.squeeze(top_k_ele.values,dim=1)
  if i==0:
    f_prob=torch.add(prev_prob,c_prob)
  else:
    prev_prob_t=torch.transpose(prev_prob.unsqueeze(0), 0, 1)
    prev_prob_t=torch.tile(prev_prob_t,(1,num_beams))
    f_prob=torch.add(prev_prob_t,c_prob)
  f_probs_obj=f_prob.max(dim=1)
  f_probs=f_probs_obj.values
  f_indices=top_k_ele.indices[:,f_probs_obj.indices]
  prev_prob=f_probs
  #Append the score.
  s_prob=prev_prob.unsqueeze(0)
  s_prob=torch.transpose(s_prob, 0, 1)
  scores_tensor=torch.cat((scores_tensor,s_prob),dim=1)
  cur_tokens=top_k_ele.indices[:,f_probs_obj.indices]
  cur_tokens=cur_tokens[:,0]
  cur_tokens=cur_tokens.unsqueeze(0)
  cur_tokens=torch.transpose(cur_tokens, 0, 1)
  generated_dec=torch.cat((generated_dec,cur_tokens),dim=1)

  # time.sleep(1) #Having a 1-second delay solves the issue.

With greedy search the output is fine. I also tried setting the torch seed, but since the code has no random variables, it had no effect.
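One way to separate a timing problem from a logic problem is to compare against a small deterministic reference that runs entirely on the CPU. The sketch below is such a reference for a single beam-search step on one batch item, assuming the textbook formulation in which candidates from all live beams compete together (this may differ from the per-row max used in the snippet above). `beam_step` and its arguments are hypothetical names introduced for illustration and are not part of the original code.

```python
import math


def beam_step(beam_scores, step_log_probs, num_beams):
    """Select the best `num_beams` continuations for one batch item.

    beam_scores:    cumulative log-probabilities, one per live beam
    step_log_probs: per-beam lists of next-token log-probabilities
    Returns (score, beam_index, token_id) tuples, best first.
    """
    candidates = []
    for beam_idx, (score, log_probs) in enumerate(zip(beam_scores, step_log_probs)):
        for token_id, lp in enumerate(log_probs):
            candidates.append((score + lp, beam_idx, token_id))
    # Highest score first; ties are broken by beam and token index so the
    # result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates[:num_beams]


# Two live beams over a toy vocabulary of three tokens.
beam_scores = [math.log(0.6), math.log(0.4)]
step_log_probs = [
    [math.log(0.5), math.log(0.3), math.log(0.2)],
    [math.log(0.9), math.log(0.05), math.log(0.05)],
]
best = beam_step(beam_scores, step_log_probs, num_beams=2)
for score, beam_idx, token_id in best:
    print(beam_idx, token_id, round(math.exp(score), 2))
```

If a reference like this gives stable results for the same log-probabilities while the GPU path does not, the difference is more likely to come from when the decoder outputs are read than from the selection logic itself.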



欲拥i 2025-01-20 11:49:33

The issue was that CUDA kernels were still running in the background while my code continued executing. Adding torch.cuda.synchronize() solved it.

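import torch

#t5_enc, t5_dec and input_ids are defined elsewhere (the ONNX encoder/decoder wrappers and the tokenized batch).
batch_size=4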
num_beams=4
max_length=15
#Encoder prediction.
enc_out=t5_enc(input_ids=input_ids)
#First time prediction dec ids.
gen_dec_first=torch.zeros((batch_size,1),device="cuda",dtype=torch.long)
#Next sequence decoder ids representing the number of beams.
generated_dec = torch.zeros((batch_size*num_beams,1),device="cuda",dtype=torch.long)

#Scores tensor to accommodate the log probabilities of the chosen outputs.
scores_tensor= torch.zeros((batch_size*num_beams,1),device="cuda",dtype=torch.long)


#Preloop prediction.

#Predict for input_ids of batch_size.
dec_outs=t5_dec(gen_dec_first,enc_out)
#Select the top num_beams size tokens from each prediction.
top_k_ele=torch.topk(dec_outs[:,-1,:],k=num_beams,dim=-1)

#Append them as first prediction.
first_token=top_k_ele.indices.flatten().unsqueeze(0)
first_token=torch.transpose(first_token, 0, 1)
generated_dec=torch.cat((generated_dec,first_token),dim=1)

#Previous time step log probabilities.
prev_prob=top_k_ele.values.flatten().unsqueeze(0)
prev_prob=torch.transpose(prev_prob, 0, 1)
#Append the score first.
scores_tensor=torch.cat((scores_tensor,prev_prob),dim=1)

prev_prob=torch.tile(prev_prob,(1,num_beams))

#Repeat the encoder outputs for num_beams.

# enc_copy=enc_out.detach().clone()
# enc_out=torch.repeat_interleave(enc_out,torch.tensor([4,4,4,4],device="cuda"),dim=0)
enc_out=t5_enc(input_ids=input_ids.repeat_interleave(4, dim=0))


for i in range(max_length):
  dec_outs=t5_dec(generated_dec,enc_out)
  top_k_ele=torch.topk(dec_outs[:,-1,:],k=num_beams,dim=-1)
  c_prob=torch.squeeze(top_k_ele.values,dim=1)
  if i==0:
    f_prob=torch.add(prev_prob,c_prob)
  else:
    prev_prob_t=torch.transpose(prev_prob.unsqueeze(0), 0, 1)
    prev_prob_t=torch.tile(prev_prob_t,(1,num_beams))
    f_prob=torch.add(prev_prob_t,c_prob)
  f_probs_obj=f_prob.max(dim=1)
  f_probs=f_probs_obj.values
  f_indices=top_k_ele.indices[:,f_probs_obj.indices]
  prev_prob=f_probs
  #Append the score.
  s_prob=prev_prob.unsqueeze(0)
  s_prob=torch.transpose(s_prob, 0, 1)
  scores_tensor=torch.cat((scores_tensor,s_prob),dim=1)
  cur_tokens=top_k_ele.indices[:,f_probs_obj.indices]
  cur_tokens=cur_tokens[:,0]
  cur_tokens=cur_tokens.unsqueeze(0)
  cur_tokens=torch.transpose(cur_tokens, 0, 1)
  generated_dec=torch.cat((generated_dec,cur_tokens),dim=1)
  #Block until all queued CUDA work has finished before the next iteration.
  torch.cuda.synchronize()

  # time.sleep(1) #Having a 1-second delay solves the issue.
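The effect described above can be illustrated without a GPU. The sketch below is only an analogy built on Python threads, not CUDA or ONNX Runtime code; `FakeDevice`, `launch`, and `argmax` are hypothetical names introduced for illustration. It shows how reading an output buffer before the queued work has finished gives a stale result, and why an explicit synchronization point is more reliable than a delay.

```python
import threading


class FakeDevice:
    """Stand-in for a GPU stream: work is queued and finishes later."""

    def __init__(self):
        self.buffer = [0.0, 0.0, 0.0, 0.0]  # plays the role of an output tensor
        self._release = threading.Event()   # lets the demo decide when the kernel finishes
        self._worker = None

    def launch(self, values):
        """Queue a 'kernel' that fills the buffer; returns immediately."""
        def kernel():
            self._release.wait()            # the kernel has not finished yet
            self.buffer[:] = values
        self._worker = threading.Thread(target=kernel)
        self._worker.start()

    def synchronize(self):
        """Block until the queued kernel is done.

        This plays the role of torch.cuda.synchronize(). Releasing the event
        here is a demo device: a real kernel finishes on its own schedule.
        """
        self._release.set()
        self._worker.join()


def argmax(xs):
    """Index of the first maximum element."""
    return max(range(len(xs)), key=xs.__getitem__)


device = FakeDevice()
device.launch([0.1, 0.7, 0.2, 0.0])

stale_choice = argmax(device.buffer)  # read too early: buffer still holds zeros
device.synchronize()
fresh_choice = argmax(device.buffer)  # read after the kernel has finished

print(stale_choice, fresh_choice)
```

A sleep only works when it happens to outlast the pending work, whereas a synchronization call waits exactly as long as needed. With ONNX Runtime IO binding, the binding object also exposes synchronize_inputs() and synchronize_outputs(), which may be a narrower alternative to a device-wide torch.cuda.synchronize(); whether that is sufficient for the code above is an assumption that has not been verified here.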

