Perl 递归技术？

发布于 2024-08-05 07:35:06 字数 6314 浏览 12 评论 0原文

我需要一些关于这段代码的帮助。我知道应该递归的部分，或者至少我认为我知道，但不确定如何实现它。我正在尝试从对齐矩阵实现路径查找程序，该程序将找到返回零值的多条路线。例如，如果您执行我的代码并插入 CGCA 作为第一个序列，插入 CACGTAT 作为第二个序列，并插入 1、0 和 -1 作为匹配、不匹配和间隙分数。该程序给出的路径为 HDHHDD，对齐方式为

CACGTAT

CGC--A-。

然而，还有更多可能的路径和路线，但我不知道有多少。我想要做的是让我的一段代码自行循环并使用与第一次相同的代码找到其他路径和对齐方式，直到它用完可能的对齐方式。我在网上找到的最好的方法是递归，但没有人能解释如何做到这一点。在这种情况下，应该还有两条路径和对齐方式 HDDDHHD 和 CACGTAT，以及 C--GCA-，和。 HDDDDHH、CACGTAT 和 --CGCA-。我只是不知道如何编码来执行此任务。

# Implementation of Needleman and Wunsch Algorithm

my($seq1, $len1, $seq2, $len2, $data, @matrix, $i, $j, $x, $y, $val1, $val2);
my($val3, $pathrow, $pathcol, $seq1loc, $seq2loc, $gapscore, $matchscore, $mismatchscore);

#first obtain the data from the user. 
print "Please enter the first sequence for comaprsion\n";
$seq1=<STDIN>;
chomp $seq1;

print "Please enter the second sequence for comparsion\n";
$seq2=<STDIN>;
chomp $seq2;


# adding extra characters so sequences align with matrix
# saves some calculations later on
$seq1 = " " . $seq1;
$seq2 = " " . $seq2;
$len1 = length($seq1);
$len2 = length($seq2);

print "Enter the match score: ";
$matchscore=<STDIN>;
chomp $matchscore;

print "Enter the mismatch score: ";
$mismatchscore=<STDIN>;
chomp $mismatchscore;

print "Enter the gap score: ";
$gapscore=<STDIN>;
chomp $gapscore;

# declare a two dimensional array and initialize to spaces
# array must contain one extra row and one extra column
@matrix = ();  
for($i = 0; $i < $len1; $i++){  
   for($j = 0; $j < $len2; $j++){  
      $matrix[$i][$j] = ' ';  
   }  
}  

# initialize 1st row and 1st column of matrix  
$matrix[0][0] = 0;  
for ($i = 1; $i < $len1; $i ++){  
    $matrix[$i][0] = $matrix[$i-1][0] + $gapscore;  
}  
for ($i = 1; $i < $len2; $i ++){  
    $matrix[0][$i] = $matrix[0][$i-1] + $gapscore;  
}  


# STEP 1:
# Fill in rest of matrix using the following rules:
# determine three possible values for matrix[x][y]
# value 1 = add gap score to matrix[x][y-1]
# value 2 = add gap score to matrix[x-1][y]
# value 3 = add match score or mismatch score to 
#           matrix[x-1][y-1] depending on nucleotide 
#           match for position x of $seq1 and position y
#           of seq2
# place the largest of the three values in matrix[x][y]
#
# Best alignment score appears in matrix[$len1][$len2].

for($x = 1; $x < $len1; $x++){  
   for($y = 1; $y < $len2; $y++){  
 $val1 = $matrix[$x][$y-1] + $gapscore;  
 $val2 = $matrix[$x-1][$y] + $gapscore;  
 if (substr($seq1, $x, 1) eq substr($seq2, $y, 1)){  
           $val3 = $matrix[$x-1][$y-1] + $matchscore;  
 }  
 else{  
    $val3 = $matrix[$x-1][$y-1] + $mismatchscore;  
 }  
 if (($val1 >= $val2) && ($val1 >= $val3)){  
    $matrix[$x][$y] = $val1;  
 }  
 elsif (($val2 >= $val1) && ($val2 >= $val3)){  
    $matrix[$x][$y] = $val2;  
 }  
 else{  
    $matrix[$x][$y] = $val3;  
 }  
   }  
}  

# Display scoring matrix  
print "MATRIX:\n";   
for($x = 0; $x < $len1; $x++){  
   for($y = 0; $y < $len2; $y++){  
 print "$matrix[$x][$y] ";  
   }  
   print "\n";  
}  
print "\n";    

# STEP 2:  
# Begin at matrix[$len1][$len2] and find a path to   
# matrix[0][0].  
# Build string to hold path pattern by concatenating either   
# 'H' (current cell produced by cell horizontally to left),   
# 'D' (current cell produced by cell on diagonal),   
# 'V' (current cell produced by cell vertically above)
# ***This is were I need help I need this code to be recursive, so I can find more then one path***

$pathrow = $len1-1;  
$pathcol = $len2-1;  

while (($pathrow != 0) || ($pathcol != 0)){  
    if ($pathrow == 0){  
       # must be from cell to left  
       $path = $path . 'H';  
       $pathcol = $pathcol - 1;  
    }  
    elsif ($pathcol == 0){  
       # must be from cell above  
       $path = $path . 'V';  
       $pathrow = $pathrow - 1;  
    }  
    # could be from any direction  
    elsif  (($matrix[$pathrow][$pathcol-1] + $gapscore) == $matrix[$pathrow][$pathcol]){  
       # from left  
       $path = $path . 'H';  
       $pathcol = $pathcol - 1;  
    }  
    elsif (($matrix[$pathrow-1][$pathcol] + $gapscore) == $matrix[$pathrow][$pathcol]){  
       # from above  
       $path = $path . 'V';  
       $pathrow = $pathrow - 1;  
    }   
    else{  
       # must be from diagonal  
       $path = $path . 'D';  
       $pathrow = $pathrow - 1;  
       $pathcol = $pathcol - 1;  
    }  


}   



print "Path is $path\n";   

# STEP 3:
# Determine alignment pattern by reading path string 
# created in step 2.
# Create two string variables ($alignseq1 and $alignseq2) to hold 
# the sequences for alignment.
# Reading backwards from $seq1, $seq2 and path string, 
#   if string character is 'D', then 
#      concatenate to front of $alignseq1, last char in $seq1 
#      and to the front of $alignseq2, last char in $seq2
#   if string character is 'V', then 
#      concatenate to front of $alignseq1, last char in $seq1 
#      and to the front of $alignseq2, the gap char
#   if string character is 'H', then 
#      concatenate to front of $alignseq1 the gap char
#      and to the front of $alignseq2, last char in $seq2
# Continue process until path string has been traversed.
# Result appears in string $alignseq1 and $seq2
***#I need this code to be recursive as well.***

$seq1loc = $len1-1;  
$seq2loc = $len2-1;  
$pathloc = 0;  
print length($path);  
while ($pathloc < length($path)){  
   if (substr($path, $pathloc, 1) eq 'D'){  
      $alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;  
      $alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;  
      $seq1loc--;  
      $seq2loc--;  
   }  
   elsif (substr($path, $pathloc, 1) eq 'V'){  
      $alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;  
      $alignseq2 = '-' . $alignseq2;  
      $seq1loc--;  
   }    
   else{  # must be an H  
      $alignseq1 = '-' . $alignseq1;  
      $alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;  
      $seq2loc--;  
   }    
   $pathloc++;  
}  

print "\nAligned Sequences:\n";  
print "$alignseq2 \n";  
print "$alignseq1 \n";  

# statement may be needed to hold output screen  
print "Press any key to exit program";  
$x = <STDIN>;

如果有人想知道这是一个needleman-wunsch 算法。如有任何帮助，我们将不胜感激。

原文

I need a bit of help with is this code. I know the sections that should be recursive, or at least I think I do but am not sure how to implement it. I am trying to implement a path finding program from an alignment matrix that will find multiple routes back to the zero value. For example if you excute my code and insert CGCA as the first sequence and CACGTAT as the second sequence, and 1, 0, and -1 as the match, mismatch and gap scores. The program gives off the path as HDHHDD and the aligment as

CACGTAT

CGC--A-.

However there are more possible paths and aligments then this, except I don't know how many. What I want to do is have a piece of my code loop back on itself and find other paths and alignments, using the same code as the first time around, until it runs out of possible alignments. The best way I found on the net to do this is recursion, except no one can explain how to do it. In this case there should be two more paths and aligmennts HDDDHHD and CACGTAT, and C--GCA-, and. HDDDDHH , CACGTAT AND --CGCA-. I just don't know how to code to perform this task.

# Implementation of Needleman and Wunsch Algorithm

my($seq1, $len1, $seq2, $len2, $data, @matrix, $i, $j, $x, $y, $val1, $val2);
my($val3, $pathrow, $pathcol, $seq1loc, $seq2loc, $gapscore, $matchscore, $mismatchscore);

#first obtain the data from the user. 
print "Please enter the first sequence for comaprsion\n";
$seq1=<STDIN>;
chomp $seq1;

print "Please enter the second sequence for comparsion\n";
$seq2=<STDIN>;
chomp $seq2;


# adding extra characters so sequences align with matrix
# saves some calculations later on
$seq1 = " " . $seq1;
$seq2 = " " . $seq2;
$len1 = length($seq1);
$len2 = length($seq2);

print "Enter the match score: ";
$matchscore=<STDIN>;
chomp $matchscore;

print "Enter the mismatch score: ";
$mismatchscore=<STDIN>;
chomp $mismatchscore;

print "Enter the gap score: ";
$gapscore=<STDIN>;
chomp $gapscore;

# declare a two dimensional array and initialize to spaces
# array must contain one extra row and one extra column
@matrix = ();  
for($i = 0; $i < $len1; $i++){  
   for($j = 0; $j < $len2; $j++){  
      $matrix[$i][$j] = ' ';  
   }  
}  

# initialize 1st row and 1st column of matrix  
$matrix[0][0] = 0;  
for ($i = 1; $i < $len1; $i ++){  
    $matrix[$i][0] = $matrix[$i-1][0] + $gapscore;  
}  
for ($i = 1; $i < $len2; $i ++){  
    $matrix[0][$i] = $matrix[0][$i-1] + $gapscore;  
}  


# STEP 1:
# Fill in rest of matrix using the following rules:
# determine three possible values for matrix[x][y]
# value 1 = add gap score to matrix[x][y-1]
# value 2 = add gap score to matrix[x-1][y]
# value 3 = add match score or mismatch score to 
#           matrix[x-1][y-1] depending on nucleotide 
#           match for position x of $seq1 and position y
#           of seq2
# place the largest of the three values in matrix[x][y]
#
# Best alignment score appears in matrix[$len1][$len2].

for($x = 1; $x < $len1; $x++){  
   for($y = 1; $y < $len2; $y++){  
 $val1 = $matrix[$x][$y-1] + $gapscore;  
 $val2 = $matrix[$x-1][$y] + $gapscore;  
 if (substr($seq1, $x, 1) eq substr($seq2, $y, 1)){  
           $val3 = $matrix[$x-1][$y-1] + $matchscore;  
 }  
 else{  
    $val3 = $matrix[$x-1][$y-1] + $mismatchscore;  
 }  
 if (($val1 >= $val2) && ($val1 >= $val3)){  
    $matrix[$x][$y] = $val1;  
 }  
 elsif (($val2 >= $val1) && ($val2 >= $val3)){  
    $matrix[$x][$y] = $val2;  
 }  
 else{  
    $matrix[$x][$y] = $val3;  
 }  
   }  
}  

# Display scoring matrix  
print "MATRIX:\n";   
for($x = 0; $x < $len1; $x++){  
   for($y = 0; $y < $len2; $y++){  
 print "$matrix[$x][$y] ";  
   }  
   print "\n";  
}  
print "\n";    

# STEP 2:  
# Begin at matrix[$len1][$len2] and find a path to   
# matrix[0][0].  
# Build string to hold path pattern by concatenating either   
# 'H' (current cell produced by cell horizontally to left),   
# 'D' (current cell produced by cell on diagonal),   
# 'V' (current cell produced by cell vertically above)
# ***This is were I need help I need this code to be recursive, so I can find more then one path***

$pathrow = $len1-1;  
$pathcol = $len2-1;  

while (($pathrow != 0) || ($pathcol != 0)){  
    if ($pathrow == 0){  
       # must be from cell to left  
       $path = $path . 'H';  
       $pathcol = $pathcol - 1;  
    }  
    elsif ($pathcol == 0){  
       # must be from cell above  
       $path = $path . 'V';  
       $pathrow = $pathrow - 1;  
    }  
    # could be from any direction  
    elsif  (($matrix[$pathrow][$pathcol-1] + $gapscore) == $matrix[$pathrow][$pathcol]){  
       # from left  
       $path = $path . 'H';  
       $pathcol = $pathcol - 1;  
    }  
    elsif (($matrix[$pathrow-1][$pathcol] + $gapscore) == $matrix[$pathrow][$pathcol]){  
       # from above  
       $path = $path . 'V';  
       $pathrow = $pathrow - 1;  
    }   
    else{  
       # must be from diagonal  
       $path = $path . 'D';  
       $pathrow = $pathrow - 1;  
       $pathcol = $pathcol - 1;  
    }  


}   



print "Path is $path\n";   

# STEP 3:
# Determine alignment pattern by reading path string 
# created in step 2.
# Create two string variables ($alignseq1 and $alignseq2) to hold 
# the sequences for alignment.
# Reading backwards from $seq1, $seq2 and path string, 
#   if string character is 'D', then 
#      concatenate to front of $alignseq1, last char in $seq1 
#      and to the front of $alignseq2, last char in $seq2
#   if string character is 'V', then 
#      concatenate to front of $alignseq1, last char in $seq1 
#      and to the front of $alignseq2, the gap char
#   if string character is 'H', then 
#      concatenate to front of $alignseq1 the gap char
#      and to the front of $alignseq2, last char in $seq2
# Continue process until path string has been traversed.
# Result appears in string $alignseq1 and $seq2
***#I need this code to be recursive as well.***

$seq1loc = $len1-1;  
$seq2loc = $len2-1;  
$pathloc = 0;  
print length($path);  
while ($pathloc < length($path)){  
   if (substr($path, $pathloc, 1) eq 'D'){  
      $alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;  
      $alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;  
      $seq1loc--;  
      $seq2loc--;  
   }  
   elsif (substr($path, $pathloc, 1) eq 'V'){  
      $alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;  
      $alignseq2 = '-' . $alignseq2;  
      $seq1loc--;  
   }    
   else{  # must be an H  
      $alignseq1 = '-' . $alignseq1;  
      $alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;  
      $seq2loc--;  
   }    
   $pathloc++;  
}  

print "\nAligned Sequences:\n";  
print "$alignseq2 \n";  
print "$alignseq1 \n";  

# statement may be needed to hold output screen  
print "Press any key to exit program";  
$x = <STDIN>;

If anyone is wondering this is a needleman-wunsch algorithm. Any help here would be greatly apperciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

ζ澈沫 2024-08-12 07:35:06

我无法提供答案，因为我不明白你到底想做什么，但我可以提供一些一般性建议。

开始将代码组织成执行狭义定义的任务的离散子例程。此外，实现中央算法的子例程不应面向从键盘接收输入并向屏幕生成输出；相反，他们应该接收输入作为参数并返回结果。如果需要用户输入或屏幕输出，这些任务应该位于单独的子例程中，而不是与主要算法混合在一起。

该路径的第一步（也是部分步骤）是获取整个程序，将其包含在子例程定义中，然后使用所需的参数调用子例程。子例程应该返回它们，而不是打印其关键结果 - 具体来说，是对 @matrix 的引用以及 $path、$alignseq1 的值代码>，<代码>$alignseq2。

sub NW_algo {
    my ($seq1, $seq2, $matchscore, $mismatchscore, $gapscore) = @_;

    # The rest of your code here, but with all print
    # statements and <STDIN> inputs commented out.

    return \@matrix, $path, $alignseq1, $alignseq2;
}

my(@return_values) = NW_algo('CGCA', 'CACGTAT', 1, 0, -1);

Print_matrix($return_values[0]);

sub Print_matrix {
    for my $m ( @{$_[0]} ){
        print join(' ', @$m), "\n";
    }
}

此时，您将拥有一个可由其他代码调用的算法，从而更轻松地测试和调试您的程序。例如，您可以定义多组输入数据并在每组上运行 NW_algo()。只有这样才有可能考虑递归或其他技术。

I can't provide an answer, because I don't understand exactly what you are try to do, but I can offer some general advice.

Start organizing your code into discrete subroutines that perform narrowly defined tasks. In addition, the subroutines that implement your central algorithms should not be oriented toward receiving input from the keyboard and producing output to the screen; rather they should receive input as arguments and return their results. If there is a need for user input or screen output, those tasks should be in separate subroutines, not comingled with your primary algorithms.

A first (and partial) step down that path is to take you entire program, enclose it in a subroutine definition, and then call the subroutine with the required arguments. Instead of printing its key results, the subroutine should return them -- specifically, a reference to @matrix along with the values for $path, $alignseq1, $alignseq2.

sub NW_algo {
    my ($seq1, $seq2, $matchscore, $mismatchscore, $gapscore) = @_;

    # The rest of your code here, but with all print
    # statements and <STDIN> inputs commented out.

    return \@matrix, $path, $alignseq1, $alignseq2;
}

my(@return_values) = NW_algo('CGCA', 'CACGTAT', 1, 0, -1);

Print_matrix($return_values[0]);

sub Print_matrix {
    for my $m ( @{$_[0]} ){
        print join(' ', @$m), "\n";
    }
}

At this point, you'll have an algorithm that can be invoked by other code, making it easier to test and debug your program going forward. For example, you could define various sets of input data and run NW_algo() on each set. Only then will it be possible to think about recursion or other techniques.

回复收藏 0 原文

眼角的笑意。 2024-08-12 07:35:06

由于 Needleman-Wunsch 是一种动态规划算法，因此在计算 DP 矩阵时大部分工作已经完成。获得 DP 矩阵后，您应该回溯该矩阵以找到最佳对齐方式。这个问题有点像出租车的几何形状，只不过你可以沿对角线移动。本质上，当您需要回溯矩阵时，不是在向上、向左或对角线之间进行选择，而是通过进行三个递归调用来完成所有这三个操作，并且每个递归调用都为向上、向左或对角线调用自己，直到您到达了起点。每条递归所追踪的路径将绘制出每个对齐。

编辑：所以基本上你需要将步骤 2 放在一个子过程中（它获取到目前为止追踪的位置和路径），这样它就可以一遍又一遍地调用自身。当然，定义过程后，您需要对其进行一次调用才能实际启动跟踪过程：

sub tracePaths {
    $x = shift;
    $y = shift;
    $pathSoFar = shift; # assuming you're storing your path as a string
    #
    # ... do some work ...
    #
    tracePaths($x - 1, $y, $pathSoFar . something);
    tracePaths($x, $y - 1, $pathSoFar . somethingelse);
    tracePaths($x - 1, $y - 1, $pathSoFar . somethingelselse);
    #
    #
    #
    if(reached the end) return $pathSoFar;
}
# launch the recursion
tracePaths(beginningx, beginningy, "");

Since Needleman-Wunsch is a dynamic-programming algorithm, most of the work is already done by the time you compute your DP matrix. Once you have your DP matrix, you're supposed to backtrack through the matrix to find the optimal alignment. The problem is a bit like taxicab geometry except that you can move diagonally. Essentially, when you need to backtrack through the matrix, instead of choosing between going up, left, or diagonally, you do all three by making three recursive calls, and each of those call themselves for each of up, left, or diagonally, until you reach your starting point. The path traced by each strand of recursion will draw out each alignment.

EDIT: So basically you need to put Step 2 in a subprocedure (that takes position and the path traced so far), so it can call itself over and over. Of course, after you define the procedure you need to make one call to it to actually start the tracing process:

sub tracePaths {
    $x = shift;
    $y = shift;
    $pathSoFar = shift; # assuming you're storing your path as a string
    #
    # ... do some work ...
    #
    tracePaths($x - 1, $y, $pathSoFar . something);
    tracePaths($x, $y - 1, $pathSoFar . somethingelse);
    tracePaths($x - 1, $y - 1, $pathSoFar . somethingelselse);
    #
    #
    #
    if(reached the end) return $pathSoFar;
}
# launch the recursion
tracePaths(beginningx, beginningy, "");

回复收藏 0 原文