Help editing a Perl script to start and stop at specific positions in an array

Posted on 2024-10-01 11:09:10

I'm looking for troubleshooting and editing help. This is a homework assignment, and my professor encourages the use of forums. I don't have experience with Perl functions or subs yet, so please keep responses at a level I can follow.

The purpose of the script is to read a string of DNA (or a file from the command line, which I will add later), translate it into RNA, and then return the protein as uppercase one-letter amino acid names.

The function of the script:

Take three-character "codons" starting from the first character and give each a single-letter symbol (an uppercase one-letter amino acid name from the hash table).
Print RNA proteins, which are strings that start with AUG ("M") and end with UAG, UAA, or UGA.
If a gap is encountered, a new line is started and the process is repeated. We can assume that gaps are multiples of three.

Main problems as far as I can tell:

I don't know where to have the data loop through the hash table. I've tried placing it before and after my foreach block. I've also taken the foreach block out altogether and tried while and if.
The foreach block doesn't seem to process all of the @all_codons array; it only stops at AUG.
The obvious and biggest problem is that it returns nothing. Somewhere along the way the $next_codon value is being assigned "false". I've tried commenting out each line one at a time: the last line that returned anything was my $start, and from there on it's all false.

The Script:

$^W = 1;
use strict;


my $dna_string = "CCCCAAATGCTGGGATTACAGGCGTGAGCCACCACGCCCGGCCACTTGGCATGAATTTAATTCCCGCCATAAACCTGTGAGATAGGTAATTCTGTTATATCCACTTTACAAATGAAGAGACTGAGGCAAAGAAAGATGATGTAACTTACGCAAAGC";

my %codon_codes = (
    "UUU" => "f", "UUC" => "f", "UUA" => "l", "UUG" => "l",
    "CUU" => "l", "CUC" => "l", "CUA" => "l", "CUG" => "l",
    "AUU" => "i", "AUC" => "i", "AUA" => "i", "AUG" => "m",
    "GUU" => "v", "GUC" => "v", "GUA" => "v", "GUG" => "v",
    "UCU" => "s", "UCC" => "s", "UCA" => "s", "UCG" => "s",
    "CCU" => "p", "CCC" => "p", "CCA" => "p", "CCG" => "p",
    "ACU" => "t", "ACC" => "t", "ACA" => "t", "ACG" => "t", 
    "GCU" => "a", "GCC" => "a", "GCA" => "a", "GCG" => "a",
    "UAU" => "y", "UAC" => "y", "UAA" => " ", "UAG" => " ",
    "CAU" => "h", "CAC" => "h", "CAA" => "q", "CAG" => "q",
    "AAU" => "n", "AAC" => "n", "AAA" => "k", "AAG" => "k"
 );

my $rna_string = $dna_string;
$rna_string =~ tr/[tT]/U/;

my @all_codons = ($rna_string =~ m/.../g);

foreach my $next_codon(@all_codons){
            
    while ($next_codon =~ /AUG/gi){
            
        my $start = pos ($next_codon) -3;
    
        last unless $next_codon =~ /U(AA|GA|AG)/gi;
    
        my $stop = pos($next_codon);
            
        my $genelen = $stop - $start;
            
        my $gene = substr ($next_codon, $start, $genelen);
            
        print "\n" . join($start+1, $stop, $gene,) . "\n";
    }
}

梦断已成空 2024-10-08 11:09:10

I don't understand the "data loop through the hash table" part.

It seems to me that, for each codon, you need to check whether it is a start codon, a stop codon, a gap, or an amino acid. You also need some way to keep state (shown below as $in_gene).

my $in_gene = 0;    # 1 while we are between a start codon and a stop codon

foreach my $next_codon(@all_codons){
    if ($next_codon eq 'AUG') {
        # start codon: begin translating
        $in_gene = 1;
    }
    elsif ($next_codon =~ m/U(AA|GA|AG)/) {
        # stop codon: stop translating
        $in_gene = 0;
    }
    elsif ($in_gene == 1) {
        my $aminoacid = $codon_codes{$next_codon};
        # a codon missing from the hash is treated as a gap: start a new line
        print "\n" and next unless defined $aminoacid;
        print $aminoacid;
    }
}

This prints

l
lqak
l
q
k
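On the symptom described in the question: each element of @all_codons is exactly three characters long, so once /AUG/g has matched an element, the search for a stop codon in that same element has nothing left to scan, and the loop exits via last before anything is printed. If the goal is the uppercase output described in the question, with M printed for the start codon and a line break after each stop codon, the state-flag loop could be adapted roughly as follows. This is only a sketch under assumptions: the $output buffer is an addition for illustration, the codon table is copied unchanged from the question (so codons missing from it, such as GGA, are treated as gaps), and a gap starts a new line without leaving the gene.

```perl
use strict;
use warnings;

my $dna_string = "CCCCAAATGCTGGGATTACAGGCGTGAGCCACCACGCCCGGCCACTTGGCATGAATTTAATTCCCGCCATAAACCTGTGAGATAGGTAATTCTGTTATATCCACTTTACAAATGAAGAGACTGAGGCAAAGAAAGATGATGTAACTTACGCAAAGC";

# Codon table copied from the question. It is incomplete, so any codon
# that is missing here is treated as a gap (an assumption of this sketch).
my %codon_codes = (
    "UUU" => "f", "UUC" => "f", "UUA" => "l", "UUG" => "l",
    "CUU" => "l", "CUC" => "l", "CUA" => "l", "CUG" => "l",
    "AUU" => "i", "AUC" => "i", "AUA" => "i", "AUG" => "m",
    "GUU" => "v", "GUC" => "v", "GUA" => "v", "GUG" => "v",
    "UCU" => "s", "UCC" => "s", "UCA" => "s", "UCG" => "s",
    "CCU" => "p", "CCC" => "p", "CCA" => "p", "CCG" => "p",
    "ACU" => "t", "ACC" => "t", "ACA" => "t", "ACG" => "t",
    "GCU" => "a", "GCC" => "a", "GCA" => "a", "GCG" => "a",
    "UAU" => "y", "UAC" => "y", "UAA" => " ", "UAG" => " ",
    "CAU" => "h", "CAC" => "h", "CAA" => "q", "CAG" => "q",
    "AAU" => "n", "AAC" => "n", "AAA" => "k", "AAG" => "k"
);

# Transcribe DNA to RNA: uppercase first, then replace every T with U.
my $rna_string = uc($dna_string);
$rna_string =~ tr/T/U/;

# Split into codons, reading three characters at a time from position 0.
my @all_codons = ($rna_string =~ m/.../g);

my $in_gene = 0;     # 1 while between a start codon and a stop codon
my $output  = '';    # hypothetical buffer, collected so it can be printed once

foreach my $next_codon (@all_codons) {
    if ($next_codon =~ m/^U(AA|AG|GA)$/) {
        # Stop codon: end the current protein line, if one is open.
        $output .= "\n" if $in_gene;
        $in_gene = 0;
    }
    elsif ($next_codon eq 'AUG' and not $in_gene) {
        # Start codon: open a new protein and record its M.
        $in_gene = 1;
        $output .= 'M';
    }
    elsif ($in_gene) {
        # Inside a gene: translate, or break the line on an unknown codon.
        my $aminoacid = $codon_codes{$next_codon};
        $output .= defined($aminoacid) ? uc($aminoacid) : "\n";
    }
}

print $output;
```

With the sequence from the question, this should print ML, LQA, MK, L, Q and KMM on separate lines. Checking for the stop codon first keeps the " " entries for UAA and UAG in the table from ever being printed.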