Perl program to simulate RNA synthesis

Posted 2024-09-30 21:13:22

Looking for suggestions on how to approach my Perl programming homework assignment to write an RNA synthesis program. I've summarized and outlined the program below. Specifically, I'm looking for feedback on the blocks below (I'll number them for easy reference). I've read up to chapter 6 in Elements of Programming with Perl by Andrew Johnson (great book). I've also read the perlfunc and perlop pod pages, with nothing jumping out on where to start.

Program Description: The program should read an input file named on the command line, transcribe it into RNA, and then translate the RNA into a sequence of uppercase one-letter amino acid names.

1. Accept a file named on the command line
   here I will use the <> operator

2. Check to make sure the file only contains acgt or die

   if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }

3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C)
   not sure how to do this

4. Take this transcription & break it into 3 character 'codons' starting at the first occurrence of "AUG"
   not sure but I'm thinking this is where I will start a %hash variable?

5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)
   Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)

6. If a gap is encountered a new line is started and process is repeated
   not sure but we can assume that gaps are multiples of three.

7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?

Note

Must be a self-contained program (stored values for codon names & symbols).

Whenever the program reads a codon that has no symbol (this is a gap in the RNA), it should start a new line of output and begin at the next occurrence of "AUG". For simplicity we can assume that gaps are always multiples of three.

Before I spend any additional hours on research I am hoping to get confirmation that I'm taking the right approach. Thanks for taking time to read and for sharing your expertise!

铃予 2024-10-07 21:13:22

1. here I will use the <> operator

OK, your plan is to read the file line by line. Don't forget to chomp each line as you go, or you'll end up with newline characters in your sequence.
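A minimal sketch of that read-and-chomp loop follows. The helper name slurp_sequence and the in-memory filehandle are assumptions made for illustration; in the real program the loop would read from <> directly.

```perl
use strict;
use warnings;

# Read every line from a filehandle, strip the trailing newline,
# and return the pieces joined into one sequence string.
# (slurp_sequence is a hypothetical helper name, not part of the assignment.)
sub slurp_sequence {
    my ($fh) = @_;
    my $sequence = '';
    while (my $line = <$fh>) {
        chomp $line;           # without this, "\n" ends up inside the sequence
        $sequence .= $line;
    }
    return $sequence;
}

# Demonstration: an in-memory filehandle stands in for the input file.
my $sample = "ACGT\nTTGA\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print slurp_sequence($fh), "\n";   # prints ACGTTTGA
close $fh;
```

With a file named on the command line, the same loop body works unchanged inside while (my $line = <>) { ... }.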

2. Check to make sure the file only contains acgt or die

if ( <> ne [acgt] ) { die "usage: file must only contain nucleotides \n"; }

In a while loop, the <> operator puts the line read into the special variable $_, unless you assign it explicitly (my $line = <>).

In the code above, you're reading one line from the file and discarding it. You'll need to save that line.

Also, the ne operator compares two strings, not one string and one regular expression. You'll need the !~ operator here (or the =~ one, with a negated character class [^acgt]). If you need the test to be case-insensitive, look into the i flag for regular expression matching.
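The check described above might look like the sketch below. The helper name is_dna is an assumption, and the line is expected to be chomped already, since a trailing newline would count as an invalid character.

```perl
use strict;
use warnings;

# Return true when the (already chomped) line contains nothing but a, c, g, t.
# The negated class [^acgt] matches any other character; /i ignores case.
# (is_dna is a hypothetical helper name.)
sub is_dna {
    my ($line) = @_;
    return $line !~ /[^acgt]/i;
}

for my $line ('acgtTGCA', 'acgu', 'ac gt') {
    printf "%-8s => %s\n", $line, is_dna($line) ? 'ok' : 'invalid';
}
```

In the main loop this would become something like die "usage: file must only contain nucleotides\n" unless is_dna($line); note that an empty line passes this test, which may or may not be what the assignment wants.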

3. Transcribe the DNA to RNA (Every A replaced by U, T replaced by A, C replaced by G, G replaced by C).

As GWW said, check your biology. T->U is the only step in transcription. You'll find the tr (transliterate) operator helpful here.
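As a small sketch of the tr approach: the helper name dna_to_rna is an assumption, and so is the uppercasing step, which is included only because the assignment asks for uppercase output while the input may be lowercase.

```perl
use strict;
use warnings;

# Transcribe a DNA string to RNA by replacing every T with U.
# (dna_to_rna is a hypothetical helper name; uc is an added assumption.)
sub dna_to_rna {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;   # copy, uppercase, then transliterate the copy
    return $rna;
}

print dna_to_rna('atgcatTAG'), "\n";   # prints AUGCAUUAG
```

tr works character by character and needs no regular expression, which makes it a good fit for a single-base substitution like this one.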

4. Take this transcription & break it into 3 character 'codons' starting at the first occurrence of "AUG"

not sure but I'm thinking this is where I will start a %hash variables?

I would use a buffer here. Define a scalar outside the while(<>) loop. Use index to search for "AUG". If you don't find it, keep the last two bases in that scalar (you can use substr $line, -2, 2 for that). On the next iteration of the loop, append the new line (with .=) to those two bases, and then test for "AUG" again. If you get a hit, you'll know where it is, so you can mark the spot and start translation.
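The buffering idea can be sketched roughly as follows. The function name from_first_start is an assumption, and the lines are passed in as a list rather than read from <> so that the example stays self-contained.

```perl
use strict;
use warnings;

# Return the RNA sequence starting at the first "AUG", even when the
# start codon straddles a line break. Returns undef if there is no AUG.
# (from_first_start is a hypothetical helper name.)
sub from_first_start {
    my (@lines) = @_;
    my $buffer = '';     # carries the tail of the previous line
    my $found  = 0;
    for my $line (@lines) {
        $buffer .= $line;
        next if $found;                        # already past the start: just accumulate
        my $pos = index($buffer, 'AUG');
        if ($pos >= 0) {
            $buffer = substr($buffer, $pos);   # drop everything before the start codon
            $found  = 1;
        }
        elsif (length($buffer) > 2) {
            # An AUG can overlap the break by at most two bases, so keep only those.
            $buffer = substr($buffer, -2, 2);
        }
    }
    return $found ? $buffer : undef;
}

print from_first_start('CCGA', 'UGGCU', 'UAA'), "\n";   # prints AUGGCUUAA
```

In the example the start codon is split as "A" at the end of the first line and "UG" at the start of the second, which is exactly the case the two-base carry-over exists to handle.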

5. Take the 3 character "codons" and give them a single letter Symbol (an uppercase one-letter amino acid name)

Assign a key a value using (there are 70 possibilities here so I'm not sure where to store or how to access)

Again, as GWW said, build a hash table:

%codons = ( AUG => 'M', ...).

Then you can use (for example) split to build an array from the current line you're examining, build codons three elements at a time, and grab the correct amino acid code from the hash table.

6. If a gap is encountered a new line is started and process is repeated
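Here is a sketch of the lookup, under stated assumptions: the table is deliberately partial (a handful of standard codons only), the function name codons_to_symbols is made up, and printing '?' for an unknown codon is just a placeholder choice for this demonstration.

```perl
use strict;
use warnings;

# Deliberately partial table, for illustration only.
my %codons = (
    AUG => 'M',
    UUU => 'F',
    GGC => 'G',
    GCU => 'A',
    AAA => 'K',
);

# Split an RNA string into bases, regroup them three at a time,
# and look each codon up in the hash.
# (codons_to_symbols is a hypothetical helper name.)
sub codons_to_symbols {
    my ($rna) = @_;
    my @bases = split //, $rna;
    my @symbols;
    while (@bases >= 3) {
        my $codon = join '', splice(@bases, 0, 3);
        push @symbols, $codons{$codon} // '?';   # '?' marks a codon missing from the table
    }
    return join '', @symbols;
}

print codons_to_symbols('AUGUUUGGCAAA'), "\n";   # prints MFGK
```

Because the hash is written out in the source, the program stays self-contained, which is what the assignment's note asks for.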

not sure but we can assume that gaps are multiples of threes.

See above. A gap is any codon that is missing from the table, so you can detect one by checking whether exists $codons{$current_codon} is false.
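One way the gap rule could be put together is sketched below. The function name translate_with_gaps and the partial table are assumptions; stop codons are simply left out of the table here so that they act as gaps.

```perl
use strict;
use warnings;

# Partial, illustrative table; any codon not listed counts as a gap.
my %codons = ( AUG => 'M', UUU => 'F', GGC => 'G', AAA => 'K' );

# Translate an RNA string into a list of output lines. Translation starts
# at an AUG, stops at the first codon missing from %codons, and resumes on
# a new line at the next AUG.
# (translate_with_gaps is a hypothetical helper name.)
sub translate_with_gaps {
    my ($rna) = @_;
    my @lines;
    my $pos = index($rna, 'AUG');
    while ($pos >= 0) {
        my $protein = '';
        while ($pos + 3 <= length($rna)) {
            my $codon = substr($rna, $pos, 3);
            last unless exists $codons{$codon};   # gap: end the current line
            $protein .= $codons{$codon};
            $pos += 3;
        }
        push @lines, $protein;
        $pos = index($rna, 'AUG', $pos);          # next start codon, or -1
    }
    return @lines;
}

print "$_\n" for translate_with_gaps('CCAUGUUUUAAUAGAUGAAAGGC');
# prints MF, then MKG on a second line
```

Searching for the next AUG with index, rather than stepping three bases at a time, follows the assignment's wording ("begin at the next occurrence of AUG") and does not depend on the gap length.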

7. Am I approaching this the right way? Is there a Perl function that I'm overlooking that can simplify the main program?

You know, looking at the above, it seems way too complex. I built a few building blocks; the subroutines read_codon and translate: I think they help the logic of the program immensely.

I know this is a homework assignment, but I figure it might help you get a feel for other possible approaches:

use warnings; use strict;
use feature 'state';


# read_codon uses the state feature introduced in Perl 5.10.
# Both @buffer and $handle are 'state' of this function: together they
# let us abstract reading codons away from processing the file
# line by line.
# The first time read_codon is called, both are initialized.
# Since $handle is a state variable, the current file handle position
# is never reset. Similarly, @buffer always holds whatever was left
# from the previous call.
# The base case is that @buffer contains fewer than 3bp, in which case
# we need to read a new line, remove the "\n" character,
# split it and push the resulting list onto the end of @buffer.
# If we encounter EOF on $handle, then we have exhausted the file,
# and @buffer as well, so we 'return' undef.
# Otherwise we take the first 3bp of @buffer, join them into a string,
# transcribe it and return it.

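sub read_codon {
    my ($file) = @_;

    state @buffer;
    state $handle;

    # Open the file on the first call only: calling open on every call
    # would reopen the file and reset the read position.
    if (!defined $handle) {
        open $handle, '<', $file or die "Cannot open $file: $!";
    }

    # Keep reading until a full codon is buffered (a line may be short or empty).
    while (@buffer < 3) {
        defined(my $new_line = <$handle>) or return;
        chomp $new_line;
        push @buffer, split //, $new_line;
    }

    # Remove the first three bases from the buffer and transcribe them.
    return transcribe(join '', splice @buffer, 0, 3);
}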

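sub transcribe {
    my ($codon) = @_;
    $codon = uc $codon;    # input may be lowercase (acgt); the %codes keys are uppercase
    $codon =~ tr/T/U/;     # T -> U is the only substitution needed
    return $codon;
}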


# translate also uses the state feature introduced in Perl 5.10.
# The $TRANSLATE state is initialized to 0;
# as codons are passed to it,
# the sub updates the state according to start and stop codons.
# Since $TRANSLATE is a state variable, it is only initialized once
# (the first time the sub is called).
# If the current state is 'translating',
# then the sub returns the appropriate amino acid from the %codes table, if any.
# This gives the caller a logical way to determine whether
# it should print an amino acid or not: if not, the sub returns undef.
# %codes could also be a state variable, but since it is not actually 'state',
# it is initialized once, in a code block visible from the sub
# but separate from the rest of the program, since it is 'private' to the sub.

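{
    # A lexical (my) hash is visible only inside this block, which keeps it
    # private to translate. The block must run before translate is first called.
    my %codes = (
        AUG => 'M',
        # ... (remaining codon => symbol pairs, elided in the original)
    );

    sub translate {
        my ($codon) = @_;
        return unless defined $codon;    # read_codon returns undef at end of file

        state $TRANSLATE = 0;

        $TRANSLATE = 1 if $codon =~ m/^AUG$/i;               # start codon
        $TRANSLATE = 0 if $codon =~ m/^U(?:AA|GA|AG)$/i;     # stop codons

        return $codes{$codon} if $TRANSLATE;
        return;    # not translating: return undef explicitly
    }
}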
