How can I rewrite this code as a Perl one-liner (or with fewer lines on the command line)?
I have code like this:
```perl
#!/usr/bin/perl
use strict;
use warnings;
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;
open(INPUT,"<dna.txt");
while (<INPUT>) {
    tr/[a,c,g,t]/[A,C,G,T]/;
    y/GCTA/CGAU/;
    foreach my $protein (/(...)/g) {
        if (defined $proteins{$protein}) {
            print $proteins{$protein};
        }
    }
}
close(INPUT);
```
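This code is related to the answer to my other question: DNA to RNA and getting proteins with Perl
The output of the program is:
SIMQNISGREAT
How can I rewrite this code in Perl so that it runs on the command line with less code (a one-liner, if possible)?
PS 1: dna.txt looks like this:
TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT
PS 2: If it results in fewer lines of code, it's acceptable to move the `my %proteins` variable into a file.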
Answers (5)
The only changes I would recommend making are simplifying your `while` loop:
Since `y` and `tr` are synonyms, you should only use one of them. I think `tr` reads better than `y`, so I picked `tr`. Further, you were calling them very differently, but this should have the same effect and only mentions the letters you actually change. (All the other characters were being mapped to themselves, which makes it much harder to see what is actually being changed.)
You might want to remove the `open(INPUT,"<dna.txt");` and corresponding `close(INPUT);` lines, as they make it much harder to use your program in shell pipelines or with different input files. But that's up to you: if the input file will always be `dna.txt` and never anything different, this is alright.
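Here is a minimal sketch of the simplified loop described above. It assumes the input may be in either case. A single `tr/acgtACGT/UGCAUGCA/` complements lower- and uppercase DNA into RNA in one pass and names only the letters that change. The subroutine name `print_proteins` and the in-memory filehandle used for the demo are illustrative choices, not part of the original answer. In real use you would pass `\*STDIN` or a handle opened on `dna.txt`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Codon table from the question (the stop codons UAA, UAG and UGA are absent).
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;

# Hypothetical helper: print the protein letters for every line read from $fh.
sub print_proteins {
    my ($fh) = @_;
    while (<$fh>) {
        tr/acgtACGT/UGCAUGCA/;    # one tr: DNA in either case -> complementary RNA
        foreach my $protein (/(...)/g) {
            print $proteins{$protein} if defined $proteins{$protein};
        }
    }
}

# Demo on the sample from dna.txt through an in-memory filehandle.
my $sample = "TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT\n";
open my $in, '<', \$sample or die "cannot open in-memory handle: $!";
print_proteins($in);
print "\n";
```

Because the subroutine takes a filehandle as an argument, the same loop works in a shell pipeline (pass `\*STDIN`) and with a named file. That is the flexibility the answer argues for when it suggests dropping the hard-coded `open`.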
Somebody (@kamaci) called my name in another thread. This is the best I can come up with while keeping the protein table on the command line:
(Shell quoting; for Windows quoting, swap the `'` and `"` characters.) This version marks invalid codons with `%`; you can probably fix that by adding `=~y/%//d` at an appropriate spot.
Hint: This picks out 6 bits from the raw ASCII encoding of an RNA triple, giving 64 codes between 0 and 101058048. To get a string index, I reduce the result modulo 63, but this creates one double mapping which, regrettably, had to code two different proteins. The `s/GGG/GGC/i` maps one of them to another codon that codes for the right protein.
Also note the parentheses before the `%` operator, which both isolate the `,` operator from the argument list of `substr` and fix the precedence of `&` vs `%`. If you ever use that in production code, you're a bad, bad person.
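The answer's one-liner itself is not preserved here. The following is a readable sketch of the underlying idea only, not the answer's code.

Bits 1 and 2 of the ASCII codes already tell the four RNA letters apart: A (0x41), C (0x43), U (0x55) and G (0x47) give 0, 1, 2 and 3. Three letters therefore yield a unique 6-bit index from 0 to 63 into a 64-character lookup string. This sketch uses that index directly instead of reducing a packed value modulo 63, so it needs no `s/GGG/GGC/` fix-up. The names `codon_index`, `translate_rna` and `$table` are hypothetical. As in the answer, codons missing from the table (the stop codons) come out as `%`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Codon table from the question (the stop codons UAA, UAG and UGA are absent).
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;

# Bits 1-2 of each ASCII code separate the RNA letters:
# A (0x41) -> 0, C (0x43) -> 1, U (0x55) -> 2, G (0x47) -> 3.
sub codon_index {
    my ($codon) = @_;
    my $index = 0;
    $index = ($index << 2) | ((ord($_) & 6) >> 1) for split //, $codon;
    return $index;    # 0 .. 63, different for every codon
}

# 64-character lookup string; positions of codons missing from the table stay '%'.
my $table = '%' x 64;
substr($table, codon_index($_), 1) = $proteins{$_} for keys %proteins;

# Translate an RNA string codon by codon through the lookup string.
sub translate_rna {
    my ($rna) = @_;
    return join '', map { substr($table, codon_index($_), 1) } $rna =~ /(...)/g;
}

print translate_rna('AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA'), "\n";
```

Building `$table` from `%proteins` keeps the sketch checkable against the question's table. The answer instead keeps its table on the command line.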
Phew. That's the best I can come up with, at least this quickly. If you're sure the input is always already in uppercase, you can also drop the `uc`, saving another two characters. Or, if the input is always the same, you could assign it to `$_` straight away instead of reading it from anywhere.
I guess I don't need to say that this code should not be used in production environments or anywhere else other than for pure fun. When doing actual programming, readability almost always wins over compactness.
A few other versions I mentioned in the comments:
Reading `%p` and the DNA from files:
From the shell with `perl -e`:
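The answer's golfed versions are not preserved here. The following is a compact sketch of the two shortcuts mentioned above, assuming input that is fixed and already uppercase. The sample from `dna.txt` is assigned straight to `$_` with no file read and no `uc`. A hash slice filtered with `grep` then replaces the explicit loop. The short hash name `%p` follows the answer's naming; the rest is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Codon table from the question (the stop codons UAA, UAG and UGA are absent).
my %p = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;

# Fixed, already-uppercase input: assign it straight to $_ (no file read, no uc).
$_ = 'TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT';
tr/ACGT/UGCA/;    # DNA -> complementary RNA, in place in $_

# The hash slice looks up every codon at once; grep drops codons not in the table.
print join('', grep { defined } @p{ m/(...)/g }), "\n";
```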
Most things have already been pointed out, especially that readability matters. I wouldn't try to reduce the program more than what follows.
The only "one-liner" thing I added is the `push map grep m//g` in the `while` loop. Note that Perl 5.10 adds the "defined-or" operator, `//`, which allows you to write:
Ah okay, the `open do local $/` file-slurp idiom is handy for slurping small files into memory. Hope you find it a bit inspiring. :-)
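The answer's own code is not preserved here. The following is a sketch of the three idioms it names:

- the `do { local $/; ... }` slurp
- `push`/`map`/`grep` over `m//g` matches
- the Perl 5.10 defined-or operator `//`

An in-memory filehandle stands in for opening `dna.txt`. The `%` marker for unknown codons is an illustrative choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # the defined-or operator // needs Perl 5.10 or later

# Codon table from the question (the stop codons UAA, UAG and UGA are absent).
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;

# Slurp: with $/ undefined, a single <$fh> returns the whole file.
# The in-memory handle stands in for open(my $fh, '<', 'dna.txt').
my $sample = "TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT\n";
my $dna = do { local $/; open my $fh, '<', \$sample or die "cannot open: $!"; <$fh> };

(my $rna = uc $dna) =~ tr/ACGT/UGCA/;    # DNA -> complementary RNA

# push/map/grep over m//g: keep only codons that are in the table.
my @protein;
push @protein, map { $proteins{$_} } grep { exists $proteins{$_} } $rna =~ m/(...)/g;

# Defined-or: same lookup, but unknown codons become '%' instead of being dropped.
my $marked = join '', map { $proteins{$_} // '%' } $rna =~ m/(...)/g;

print join('', @protein), "\n";
print "$marked\n";
```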
If you write the protein data to another file, space-delimited and without line breaks, you can import it by reading the file once.
You can remove the line of code `tr/a,c,g,t/A,C,G,T/` because the match operator has a case-insensitive option (the `i` flag). The original `foreach` loop can also be optimized like the code above. The `$1` variable here holds the text matched inside the parentheses of the match operation `/(\w{3})/gi`.
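Here is a sketch of loading the table from a single space-delimited line, read once, and then walking the codons with `$1`. The table file is simulated with an in-memory string. A real file, such as a hypothetical `proteins.txt`, would be opened the same way.

One caveat: `\w` already matches either case, and the `/i` flag does not change the case of what `$1` captures. Lowercase input would therefore still miss the uppercase hash keys. For that reason the sketch uppercases the DNA with `uc` before complementing it, rather than relying on `/i`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a table file holding every "codon protein" pair on one
# space-delimited line with no line breaks.
my $table_text = join ' ',
    'UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W',
    'CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R',
    'AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R',
    'GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G';

# A single read of the single line loads the whole table.
open my $table_fh, '<', \$table_text or die "cannot open table: $!";
my %proteins = split ' ', scalar <$table_fh>;
close $table_fh;

my $dna = 'tcataatacgttttgtattcgccagcgcttcggtgt';    # sample from dna.txt, in lowercase
(my $rna = uc $dna) =~ tr/GCTA/CGAU/;              # uppercase, then DNA -> complementary RNA

my $out = '';
while ($rna =~ /(\w{3})/g) {
    # $1 holds the three letters captured by the parentheses on this pass.
    $out .= $proteins{$1} if exists $proteins{$1};
}
print "$out\n";
```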