Generating synthetic DNA sequences with a substitution rate
Given these inputs:
my $init_seq = "AAAAAAAAAA";    # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );
I want to generate:
One thousand length-10 tags
Substitution rate for each position in a tag is 0.003
Yielding output like:
AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tag
Is there a compact way to do it in Perl?
I am stuck with the following logic as the core of the script:
#!/usr/bin/perl
use strict;
use warnings;
my $init_seq = "AAAAAAAAAA";    # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );
# Prints one random base per position; $sub_rate and $nof_tags are not used yet.
my $i = 0;
while ($i < length($init_seq)) {
    my $roll = int(rand 4) + 1;    # $roll is now an integer between 1 and 4
    my $base;
    if    ($roll == 1) { $base = 'A'; }
    elsif ($roll == 2) { $base = 'T'; }
    elsif ($roll == 3) { $base = 'C'; }
    elsif ($roll == 4) { $base = 'G'; }
    print $base;
}
continue {
    $i++;
}
As a small optimisation, replace:
with
EDIT: Assuming the substitution rate is in the range 0.001 to 1.000:
As well as $roll, generate another (pseudo)random number in the range [1..1000]. If it is less than or equal to (1000 * $sub_rate), perform the substitution; otherwise do nothing (i.e. output 'A'). Be aware that you may introduce subtle bias unless the properties of your random number generator are known.
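The threshold test described above can be sketched roughly as follows. This is only an illustration: the helper names `should_substitute` and `mutate_tag` are hypothetical, the array lookup on `@dna` is one plausible way to replace the if/elsif chain on `$roll`, and the rounding of `1000 * $sub_rate` is an added assumption to sidestep floating-point error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = qw( A C G T );

# Decide whether one position mutates: draw an integer in 1..1000 and
# compare it with 1000 * rate (rounded, an assumption made here to avoid
# floating-point surprises).
sub should_substitute {
    my ($sub_rate) = @_;
    my $threshold = int(1000 * $sub_rate + 0.5);
    my $draw      = int(rand 1000) + 1;    # integer in 1..1000
    return $draw <= $threshold;
}

# Build one tag from the template, position by position.
sub mutate_tag {
    my ($init_seq, $sub_rate) = @_;
    my $tag = '';
    for my $base (split //, $init_seq) {
        # Array lookup instead of an if/elsif chain on $roll.
        $tag .= should_substitute($sub_rate) ? $dna[ int(rand @dna) ] : $base;
    }
    return $tag;
}

print mutate_tag("AAAAAAAAAA", 0.003), "\n";
```

Note that this sketch draws the replacement from all four bases, so a "substitution" can pick the base that was already there; the effective rate of visible changes is then lower than `$sub_rate`.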
Not exactly what you are looking for, but I suggest you take a look at BioPerl's Bio::SeqEvolution::DNAPoint module. It does not take a mutation rate as a parameter, though. Rather, it asks for the lower bound of sequence identity with the original that you want.
All 1000 mutated sequences will be stored in the @mutated array; their sequences can be accessed via the seq method.
In the event of a substitution, you want to exclude the current base from the possibilities:
Also please see Mitch Wheat's answer for how to implement the substitution rate.
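A minimal sketch of excluding the current base, extended to print all the tags, might look like this. The helper names `substitute_base` and `mutate_tag` are hypothetical, and the per-position test uses a plain `rand() < $sub_rate` comparison rather than the integer draw in 1..1000, which is a simplification chosen here and not taken from any of the answers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = qw( A C G T );

# Pick a replacement that is guaranteed to differ from the current base.
sub substitute_base {
    my ($current) = @_;
    my @others = grep { $_ ne $current } @dna;
    return $others[ int(rand @others) ];
}

# Mutate each position independently with probability $sub_rate.
sub mutate_tag {
    my ($init_seq, $sub_rate) = @_;
    return join '',
        map { rand() < $sub_rate ? substitute_base($_) : $_ }
        split //, $init_seq;
}

my $init_seq = "AAAAAAAAAA";    # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;

# One tag per line.
print mutate_tag($init_seq, $sub_rate), "\n" for 1 .. $nof_tags;
```

With a rate of 0.003 and 10 positions, most of the 1000 tags are expected to come out identical to the template, so a larger rate is convenient when checking the logic by eye.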
I don't know if I understand correctly, but I'd do something like this (pseudocode):