Finding Overlap Base Count and Internal Gap in Two Strings
I have two strings of equal length that I need to compare.
I want to find the overlapping bases (.) and the internal gaps (*). Here is an example:
------ACTAAAAATACAAAAA--TTAGCCAGGCGTGGTGGCAC
-----TACTAAAAATACAAAAAAATTAGCCAGGTGTGGTGG---
      ................**.................
Number of overlapping bases = 33.
Number of internal gaps = 2.
I have no problem finding the number of overlapping bases, but I have trouble
finding the internal gaps. Below is my current code. It is horribly slow,
and I need to process millions of such pairs.
#!/usr/bin/perl
use strict;
use warnings;

my $s1 = "------ACTAAAAATACAAAAA--TTAGCCAGGCGTGGTGGCAC";
my $s2 = "-----TACTAAAAATACAAAAAAATTAGCCAGGTGTGGTGG---";
print "$s1\n";
print "$s2\n";

# Lookup table of valid base characters
my %base = ("A" => 1, "T" => 1, "C" => 1, "G" => 1);
my $ovlp_basecount = 0;
my $internal_gap = 0;

# Walk both strings column by column (the last valid index is length - 1)
foreach my $si ( 0 .. length($s1) - 1 ) {
    my $base1 = substr($s1, $si, 1);
    my $base2 = substr($s2, $si, 1);
    # Overlap: both strings hold a base in this column
    if ( $base{$base1} && $base{$base2} ) {
        $ovlp_basecount++;
    }
    # Not sure how to compute internal gap
}
print "TOTAL OVERLAP BASE = $ovlp_basecount\n";
print "TOTAL Internal Gap ?\n";
Please advise how I can find the internal gaps and the overlap efficiently.
Comments (2)
You can use a bitwise OR on the strings to find the areas in one string that overlap blank areas in the other. This process also has the effect of revealing the overlap by converting non-overlapping characters to lower case, thus making finding the overlap quite simple too:
Prints:
For more information on string bitwise operations:
http://teaching.idallen.com/cst8214/08w/notes/bit_operations.txt
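The bitwise OR idea described above can be sketched roughly as follows. This is not the answerer's original code; the helper name `overlap_and_gaps` is hypothetical, and the sketch assumes that an "internal gap" is any column with a `-` in at least one string that lies between the first and last overlapping columns. It relies on ASCII: OR-ing two bases leaves bit 0x20 clear (an uppercase letter), while OR-ing `-` with a base sets it (for example `-` | `A` gives `m`).

```perl
use strict;
use warnings;

# Hypothetical helper: returns (overlap count, internal gap count).
# Assumes both strings have equal length and contain only A, C, G, T and '-'.
sub overlap_and_gaps {
    my ($s1, $s2) = @_;

    # String-wise OR (both operands are strings). Under
    # "use feature 'bitwise'" the string operator is spelled "|." instead.
    my $or = $s1 | $s2;

    # Columns where both strings hold a base stay uppercase.
    my $overlap = ($or =~ tr/A-Z//);

    # Drop everything before the first and after the last overlapping column.
    (my $core = $or) =~ s/^[^A-Z]+|[^A-Z]+$//g;

    # Whatever is not uppercase inside the core is a gap column.
    my $internal = ($core =~ tr/A-Z//c);

    return ($overlap, $internal);
}

my $s1 = "------ACTAAAAATACAAAAA--TTAGCCAGGCGTGGTGGCAC";
my $s2 = "-----TACTAAAAATACAAAAAAATTAGCCAGGTGTGGTGG---";
my ($overlap, $internal) = overlap_and_gaps($s1, $s2);
print "TOTAL OVERLAP BASE = $overlap\n";
print "TOTAL Internal Gap = $internal\n";
```

Because the OR and both `tr` counts run over whole strings rather than through a per-character `substr` loop, this should scale better to millions of pairs, though that is worth benchmarking on real data.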
Assuming the gaps never overlap, you can solve this using regular expressions. Here's an answer for your s1.
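One way the regular-expression approach might look is sketched below. It is an illustration only, not the answerer's original code: the helper name `internal_gaps` is hypothetical, and it assumes an internal gap is any `-` lying strictly between the first and last base of a single string, which matches the question's definition only when the gaps of the two strings never fall in the same columns.

```perl
use strict;
use warnings;

# Hypothetical helper: count '-' characters that lie between the first
# and last non-gap character of one string.
sub internal_gaps {
    my ($s) = @_;

    # Skip leading dashes, lazily capture the middle, skip trailing dashes.
    my ($core) = $s =~ /^-*(.*?)-*$/;
    return 0 unless defined $core;

    # Count the dashes that remain inside the captured core.
    return ($core =~ tr/-//);
}

my $s1 = "------ACTAAAAATACAAAAA--TTAGCCAGGCGTGGTGGCAC";
my $s2 = "-----TACTAAAAATACAAAAAAATTAGCCAGGTGTGGTGG---";
my $total = internal_gaps($s1) + internal_gaps($s2);
print "TOTAL Internal Gap = $total\n";
```

Applying the helper to both strings and summing the results gives the total for a pair, under the non-overlapping-gaps assumption stated above.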