获取 DNA 子串的原始顺序

发布于 2024-12-18 08:32:59 字数 557 浏览 2 评论 0原文

我想获取长DNA序列的子串

例如,给定:

1/ATXGAAATTXXGGAAGGGGTGG
2/AATXGAAGGAAGGAAGGGGATATTX
3/AAAAAATTXXGGAAGGGGXTTTA
4/AAAATTXXATAXXGGAAGGGGXTXG
5/ATTATTGTTXAXTATTT

输出为:

1/TXG    -  TTXX
2/TXG     -
3/       -  TTXX
4/TTXX  -   TXG
5/             -    

我尝试了以下正则表达式模式:

(TXG|TTXX) 

它有效,结果放入列表中,但我不知道如何检索原始序列中出现的每个结果的顺序。那是, TTXXTXG 是否分别出现在序列 4 中的第一个和第二个,以及序列 1 中的第二个和第一个;在第二个和第三个结果中,这更困难,因为 match-xx 函数调用不提供从相关序列中获取的子字符串的索引。感谢您的见解。

I'd like to get the substrings of long DNA sequences

For example, given:

1/ATXGAAATTXXGGAAGGGGTGG
2/AATXGAAGGAAGGAAGGGGATATTX
3/AAAAAATTXXGGAAGGGGXTTTA
4/AAAATTXXATAXXGGAAGGGGXTXG
5/ATTATTGTTXAXTATTT

the output is to be:

1/TXG    -  TTXX
2/TXG     -
3/       -  TTXX
4/TTXX  -   TXG
5/             -    

I tried the following regex pattern:

(TXG|TTXX) 

and it works, and the results are put in a list but I don't know how to retrieve the order of each result that has appeared in the original sequences. That is,
whether TTXX and TXG appear first and second respectively as in sequence 4 but second and first as in sequence 1; and in 2nd and 3rd results, that is harder because match-xx function call doesn't offer an index of the substring which it took from the sequence in question. Thank you for your insights.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

萌逼全场 2024-12-25 08:32:59

怎么样:

#!/usr/bin/perl 
use strict;
use warnings;
use Data::Dump qw(dump);

my %res;
while(my $line = <DATA>) {
    chomp$line;
    while($line =~ /TXG|TTXX/g) {
        push @{$res{$line}}, "found 
amp; at pos:".(pos($line)-length(
amp;));
    }
}
dump%res;

__DATA__
ATXGAAATTXXGGAAGGGGTGG
AATXGAAGGAAGGAAGGGGATATTX
AAAAAATTXXGGAAGGGGXTTTA
AAAATTXXATAXXGGAAGGGGXTXG
ATTATTGTTXXXTATTT

输出:

(
  "ATTATTGTTXXXTATTT",
  ["found TTXX at pos:7"],
  "AATXGAAGGAAGGAAGGGGATATTX",
  ["found TXG at pos:2"],
  "AAAAAATTXXGGAAGGGGXTTTA",
  ["found TTXX at pos:6"],
  "AAAATTXXATAXXGGAAGGGGXTXG",
  ["found TTXX at pos:4", "found TXG at pos:22"],
  "ATXGAAATTXXGGAAGGGGTGG",
  ["found TXG at pos:1", "found TTXX at pos:7"],
)

How about:

#!/usr/bin/perl 
use strict;
use warnings;
use Data::Dump qw(dump);

my %res;
while(my $line = <DATA>) {
    chomp$line;
    while($line =~ /TXG|TTXX/g) {
        push @{$res{$line}}, "found 
amp; at pos:".(pos($line)-length(
amp;));
    }
}
dump%res;

__DATA__
ATXGAAATTXXGGAAGGGGTGG
AATXGAAGGAAGGAAGGGGATATTX
AAAAAATTXXGGAAGGGGXTTTA
AAAATTXXATAXXGGAAGGGGXTXG
ATTATTGTTXXXTATTT

output:

(
  "ATTATTGTTXXXTATTT",
  ["found TTXX at pos:7"],
  "AATXGAAGGAAGGAAGGGGATATTX",
  ["found TXG at pos:2"],
  "AAAAAATTXXGGAAGGGGXTTTA",
  ["found TTXX at pos:6"],
  "AAAATTXXATAXXGGAAGGGGXTXG",
  ["found TTXX at pos:4", "found TXG at pos:22"],
  "ATXGAAATTXXGGAAGGGGTGG",
  ["found TXG at pos:1", "found TTXX at pos:7"],
)
乙白 2024-12-25 08:32:59

如果你放置 2 个匹配函数怎么办?

my $result="";

$result.="TXG" if(/TXG/);
$result.="TTXX" if (/TTXX/);

print $result;

What if you put 2 matching functions?

my $result="";

$result.="TXG" if(/TXG/);
$result.="TTXX" if (/TTXX/);

print $result;
拥抱我好吗 2024-12-25 08:32:59
perl -ne'($a)=/(TXG)/gc;($b)=/\G.*(TTXX)/;($a,$b)=($1,$a)if$a and not$b and/(TTXX)/;m{^(\d/)};printf"$1%5s -%5s\n",$a,$b'
perl -ne'($a)=/(TXG)/gc;($b)=/\G.*(TTXX)/;($a,$b)=($1,$a)if$a and not$b and/(TTXX)/;m{^(\d/)};printf"$1%5s -%5s\n",$a,$b'
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文