String matching search
One text file like this serves as the query file:
fooLONGcite
GetmoreDATA
stringMATCH
GOODthing
Another text file like this serves as the subject file:
sometingfooLONGcite
anyotherfooLONGcite
matchGetmoreDATA
GETGOODthing
brotherGETDATA
CITEMORETHING
TOOLONGSTUFFETC
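The expected result is to find the lines in the subject file that contain one of the query strings and print them. So the output should be: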
sometingfooLONGcite
anyotherfooLONGcite
matchGetmoreDATA
GETGOODthing
Here is my Perl script, but it doesn't work. Can you help me find where the problem is? Thanks.
#!/usr/bin/perl
use strict;
# to check the command line option
if($#ARGV<0){
    printf("Usage: \n <tag> <seq> <outfile>\n");
    exit 1;
}
# to open the given infile file
open(tag, $ARGV[0]) or die "Cannot open the file $ARGV[0]";
open(seq, $ARGV[1]) or die "Cannot open the file $ARGV[1]";
my %seqhash = ();
my $tag_id;
my $tag_seq;
my $seq_id;
my $seq_seq;
my $seq;
my $i = 0;
print "Processing cds seq\n";
#check the seq file
while(<seq>){
    my @line = split;
    if($i != 0){
        $seqhash{$seq_seq} = $seq;
        $seq = "";
        print "$seq_seq\n";
    }
    $seq_seq = $line[0];
    $i++;
}
while(<tag>){
    my @tagline = split;
    $tag_seq = $tagline[0];
    $seq = $seqhash{$seq_seq};
    #print "$tag_seq\n";
    print "$seq\n";
    #print output ">$id\n$seq\n";
}
#print "Ending of Processing gff\n";
close(tag);
close(seq);
Comments (2)
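As I understand it, you are looking for a match on part of the string, not an exact match. Here is a script that does what I think you are looking for.
Content of script.pl (I assume the query file is small, because I add all of its content to a single regex):
Run the script:
Result: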
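The script, command, and output from this answer are not shown here. What follows is only a minimal sketch of the approach the answer describes (joining every query string into one regex), not the answerer's actual script.pl. The sub names `build_query_regex` and `grep_subjects` are hypothetical, and the sample data from the question is inlined in place of the two input files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one alternation regex from all query strings (hypothetical helper).
# quotemeta escapes regex metacharacters so each query matches literally.
sub build_query_regex {
    my @queries = @_;
    return qr/(?!)/ unless @queries;    # no queries: a pattern that never matches
    my $alternation = join '|', map { quotemeta($_) } @queries;
    return qr/$alternation/;
}

# Return the subject lines that contain at least one query as a substring.
sub grep_subjects {
    my ($regex, @subjects) = @_;
    return grep { $_ =~ $regex } @subjects;
}

# Sample data copied from the question; a real script would read two files.
my @queries  = qw(fooLONGcite GetmoreDATA stringMATCH GOODthing);
my @subjects = qw(
    sometingfooLONGcite anyotherfooLONGcite matchGetmoreDATA GETGOODthing
    brotherGETDATA CITEMORETHING TOOLONGSTUFFETC
);

my $regex = build_query_regex(@queries);
print "$_\n" for grep_subjects($regex, @subjects);
```

With the question's sample data this prints the four expected lines. The match is case-sensitive, which is why brotherGETDATA is not reported for the query GetmoreDATA. A single alternation suits a small query file, as the answer notes; a very large query list would produce a very large pattern.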
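Your current code doesn't make much sense; you even reference variables that never have anything assigned to them.
All you need to do is read the first file into a hash, then check each line of the second file against that hash.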
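The answer does not say how each line is checked against the hash. A plain key lookup would only find exact matches, while the expected output in the question needs substring matches. The sketch below therefore assumes the check means testing every hash key with `index`. The sub names `read_queries` and `contains_query` are hypothetical, and in-memory filehandles stand in for the two files so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every non-empty line from a filehandle into a hash (keys = query strings).
sub read_queries {
    my ($fh) = @_;
    my %queries;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        $queries{$line} = 1;
    }
    return %queries;
}

# Return 1 if any query key occurs as a substring of $line, otherwise 0.
sub contains_query {
    my ($line, $queries) = @_;
    for my $query (keys %$queries) {
        return 1 if index($line, $query) >= 0;
    }
    return 0;
}

# Assumption: these strings stand in for the query and subject files;
# a real script would open the paths given in $ARGV[0] and $ARGV[1].
my $query_text   = "fooLONGcite\nGetmoreDATA\nstringMATCH\nGOODthing\n";
my $subject_text = "sometingfooLONGcite\nanyotherfooLONGcite\nmatchGetmoreDATA\n"
                 . "GETGOODthing\nbrotherGETDATA\nCITEMORETHING\nTOOLONGSTUFFETC\n";

open my $query_fh,   '<', \$query_text   or die "Cannot open query data: $!";
open my $subject_fh, '<', \$subject_text or die "Cannot open subject data: $!";

my %queries = read_queries($query_fh);
while (my $line = <$subject_fh>) {
    chomp $line;
    print "$line\n" if contains_query($line, \%queries);
}

close $query_fh;
close $subject_fh;
```

Output follows the order of the subject file, so the sample data yields the four lines listed in the question. Lexical filehandles and three-argument `open` replace the bareword handles `tag` and `seq` from the original script.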