Obtaining blastn databases programmatically

Posted on 2024-09-19 11:14:27

In the Nucleotide BLAST search page, is there a way to programmatically obtain the databases listed in the "Choose Search Set" box? Maybe in XML format? (The programming language doesn't matter.)

Answers (4)

静赏你的温柔 2024-09-26 11:14:32

Your question isn't entirely clear to me, but here is a BioPerl program that queries any NCBI database (specified under 'db') with any search term (specified under 'term') and saves every sequence in that database matching the term to a FASTA file.

#!/usr/bin/perl
# Based on http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
use strict;
use warnings;

use lib 'path to BioPerl';   # only needed if BioPerl is not already on @INC
use Bio::DB::EUtilities;
my $factory = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                       -email      => 'your.name@example.org',  # placeholder: NCBI asks for a contact address
                                       -db         => 'nucleotide',
                                       -term       => 'search terms here',
                                       -usehistory => 'y');

my $count = $factory->get_count;
# get history from queue
my $hist  = $factory->next_History || die 'No history data returned';
print "History returned\n";
# note db carries over from above
$factory->set_parameters(-eutil   => 'efetch',
                         -rettype => 'fasta',
                         -history => $hist);

my $retry = 0;
my ($retmax, $retstart) = (500,0);

open(my $out, '>', 'search_results.fa') or die "Can't open file: $!";  # output file name is arbitrary

RETRIEVE_SEQS:
while ($retstart < $count) {
    $factory->set_parameters(-retmax   => $retmax,
                             -retstart => $retstart);
    eval{
        $factory->get_Response(-cb => sub {my ($data) = @_; print $out $data} );
    };
    if ($@) {
        die "Server error: $@. Try again later" if $retry == 5;
        print STDERR "Server error, redo #$retry\n";
        $retry++;
        redo RETRIEVE_SEQS;    # retry the same batch instead of skipping it
    }
    #say "Retrieved $retstart";
    $retstart += $retmax;
}

close $out;
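
For readers not using BioPerl, here is a minimal Python sketch of the same batching loop, calling the E-utilities efetch endpoint directly. It assumes you already have a WebEnv/query_key pair from an esearch call made with usehistory=y (the Python counterpart of -usehistory => 'y' above). The function names and the injectable fetch callable are hypothetical, chosen so the loop can be exercised without touching the network. Each batch is retried up to max_retries times before giving up, mirroring the Perl retry limit.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Public E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, webenv, query_key, retstart, retmax, rettype="fasta"):
    """Build an efetch URL for one slice of a result set stored on the History server."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": rettype,
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def http_get(url):
    """Default fetcher: plain HTTP GET returning the body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def fetch_all(db, webenv, query_key, count, retmax=500,
              max_retries=5, delay=1.0, fetch=http_get):
    """Download `count` records in batches of `retmax`, retrying failed batches."""
    chunks = []
    retstart = 0
    while retstart < count:
        url = efetch_url(db, webenv, query_key, retstart, retmax)
        for attempt in range(max_retries + 1):
            try:
                chunks.append(fetch(url))
                break  # batch succeeded, move on to the next one
            except OSError as err:  # urllib's URLError/HTTPError subclass OSError
                if attempt == max_retries:
                    raise RuntimeError(
                        f"batch starting at {retstart} failed {attempt + 1} times"
                    ) from err
                time.sleep(delay)  # back off briefly before retrying
        retstart += retmax
    return chunks
```

Writing the joined chunks to a file gives the same result as the Perl callback that prints each response to $out.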
未央 2024-09-26 11:14:31

I don't think you can get this information through the NCBI Web services.

Using XSLT:

<?xml version='1.0'  encoding="ISO-8859-1" ?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >

<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="//select[@id='DATABASE']"/>
</xsl:template>


<xsl:template match="select[@id='DATABASE']">
<xsl:for-each select=".//option">
<xsl:value-of select="@value"/>
<xsl:text>  </xsl:text>
<xsl:value-of select="."/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

and then run it with xsltproc:

xsltproc --html stylesheet.xsl "http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome" 2> /dev/null

which returns:

dbindex/9606/ref_contig dbindex/9606/alt_contig_HuRef dbindex/9606/rna  Human genomic plus transcript (Human G+T)
dbindex/10090/alt_contig dbindex/10090/ref_contig dbindex/10090/rna     Mouse genomic plus transcript (Mouse G+T)
nr      Nucleotide collection (nr/nt)
refseq_rna      Reference mRNA sequences (refseq_rna)
refseq_genomic  Reference genomic sequences (refseq_genomic)
chromosome      NCBI Genomes (chromosome)
est     Expressed sequence tags (est)
est_others      Non-human, non-mouse ESTs (est_others)
gss     Genomic survey sequences (gss)
htgs    High throughput genomic sequences (HTGS)
pat     Patent sequences(pat)
pdb     Protein Data Bank (pdb)
alu     Human ALU repeat elements (alu_repeats)
dbsts   Sequence tagged sites (dbsts)
wgs     Whole-genome shotgun reads (wgs)
env_nt  Environmental samples (env_nt)
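
If xsltproc isn't available, the same extraction can be sketched with Python's standard-library html.parser. Like the stylesheet, this assumes the page contains a <select id="DATABASE"> element whose <option> children carry the database values; treat that element id as an assumption to verify against the live page. The class and function names are hypothetical, and fetching the page (for example with urllib.request) is left out so the parser can be checked against a fixed snippet.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect (value, label) pairs for the <option>s inside <select id=select_id>."""

    def __init__(self, select_id="DATABASE"):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.current_value = None  # value of the <option> being read, if any
        self.current_text = []
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self._flush()  # HTML allows </option> to be omitted
            self.current_value = attrs.get("value") or ""
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select" and self.in_select:
            self._flush()
            self.in_select = False

    def handle_data(self, data):
        if self.current_value is not None:
            self.current_text.append(data)

    def _flush(self):
        # Record the pending option, collapsing runs of whitespace in its label.
        if self.current_value is not None:
            label = " ".join("".join(self.current_text).split())
            self.options.append((self.current_value, label))
            self.current_value = None


def list_options(html, select_id="DATABASE"):
    parser = SelectOptionParser(select_id)
    parser.feed(html)
    parser.close()
    return parser.options
```

Printing each pair as value, two spaces, label gives output in the same shape as the xsltproc listing above.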
霞映澄塘 2024-09-26 11:14:31

I'm not entirely sure what you intend to use this for, but the complete set of databases used by NCBI is on their FTP site: ftp://ftp.ncbi.nih.gov/blast/db/
If you're only interested in the database names, just look at the part of each file name before the first "."; most of the databases are large enough to be split into segments, so the same name appears on several files.
For much of the filtering (e.g. by organism), they use alias files that restrict one or more of these larger databases by GI number.
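
A minimal Python sketch of that "part before the first dot" rule, using the standard ftplib module against the host and directory named above. It assumes the databases are published as .tar.gz archives (so companion files such as checksums or a README are skipped); that suffix filter and the function names are assumptions for illustration, not something NCBI documents in this thread.

```python
from ftplib import FTP


def database_names(filenames, suffix=".tar.gz"):
    """Unique database names: the part of each archive's file name before the first '.'."""
    names = set()
    for entry in filenames:
        base = entry.rsplit("/", 1)[-1]  # some servers return full paths from NLST
        if base.endswith(suffix):
            names.add(base.split(".", 1)[0])  # 'nt.00.tar.gz' -> 'nt'
    return sorted(names)


def list_remote_databases(host="ftp.ncbi.nih.gov", path="/blast/db/"):
    """Log in anonymously, change to `path`, and list the database names found there."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        ftp.cwd(path)
        return database_names(ftp.nlst())
```

Calling print(list_remote_databases()) would print the names; segmented databases such as nt.00, nt.01, ... collapse to a single entry.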

陌伤ぢ 2024-09-26 11:14:31

Some FTP API is required to fetch these databases programmatically. However, these files are rather large, even when compressed, so you should at least check whether the version available on the download site differs from the one you have already downloaded and cached. Java FTP libraries are reviewed at http://www.javaworld.com/javaworld/jw-04-2003/jw-0404-ftp.html.
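
The answer points to Java FTP libraries; as a language-neutral illustration, here is a Python sketch of the "only download if it changed" check using the standard ftplib and hashlib modules. It assumes each archive on the server has a companion <archive>.md5 file in md5sum format ("<hex digest>  <file name>"); NCBI's blast/db directory appears to publish these, but verify before relying on it. The function names are hypothetical.

```python
import hashlib
from ftplib import FTP
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a local file, read in 1 MiB chunks since archives are large."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remote_md5_text(host, path, archive):
    """Fetch the text of '<archive>.md5' from the FTP server (anonymous login)."""
    lines = []
    with FTP(host) as ftp:
        ftp.login()
        ftp.cwd(path)
        ftp.retrlines(f"RETR {archive}.md5", lines.append)
    return "\n".join(lines)


def needs_download(local_path, md5_text):
    """True if there is no cached copy or its MD5 differs from the published one."""
    local = Path(local_path)
    if not local.exists():
        return True
    expected = md5_text.split()[0].lower()  # md5sum format: '<hex>  <file name>'
    return md5_of_file(local) != expected
```

Only when needs_download(...) returns True would you fetch the archive itself, for example with FTP.retrbinary, which avoids re-downloading multi-gigabyte files that haven't changed.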
