Customizing the output of BLAST?
I know this is a very specific question relating to BLAST and Bioinformatics but here goes:
I am attempting to use standalone BLAST (I already have downloaded it and tested it running on the command line) to perform a DNA sequence alignment (blastn). I need to be able to provide both my own query file (fasta format) and my own database file (also fasta format).
The key is that I want to have the program only output 2 fields rather than the detailed reports that it usually outputs. I only want the highest score and the e-value for the alignment to be output. The idea is that once I have this working, I can wrap this in my own control program and automatically run it many times with different query sequences and log the scores and e-values.
I know this is a long shot, but does anybody have an idea on how I can go about doing this? The two hurdles for me are using my own database file and customizing the output.
Comments (2)
In fact it's simple: blastall has several command-line options that will help you, including -m 8, which prints the results as a tab-separated table with one line per hit.
The table output has several columns, however. I don't recall the order of the columns, but you can use the cut tool to select only your columns of interest. For example, cut -f1,7,8 blastoutput would select only columns 1, 7 and 8 from a file named blastoutput.
yannick
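The two steps in the answer above can be sketched roughly as follows. This assumes the legacy blastall binary is on your PATH and that the database has already been formatted with formatdb. In the -m 8 table, each line has twelve tab-separated columns, and the last two are the e-value (column 11) and the bit score (column 12). The function names run_blastn, select_columns and best_hit are made up for this illustration, and the file names in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch only: assumes the legacy NCBI "blastall" binary is on PATH and that
# the database was already formatted with formatdb.
# All function names here are hypothetical helpers, not part of BLAST.

# Run blastn and print tabular (-m 8) output to stdout.
run_blastn() {
    query=$1      # query sequences in FASTA format
    database=$2   # database name, as it was passed to formatdb -i
    blastall -p blastn -i "$query" -d "$database" -m 8
}

# Keep only the query id (column 1), e-value (column 11) and bit score
# (column 12). Tab is already the default delimiter for cut.
select_columns() {
    cut -f1,11,12 "$1"
}

# Print "bit_score<TAB>e-value" for the highest-scoring line in a table.
best_hit() {
    awk -F '\t' '
        NR == 1 || $12 + 0 > best + 0 { best = $12; evalue = $11 }
        END { if (NR > 0) print best "\t" evalue }
    ' "$1"
}
```

A possible way to use it: run_blastn my_query.fa my_database.fa > blastoutput, then best_hit blastoutput. If there are no hits at all, best_hit prints nothing.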
Yannick's answer covers how to get the specific output you need from blastall. The second thing you're concerned about is using your own database file, and standalone BLAST provides the tools you need for this too.
Along with blastall, you should also have a copy of a program called formatdb. You give it your fasta sequence database and it formats it correctly for BLAST. For a nucleotide database, run the following:
formatdb -i input_database.fa -p F
This will produce a number of files in your working directory (input_database.fa.nhr, input_database.fa.nin, input_database.fa.nsq), which you can use in your blastall command by passing the original name of your database (i.e., leave off the .n* suffix).
HTH
PS: formatdb -h will give you a full list of options for formatdb.
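Putting the database step together with the batch run described in the question, a minimal sketch might look like this. It assumes the legacy formatdb and blastall binaries are on your PATH. The function names make_nucleotide_db and log_best_hits are hypothetical, and the file names in the usage note are placeholders. It relies on the -m 8 tabular layout, where column 11 is the e-value and column 12 is the bit score.

```shell
#!/bin/sh
# Sketch only: assumes legacy formatdb and blastall are on PATH.
# Function names are hypothetical helpers, not part of BLAST.

# Format a nucleotide FASTA file as a BLAST database (-p F means "not protein").
make_nucleotide_db() {
    formatdb -i "$1" -p F
}

# Usage: log_best_hits DATABASE QUERY_FILE...
# For each query file, run blastn against DATABASE and print one log line:
#   query_file<TAB>bit_score<TAB>e-value   (highest-scoring hit)
# or query_file<TAB>NA<TAB>NA when there are no hits.
log_best_hits() {
    database=$1
    shift
    for query in "$@"; do
        blastall -p blastn -i "$query" -d "$database" -m 8 |
            awk -F '\t' -v name="$query" '
                NR == 1 || $12 + 0 > best + 0 { best = $12; evalue = $11 }
                END {
                    if (NR > 0) { print name "\t" best "\t" evalue }
                    else { print name "\tNA\tNA" }
                }
            '
    done
}
```

For example: make_nucleotide_db input_database.fa, followed by log_best_hits input_database.fa queries/*.fa > best_hits.tsv. The database is passed by its original file name, without the .n* suffixes.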