BioPython: Extracting sequence IDs from a BLAST output file
I have a BLAST output file in XML format. It contains 22 query sequences, with 50 hits reported for each. I want to extract all 50 x 22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query.

    from Bio.Blast import NCBIXML

    # result_handle is the already-opened BLAST XML output file
    blast_records = NCBIXML.parse(result_handle)
    blast_record = next(blast_records)  # only the first query's record
    save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.title,))
    save_file.close()

Does anybody have any suggestions on how to extract all the hits? I guess I have to use something other than alignments.
I hope this was clear. Thanks!
Jon
Comments (2)
This should get all records. The novelty compared with the original is the "for blast_record in blast_records" loop, which is a Python idiom for iterating through the items of a "list-like" object such as blast_records (checking the NCBIXML module documentation showed that parse() does indeed return an iterator).
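As a minimal sketch of that idea (not necessarily the answerer's exact code), the loop below visits every record that parse() yields instead of only the first one. The helper names write_hit_titles and extract_hit_titles, and the file paths passed to them, are hypothetical. Note that it writes each hit title once per alignment rather than once per HSP, which is an assumption about what the asker wants.

```python
def write_hit_titles(blast_records, out_handle):
    """Write one FASTA-style header line per hit, for every query record.

    blast_records is any iterable of records exposing .alignments,
    such as the iterator returned by NCBIXML.parse().
    Returns the number of titles written.
    """
    count = 0
    for blast_record in blast_records:             # one record per query
        for alignment in blast_record.alignments:  # one alignment per hit
            out_handle.write(">%s\n" % alignment.title)
            count += 1
    return count


def extract_hit_titles(xml_path, out_path):
    """Parse a BLAST XML file and save the titles of all hits (hypothetical helper)."""
    # Imported here so write_hit_titles can be used without Biopython installed.
    from Bio.Blast import NCBIXML

    with open(xml_path) as result_handle, open(out_path, "w") as save_file:
        return write_hit_titles(NCBIXML.parse(result_handle), save_file)
```

With 22 queries and 50 hits each, a call such as extract_hit_titles("my_blast.xml", "my_fasta_seq.fasta") would be expected to write 1100 header lines, assuming the XML file really contains all 22 records.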
I used this code to extract all the results
or, for less detail,
I used this site:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/