Converting a GenBank flat file to FASTA

Posted on 2024-11-15 09:41:10

I need to parse a preliminary GenBank flat file. The sequence hasn't been published yet, so I can't look it up by accession and download a FASTA file. I'm new to bioinformatics, so could someone show me where I could find a BioPerl or Biopython script to do this myself? Thanks!

Comments (2)

☆獨立☆ 2024-11-22 09:41:10

You need the Bio::SeqIO module to read or write out bioinformatics data. The SeqIO HOWTO should tell you everything you need to know, but here's a small read-a-GenBank-file script in Perl to get you started!

梦里°也失望 2024-11-22 09:41:10

Here is a Biopython solution. I will first assume your GenBank file describes a genome sequence, then give a different solution assuming it describes a gene sequence instead. It would have helped to know which of the two you are dealing with.

Genome Sequence Parsing:

Read your GenBank flat file from disk with:

from Bio import SeqIO
record = SeqIO.read("yourGenbankFileDirectory/yourGenbankFile.gb","genbank")

If you just want the raw sequence then:

rawSequence = str(record.seq)  # Seq.tostring() no longer exists in current Biopython releases

You will probably also need a name for this sequence, to use as the ">header" line of the .fasta file. Let's see which names came with the GenBank .gb file:

nameSequence = record.features[0].qualifiers

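This should return a dictionary of the qualifiers on the first feature, which is normally the "source" feature covering the whole sequence. It holds the various names the author of the GenBank file used to annotate that sequence, each stored as a list of strings.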
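The steps above stop short of producing FASTA text, so here is a minimal sketch of that last step. It assumes the qualifiers dictionary maps names to lists of strings, as Biopython returns them. The function name fasta_record, the fallback name, and the choice of "organism", "strain" and "isolate" for the header are illustrative assumptions, not part of the original answer.

```python
def fasta_record(qualifiers, sequence, fallback_name="unnamed_sequence", width=60):
    """Build FASTA text from a qualifiers dict and a plain sequence string.

    qualifiers maps qualifier names to lists of strings, for example
    {"organism": ["Escherichia coli"], "strain": ["K-12"]}.
    """
    # Assumed preference order for the header; adjust it to your own file.
    parts = []
    for key in ("organism", "strain", "isolate"):
        values = qualifiers.get(key)
        if values:
            parts.append(values[0])
    header = " ".join(parts) if parts else fallback_name

    # Wrap the sequence at a fixed width, as FASTA files conventionally do.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Stand-in data; with a real file, pass record.features[0].qualifiers
# and str(record.seq) instead.
example = fasta_record(
    {"organism": ["Escherichia coli"], "strain": ["K-12"]},
    "ACGT" * 20,
)
print(example)
```

Writing the returned text to a file ending in .fasta completes the conversion. If the default header is good enough, Biopython can also do the whole job in one call, SeqIO.write(record, "yourGenbankFile.fasta", "fasta"), which builds the header from record.id and record.description.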

Gene Sequence Parsing:

Read your GenBank flat file from disk in the same way:

from Bio import SeqIO
record = SeqIO.read("yourGenbankFileDirectory/yourGenbankFile.gb","genbank")

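To get a list of raw sequences, one per annotated feature (record.features also holds non-gene entries such as the "source" feature, so filter on the feature's type attribute if you only want genes):

rawSequenceList = [str(gene.extract(record.seq)) for gene in record.features]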

To get the matching list of names for each feature (more precisely, the qualifiers dictionary of each feature, which holds its synonyms):

nameSequenceList = [gene.qualifiers for gene in record.features]
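
The two lists can then be combined into a multi-record FASTA file. The following is a rough sketch under a few assumptions: each qualifiers dictionary maps names to lists of strings, the header is taken from the first of "gene", "locus_tag" or "product" that is present, and features without any of these get a numbered placeholder. The helper names pick_name and multi_fasta are hypothetical.

```python
def pick_name(qualifiers, fallback):
    """Return the first available name from a feature's qualifiers."""
    # Assumed preference order; change it to match your annotation.
    for key in ("gene", "locus_tag", "product"):
        values = qualifiers.get(key)
        if values:
            return values[0]
    return fallback


def multi_fasta(names, sequences, width=60):
    """Build multi-record FASTA text from parallel lists of qualifiers and sequences."""
    records = []
    for index, (qualifiers, sequence) in enumerate(zip(names, sequences), start=1):
        header = pick_name(qualifiers, "feature_%d" % index)
        sequence = str(sequence)  # accepts plain strings or Seq objects
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(">" + header + "\n" + "\n".join(lines) + "\n")
    return "".join(records)


# Stand-in data; with a real file, pass nameSequenceList and rawSequenceList.
demo_names = [{"gene": ["lacZ"]}, {"locus_tag": ["b0001"]}, {}]
demo_sequences = ["ATGACC", "ATGAAA", "ATGTTT"]
print(multi_fasta(demo_names, demo_sequences))
```

Calling multi_fasta(nameSequenceList, rawSequenceList) and writing the result to a .fasta file gives one record per feature. Because the two lists are built from the same record.features in the same order, zip keeps each name paired with its own sequence.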