Defining a function to calculate the relative frequency of amino acids
I'm trying to calculate codon frequency within a given DNA sequence.
For example:
sequence = 'ATGAAGAAA'
codons = ['ATG', 'AAG', 'AAA']
for XX in codons:
    frequency = codons.count(XX) / (codons.count(XX) + codons.count(XX2) + codons.count(XX3))
Here XX2 and XX3 stand for the other codons that encode the same amino acid as XX. Note that they will not always be in the sequence, and that some amino acids are encoded by a single codon while others have several.
Example: lysine has 2 codons, AAA and AAG,
so the frequency of AAA is
AAA = codons.count('AAA') / (codons.count('AAA') + codons.count('AAG'))
How can I do this for EVERY codon in the list? How do I account for amino acids with multiple codons?
Comments (5)
Use defaultdict.
This works for amino acids (proteins).
For codons, you should iterate over the sequence in steps of 3 positions to get the keys of the defaultdict. For example:
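A minimal sketch of that stepping approach, assuming the sequence is already in the correct reading frame. The function name `count_codons` is illustrative, and trailing bases that do not form a complete codon are simply ignored here.

```python
from collections import defaultdict

def count_codons(sequence):
    """Count each codon, reading the sequence in non-overlapping steps of 3."""
    counts = defaultdict(int)
    # Stop before any trailing 1 or 2 bases that do not form a full codon.
    for i in range(0, len(sequence) - len(sequence) % 3, 3):
        counts[sequence[i:i + 3]] += 1
    return counts

counts = count_codons('ATGAAGAAA')
print(dict(counts))
```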
EDIT: If you want to account for degeneracy, you should prepare a dictionary relating each codon (key) to its synonymous codons (value, a list of codons). To calculate the frequency, get the count for each codon from the defaultdict; then, for each codon, sum the counts of the synonymous codons read from the dictionary described above. Then you can calculate the frequency.
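The dictionary-based calculation described above might look like the following sketch. It assumes each dictionary value lists every codon for the same amino acid, the key codon included; the `SYNONYMS` table is a deliberately partial, hypothetical example, and `relative_frequencies` is an illustrative name.

```python
from collections import defaultdict

# Partial, illustrative table: codon -> all codons for the same amino acid
# (the codon itself included). A real table would cover all 64 codons.
SYNONYMS = {
    'ATG': ['ATG'],         # methionine
    'AAA': ['AAA', 'AAG'],  # lysine
    'AAG': ['AAA', 'AAG'],  # lysine
}

def relative_frequencies(sequence, synonyms=SYNONYMS):
    # Count codons in steps of 3, ignoring an incomplete trailing codon.
    counts = defaultdict(int)
    for i in range(0, len(sequence) - len(sequence) % 3, 3):
        counts[sequence[i:i + 3]] += 1
    freqs = {}
    for codon in list(counts):
        # A codon missing from the table is treated as its own group.
        group = synonyms.get(codon, [codon])
        # Synonymous codons absent from the sequence contribute 0.
        total = sum(counts.get(c, 0) for c in group)
        freqs[codon] = counts[codon] / float(total)
    return freqs

print(relative_frequencies('ATGAAGAAA'))
```

For 'ATGAAGAAA' this gives 1.0 for ATG and 0.5 each for AAG and AAA, matching the lysine example in the question.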
EDIT 2: Here you have a real example:
If your sequence is in the correct reading frame:
other info:
I think you are asking to find something called codon usage.
There are tools online which allow you to find codon usage. This one also allows for offline use.
http://www.bioinformatics.org/sms2/codon_usage.html
and the results (here, 'Fraction' is what you are asking for):
cusp is the codon usage tool from EMBOSS, which may also be worth a look.
You may want to check out BioPython for working with biological sequences. I believe they have a codon usage module.
PLY is a parser module that has some nice debugging features; it is very good at tasks like this...
Running that code produces...
I'm a little lost when you start introducing the chemical terminology, but you can probably take over from here...
a codon table containing ALL 64 codons, even the non-degenerate ones (they form one-element groups)
counting the occurrences of each codon's group in the same iteration that counts the occurrences of the codons themselves
a codon table that includes the names of the encoded amino acids -> a good display
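These three ideas can be sketched as follows. This is not the answerer's code: it assumes the standard genetic code, uses one-letter amino acid codes instead of full names to stay short, and the names `CODON_TABLE` and `codon_usage` are illustrative.

```python
from collections import Counter
from itertools import product

BASES = 'TCAG'
# Standard genetic code, one letter per codon in TCAG order ('*' = stop).
AMINO_ACIDS = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
# All 64 codons -> amino acid; one-codon groups (M, W) are included too.
CODON_TABLE = {''.join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codon_usage(sequence):
    """Return {codon: (amino_acid, count, fraction within its group)}."""
    codon_counts, group_counts = Counter(), Counter()
    for i in range(0, len(sequence) - len(sequence) % 3, 3):
        codon = sequence[i:i + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons containing ambiguous bases such as N
        codon_counts[codon] += 1
        group_counts[CODON_TABLE[codon]] += 1  # group total, same pass
    return {codon: (CODON_TABLE[codon], n,
                    n / float(group_counts[CODON_TABLE[codon]]))
            for codon, n in codon_counts.items()}

for codon, (aa, n, frac) in sorted(codon_usage('ATGAAGAAA').items()):
    print('%s  %s  %d  %.2f' % (codon, aa, n, frac))
```

Keying the group counter by amino acid means the denominator for each codon is available as soon as the single pass over the sequence finishes.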
code:
EDIT
I've added a function outputResults() to show how the data and results can be recorded in a file
The resulting file's content is:
I'm not sure if I've fully understood the question, but I think you need to split the calculations into two stages: first count how many times each codon occurs, and then work out the frequencies. I've come up with the following code:
Note the explicit conversion of total to a floating-point number in the last loop. If it is left as an integer, the subsequent division will be either 0 or 1 on Python 2.x, so we need to convert it to get a floating-point output. The output I get is:
Is this the sort of output you were looking for?
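The two-stage idea and the Python 2 division pitfall can be sketched as below. This assumes each codon's frequency is taken relative to all codons in the list, which is a guess at what `total` held in the original code; `codon_fractions` is an illustrative name.

```python
from collections import Counter

def codon_fractions(codons):
    # Stage 1: count how many times each codon occurs.
    counts = Counter(codons)
    # Stage 2: divide each count by the total. float() keeps the division
    # from truncating to an integer on Python 2.x; on Python 3 "/" is
    # already true division, so the conversion is harmless there.
    total = float(sum(counts.values()))
    return {codon: n / total for codon, n in counts.items()}

fractions = codon_fractions(['ATG', 'AAG', 'AAA', 'AAA'])
print(fractions)
```

On Python 2, adding `from __future__ import division` at the top of the module is another way to get true division without the explicit conversion.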