Return a list of all amino acid sequences encoded by an RNA sequence

Posted on 2025-01-17 15:23:59

I have a specific DNA sequence, and I need to return a list of all amino acid sequences encoded by that DNA sequence.

I also have a dictionary of all codons and their amino acids (single-letter strings). The * represents stop codons.

table = {
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 'ACA':'T', 'ACC':'T',
        'ACG':'T', 'ACT':'T', 'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R', 'CTA':'L', 'CTC':'L', 
        'CTG':'L', 'CTT':'L', 'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P', 
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 'CGA':'R', 'CGC':'R', 
        'CGG':'R', 'CGT':'R', 'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A', 'GAC':'D', 'GAT':'D', 
        'GAA':'E', 'GAG':'E', 'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G', 
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 'TTC':'F', 'TTT':'F', 
        'TTA':'L', 'TTG':'L', 'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*', 
        'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W',
        }
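Note that the keys of this table are DNA codons (written with T), while the code further down compares against the RNA codon "AUG". As a minimal sketch of a lookup that accepts either alphabet: the table is rebuilt here in compact form only so the snippet runs on its own (it is intended to match the dictionary above), and `lookup_codon` is a hypothetical helper name, not something from the question.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated in T, C, A, G order.
# Rebuilt here only so the snippet is self-contained.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def lookup_codon(codon, table):
    """Hypothetical helper: map a DNA or RNA codon to its one-letter amino acid."""
    # Normalise case and convert RNA's U to DNA's T so the key matches the table.
    return table[codon.upper().replace("U", "T")]


print(lookup_codon("AUG", table))
print(lookup_codon("taa", table))
```

Converting U to T once, before any lookup, avoids having to maintain a second RNA version of the dictionary.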

So I need to return a list of strings, where each string represents a sequence of amino acids encoded by the DNA sequence.

Here is my code:

rna_sequence = rna_sequence.upper()
rna_seq_new = rna_sequence.replace("\n", "")
rna_seq_new = rna_seq_new.strip()

AA_list = []

nb = (len(rna_seq_new)) - 3


for i in range (0, len(rna_seq_new), 3):
    codon = rna_seq_new[i:i+3]
    if len(rna_seq_new):
         return []
    if codon == "AUG":
         new = translate_sequence(rna_seq_new = rna_seq_new[i:], genetic_co>
         AA_list.append(new)

return(AA_list)

At the moment I get an empty list and I do not know why. The test fails with: AssertionError: Lists differ: [] != ['MTAVRYV', 'MTYV']
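Two things in the posted code seem likely to produce the empty list. First, `if len(rna_seq_new): return []` is true for any non-empty sequence, so the function returns `[]` on the first loop iteration. Second, the loop compares against "AUG" while the table is keyed by DNA codons. The sketch below shows one way the scan could look under some assumptions: every position is checked for a start codon (not only every third one), translation stops at the first stop codon or at the end of the sequence, and `translate_from` and `all_amino_acid_seqs` are hypothetical names standing in for the question's own functions, whose full signatures are not visible in the post.

```python
from itertools import product

# Standard genetic code in compact form (codons enumerated in T, C, A, G order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from(seq, start, table):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = table[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon ends this peptide
            break
        protein.append(amino_acid)
    return "".join(protein)


def all_amino_acid_seqs(rna_sequence, table):
    """Return one peptide per start codon found anywhere in the sequence."""
    # Drop all whitespace, upper-case, and convert U to T to match the table keys.
    seq = "".join(rna_sequence.split()).upper().replace("U", "T")
    results = []
    # Assumption: start codons may sit in any of the three reading frames.
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "ATG":
            results.append(translate_from(seq, i, table))
    return results


print(all_amino_acid_seqs("AUGAAAUGGCCUAA", table))  # ['MKWP', 'MA']
```

Whether peptides that run off the end of the sequence without a stop codon should be kept depends on the assignment. This sketch keeps them, and input containing characters other than A, C, G, T or U would raise a `KeyError`.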
