A more stable/efficient Cartesian product function to find all combinations of length n

Posted on 2025-01-20 08:06:40

I am working on an error correction method for DNA data storage. It works by identifying errors and then rapidly substituting every possible combination of bases into the error positions until the data decodes. My code correctly identifies the errors and substitutes candidates with an itertools.product loop over a single iterable repeated n times (repeat=n), where n is the number of errors.

import itertools
from datetime import datetime

# errorPos, fseqL, pLeft and Decoder are defined elsewhere in the program.
for error in itertools.product(' ATGC', repeat=len(errorPos)):
    # Put one candidate base at each error position; ' ' means "delete this base".
    for pos, base in zip(errorPos, error):
        fseqL[pos] = "" if base == " " else base
    ffseq = "".join(fseqL)
    try:
        Decoder.dna2binary(ffseq)
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        pLeft = pLeft - 1
        print("Incorrect decode: %s possibilities left" % pLeft)
    else:
        print("Data decoded")
        # Append the finish time to the log file.
        with open('data\\log.txt', 'a') as log:
            log.write("\nend: " + str(datetime.now()))
        break

This program works well with a small number of errors, but its processing time grows exponentially as the number of errors increases. The speed isn't a problem, but after about 10-15 errors it becomes unstable and locks up. Is there a better way to find this combination (a NumPy function, a different method, etc.)?
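To put numbers on the growth described above: with the 5-symbol alphabet ' ATGC', n errors give 5**n candidates, which is about 3 * 10**10 for n = 15. The following is only a minimal sketch of the same brute-force loop, not a faster algorithm. It assumes that printing on every failed decode may contribute to the lock-up, so it reports progress only occasionally. The names search_space_size, brute_force_repair, report_every and toy_decode are hypothetical and stand in for the real program's pieces; toy_decode takes the place of Decoder.dna2binary.

```python
import itertools
from typing import Callable, Optional, Sequence

ALPHABET = " ATGC"  # ' ' stands for "delete this base", as in the question


def search_space_size(n_errors: int, alphabet: str = ALPHABET) -> int:
    """Number of candidates itertools.product yields for n_errors positions."""
    return len(alphabet) ** n_errors


def brute_force_repair(
    seq: Sequence[str],
    error_pos: Sequence[int],
    decode: Callable[[str], object],
    report_every: int = 1_000_000,
) -> Optional[str]:
    """Try every substitution at error_pos; return the first sequence that decodes.

    `decode` is assumed to raise an exception on failure (hypothetical stand-in
    for Decoder.dna2binary). Returns None if no candidate decodes.
    """
    work = list(seq)
    total = search_space_size(len(error_pos))
    # product() is lazy, so candidates are generated one at a time.
    candidates = itertools.product(ALPHABET, repeat=len(error_pos))
    for tried, candidate in enumerate(candidates, 1):
        for pos, base in zip(error_pos, candidate):
            work[pos] = "" if base == " " else base
        fixed = "".join(work)
        try:
            decode(fixed)
        except Exception:
            # Report occasionally instead of printing on every failure.
            if tried % report_every == 0:
                print(f"{total - tried} possibilities left")
            continue
        return fixed
    return None


def toy_decode(s: str) -> str:
    """Hypothetical decoder: accepts exactly one sequence."""
    if s != "ATGCAT":
        raise ValueError("bad sequence")
    return s


print(search_space_size(15))                             # 30517578125
print(brute_force_repair("ANGCNT", [1, 4], toy_decode))  # ATGCAT
```

Throttling the output does not change the number of decode attempts, which still grows as 5**n, so this only addresses the stability of the loop and not its running time.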
