A more stable/efficient Cartesian product function for finding all combinations of length n
I am working on an error correction method for DNA data storage. It works by identifying errors and then rapidly substituting every possible combination of bases into the error positions until the data decodes. My code correctly identifies the errors and substitutes them using an itertools.product loop over a single iterable repeated n times, where n is the number of errors.
    import itertools
    from datetime import datetime

    # Try every combination of "deleted" (" ") or a base at the flagged positions.
    for error in itertools.product(' ATGC', repeat=len(errorPos)):
        # A space stands for a deleted base, so it becomes an empty string.
        for pos, base in zip(errorPos, error):
            fseqL[pos] = "" if base == " " else base
        ffseq = "".join(fseqL)
        try:
            Decoder.dna2binary(ffseq)
        except Exception:
            # Decoding failed, so this candidate is wrong.
            pLeft -= 1
            print("Incorrect decode: %s possibilities left" % pLeft)
        else:
            print("Data decoded")
            # Append the finish time to the log.
            with open('data\\log.txt', 'a') as log:
                log.write("\nend: " + str(datetime.now()))
            break
This program works well with a small number of errors, but processing time grows exponentially as the error count increases: with five options per position, n errors produce 5^n candidates. Speed isn't a problem, but after about 10-15 errors it becomes unstable and locks up. Is there a better way to find this combination (a NumPy function, or am I using the wrong approach altogether)?
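To make the growth concrete, here is a minimal sketch of the same search written as a lazy generator, with the number of candidates computed up front. It assumes, as the loop above does, that the decoder raises an exception on a bad sequence. The names `candidate_count`, `candidates`, `first_decodable`, and `report_every` are hypothetical and introduced only for illustration, and `decode` stands in for `Decoder.dna2binary`. Printing progress only occasionally is a guess at what might reduce the lock-ups, not something the post confirms.

```python
import itertools

ALPHABET = " ATGC"  # " " marks a deleted base, as in the loop above


def candidate_count(n_errors, alphabet=ALPHABET):
    """Number of tuples itertools.product yields for n_errors positions."""
    return len(alphabet) ** n_errors


def candidates(seq, error_pos, alphabet=ALPHABET):
    """Lazily yield each candidate with the error positions substituted."""
    bases = list(seq)
    for combo in itertools.product(alphabet, repeat=len(error_pos)):
        for pos, base in zip(error_pos, combo):
            bases[pos] = "" if base == " " else base
        yield "".join(bases)


def first_decodable(seq, error_pos, decode, report_every=100_000):
    """Return the first candidate that decode() accepts, or None.

    decode is a hypothetical stand-in for Decoder.dna2binary and is
    assumed to raise an exception when the sequence does not decode.
    """
    total = candidate_count(len(error_pos))
    for tried, cand in enumerate(candidates(seq, error_pos), start=1):
        try:
            decode(cand)
        except Exception:
            # Report progress occasionally instead of on every failure.
            if tried % report_every == 0:
                print(f"{total - tried} possibilities left")
        else:
            return cand
    return None


if __name__ == "__main__":
    print(candidate_count(10))  # 5 ** 10
    print(candidate_count(15))  # 5 ** 15
```

Because `itertools.product` is already lazy, memory should not grow with n; the sketch only makes the 5^n size of the search explicit (about 9.8 million candidates at 10 errors and about 30.5 billion at 15).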