从字典中的子字符串生成字符串,同时最小化某些彼此相邻的字符
我希望能够从包含子字符串的字典中生成一个字符串,其中我输入一个字符串,其中每个字符对应于字典的键,然后它从与该键相关的值中吐出一个新字符串。不过,我也想尽量减少某些角色彼此相邻的情况。
例如:
dict = {'I': ['ATA', 'ATC', 'ATT'], 'M': ['ATG'], 'T': ['ACA', 'ACC', 'ACG', 'ACT'], 'N':['AAC', 'AAT'], 'K': ['AAA', 'AAG'], 'S': ['AGC', 'AGT'], 'R': ['AGA', 'AGG']}
input_str = "IIMTSTTKRI"
输出将是与每个键关联的三个字符子字符串的字符串。但是可以使用许多 3 个字符子字符串,我想尽量减少彼此相邻的 G 和 C 的数量。
我目前有这个:
n = []
#make list of possible substrings for each character in string
for i in str:
if i in dict.keys():
n.append(dict[i])
#generate all permutations
p = [''.join(s) for s in itertools.product(*n)]
#if no consecutive GCs in a permutation add to list
ls = []
for i in p:
q = i.count('GC')
if q == 0:
ls.append(i)
哪个“有效”,但存在一些问题。第一个(次要的)是我必须假设连续的“GC”为 0,对于某些字符串来说这可能是不可能的。第二个(主要的)是它对于较长的字符串来说非常慢,因为它必须生成所有排列。
谁能提供一种提高速度的方法或替代方法?
I want to be able to generate a string from a dictionary containing substrings, whereby I input a string where each character corresponds to the key of the dictionary and it spits out a new string from the associated values to that key. However I also want to minimise certain characters being next to each other.
For example:
dict = {'I': ['ATA', 'ATC', 'ATT'], 'M': ['ATG'], 'T': ['ACA', 'ACC', 'ACG', 'ACT'], 'N':['AAC', 'AAT'], 'K': ['AAA', 'AAG'], 'S': ['AGC', 'AGT'], 'R': ['AGA', 'AGG']}
input_str = "IIMTSTTKRI"
The output would be a string of the three character substrings associated with each key.However there are many 3 character substrings that could be used, I would like to minimise the number of G's and C's that are next to one another.
I currently have this:
n = []
#make list of possible substrings for each character in string
for i in str:
if i in dict.keys():
n.append(dict[i])
#generate all permutations
p = [''.join(s) for s in itertools.product(*n)]
#if no consecutive GCs in a permutation add to list
ls = []
for i in p:
q = i.count('GC')
if q == 0:
ls.append(i)
Which 'works' but there are a couple of problems. The first (minor one) is that I have to assume the consective "GC" is 0 and for some strings that may not be possible. The second (major one), is its extremely slow for longer strings because it has to generate all permutations.
Can anyone provide a way to improve the speed or an alternative way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
根据您的评论,您可以将问题视为最佳路径搜索(将您的问题视为一个图表,您必须遵循
input_str
中定义的路径,并且在每个顶点中,您必须从以下列表中进行选择定义的 3 字符宽字符串)。有很多搜索算法,我的解决方案是使用 A*Star:
这会打印:
Based on your comments, you can look at the problem as a optimal path searching (Think about your problem as a graph where you must follow path defined in
input_str
and in each vertex you must chose from a list of defined 3-character wide strings).There are many search algorithms, my solution is using A*Star:
This prints: