从字典中的子字符串生成字符串，同时最小化某些彼此相邻的字符

发布于 2025-01-13 15:20:15 字数 918 浏览 2 评论 0原文

我希望能够从包含子字符串的字典中生成一个字符串，其中我输入一个字符串，其中每个字符对应于字典的键，然后它从与该键相关的值中吐出一个新字符串。不过，我也想尽量减少某些角色彼此相邻的情况。

例如：

dict = {'I': ['ATA', 'ATC', 'ATT'], 'M': ['ATG'], 'T': ['ACA', 'ACC', 'ACG', 'ACT'], 'N':['AAC', 'AAT'], 'K': ['AAA', 'AAG'], 'S': ['AGC', 'AGT'], 'R': ['AGA', 'AGG']}

input_str = "IIMTSTTKRI"

输出将是与每个键关联的三个字符子字符串的字符串。但是可以使用许多 3 个字符子字符串，我想尽量减少彼此相邻的 G 和 C 的数量。

我目前有这个：

n = []

#make list of possible substrings for each character in string 
for i in str:
    if i in dict.keys():
       n.append(dict[i])
#generate all permutations 
p = [''.join(s) for s in itertools.product(*n)]

#if no consecutive GCs in a permutation add to list
ls = []
for i in p:
    q = i.count('GC')    
    if q == 0:
        ls.append(i)

哪个“有效”，但存在一些问题。第一个（次要的）是我必须假设连续的“GC”为 0，对于某些字符串来说这可能是不可能的。第二个（主要的）是它对于较长的字符串来说非常慢，因为它必须生成所有排列。

谁能提供一种提高速度的方法或替代方法？

原文

I want to be able to generate a string from a dictionary containing substrings, whereby I input a string where each character corresponds to the key of the dictionary and it spits out a new string from the associated values to that key. However I also want to minimise certain characters being next to each other.

For example:

dict = {'I': ['ATA', 'ATC', 'ATT'], 'M': ['ATG'], 'T': ['ACA', 'ACC', 'ACG', 'ACT'], 'N':['AAC', 'AAT'], 'K': ['AAA', 'AAG'], 'S': ['AGC', 'AGT'], 'R': ['AGA', 'AGG']}

input_str = "IIMTSTTKRI"

The output would be a string of the three character substrings associated with each key.However there are many 3 character substrings that could be used, I would like to minimise the number of G's and C's that are next to one another.

I currently have this:

n = []

#make list of possible substrings for each character in string 
for i in str:
    if i in dict.keys():
       n.append(dict[i])
#generate all permutations 
p = [''.join(s) for s in itertools.product(*n)]

#if no consecutive GCs in a permutation add to list
ls = []
for i in p:
    q = i.count('GC')    
    if q == 0:
        ls.append(i)

Which 'works' but there are a couple of problems. The first (minor one) is that I have to assume the consective "GC" is 0 and for some strings that may not be possible. The second (major one), is its extremely slow for longer strings because it has to generate all permutations.

Can anyone provide a way to improve the speed or an alternative way?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

分分钟 2025-01-20 15:20:15

根据您的评论，您可以将问题视为最佳路径搜索（将您的问题视为一个图表，您必须遵循 input_str 中定义的路径，并且在每个顶点中，您必须从以下列表中进行选择定义的 3 字符宽字符串）。

有很多搜索算法，我的解决方案是使用 A*Star：

from heapq import heappop, heappush

dct = {
    "I": ["ATA", "ATC", "ATT"],
    "M": ["ATG"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "N": ["AAC", "AAT"],
    "K": ["AAA", "AAG"],
    "S": ["AGC", "AGT"],
    "R": ["AGA", "AGG"],
}


input_str = "IIMTSTTKRI"

def valid_moves(s):
    key = input_str[len(s) // 3]
    for i in dct[key]:
        yield s + i


def distance(s):
    return len(input_str) - (len(s) // 3)


def my_cost_func(_from, _to):
    return _to.count("GC")


def a_star(start, moves_func, h_func, cost_func):
    """
    Find a shortest sequence of states from start to a goal state
    (a state s with h_func(s) == 0).
    """

    frontier = [
        (h_func(start), start)
    ]  # A priority queue, ordered by path length, f = g + h
    previous = {
        start: None
    }  # start state has no previous state; other states will
    path_cost = {start: 0}  # The cost of the best path to a state.
    Path = lambda s: ([] if (s is None) else Path(previous[s]) + [s])
    while frontier:
        (f, s) = heappop(frontier)
        if h_func(s) == 0:
            return Path(s)
        for s2 in moves_func(s):
            g = path_cost[s] + cost_func(s, s2)
            if s2 not in path_cost or g < path_cost[s2]:
                heappush(frontier, (g + h_func(s2), s2))
                path_cost[s2] = g
                previous[s2] = s


path = a_star("", valid_moves, distance, my_cost_func)
print("Result:", path[-1])

这会打印：

Result: ATAATAATGACAAGTACAACAAAAAGAATAAGT

Based on your comments, you can look at the problem as a optimal path searching (Think about your problem as a graph where you must follow path defined in input_str and in each vertex you must chose from a list of defined 3-character wide strings).

There are many search algorithms, my solution is using A*Star:

from heapq import heappop, heappush

dct = {
    "I": ["ATA", "ATC", "ATT"],
    "M": ["ATG"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "N": ["AAC", "AAT"],
    "K": ["AAA", "AAG"],
    "S": ["AGC", "AGT"],
    "R": ["AGA", "AGG"],
}


input_str = "IIMTSTTKRI"

def valid_moves(s):
    key = input_str[len(s) // 3]
    for i in dct[key]:
        yield s + i


def distance(s):
    return len(input_str) - (len(s) // 3)


def my_cost_func(_from, _to):
    return _to.count("GC")


def a_star(start, moves_func, h_func, cost_func):
    """
    Find a shortest sequence of states from start to a goal state
    (a state s with h_func(s) == 0).
    """

    frontier = [
        (h_func(start), start)
    ]  # A priority queue, ordered by path length, f = g + h
    previous = {
        start: None
    }  # start state has no previous state; other states will
    path_cost = {start: 0}  # The cost of the best path to a state.
    Path = lambda s: ([] if (s is None) else Path(previous[s]) + [s])
    while frontier:
        (f, s) = heappop(frontier)
        if h_func(s) == 0:
            return Path(s)
        for s2 in moves_func(s):
            g = path_cost[s] + cost_func(s, s2)
            if s2 not in path_cost or g < path_cost[s2]:
                heappush(frontier, (g + h_func(s2), s2))
                path_cost[s2] = g
                previous[s2] = s


path = a_star("", valid_moves, distance, my_cost_func)
print("Result:", path[-1])

This prints:

Result: ATAATAATGACAAGTACAACAAAAAGAATAAGT

回复收藏 0 原文

~没有更多了~

关于作者

一页

暂无简介

文章

27 人气

关注发私信

友情链接

文江博客

从字典中的子字符串生成字符串，同时最小化某些彼此相邻的字符

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

从字典中的子字符串生成字符串，同时最小化某些彼此相邻的字符

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

櫻之舞

弥枳

m2429

寻找一个思念的角度

野却迷人

我怀念的。

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。