Python递归函数超出递归限制。我如何将其转换为迭代

发布于 2024-11-29 18:35:35 字数 4054 浏览 1 评论 0原文

我创建了一个读取 ID 对列表（即 [("A","B"),("B","C"),("C","D"),...] 和序列的函数ID 从头到尾包括任何分支。

每个有序 ID 列表都保存在一个称为 Alignment 的类中，该函数使用递归来处理分支，方法是从分支与主列表分离的 ID 开始创建新的对齐。

我发现通过某些输入是可能的想避免这种策略。

为了达到Python设置的最大递归限制，我知道我可以使用sys.setrecursionlimit()来增加这个限制，但由于我不知道可能有多少种分支组合，所以我一直在阅读几篇有关将递归函数转换为迭代函数的文章，但我无法确定处理此特定函数的最佳方法，因为递归发生在函数的中间，并且可能是指数级的

。建议？

谢谢，Brian

代码发布在下面：

def buildAlignments(alignment, alignmentList, endIDs):
    while alignment.start in endIDs:

        #If endID only has one preceding ID: add preceding ID to alignment
        if len(endIDs[alignment.start]) == 1:
            alignment.add(endIDs[alignment.start][0])

        else:

            #List to hold all branches that end at spanEnd
            branches = []

            for each in endIDs[alignment.start]:

                #New alignment for each branch
                al = Alignment(each)

                #Recursively process each new alignment
                buildAlignments(al, branches, endIDs)

                branches.append(al)
            count = len(branches)
            i = 0
            index = 0

            #Loop through branches by length
            for branch in branches:
                if i < count - 1:

                    #Create copy of original alignment and add branch to alignment
                    al = Alignment(alignment)
                    al += branch #branches[index]
                    alignmentList.append(al)
                    i += 1

                #Add single branch to existing original alignment
                else: alignment += branch #branches[index]
                index += 1

def main():
    IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]

    #Gather all startIDs with corresponding endIDs and vice versa
    startIDs = {}
    endIDs = {}
    for pair in IDs:
        if not pair[0] in startIDs: startIDs[pair[0]] = []
        startIDs[pair[0]].append(pair[1])
        if not pair[1] in endIDs: endIDs[pair[1]] = []
        endIDs[pair[1]].append(pair[0])

    #Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
    alignments = [Alignment(end) for end in endIDs if not end in startIDs]

    #Build build sequences in each original Alignment
    i = len(alignments)
    while i:
        buildAlignments(alignments[i-1], alignments, endIDs)
        i -= 1

编辑：我应该指出，提供的 ID 只是我用于测试该算法的一个小样本。实际上，ID序列可能有几千长，并且有许多分支和分支之间的分支。

解决方案：感谢安德鲁·库克。新方法在调用堆栈上似乎更简单、更容易。我确实对他的代码做了一些细微的调整，以更好地满足我的目的。我已将完整的解决方案包含在下面：

from collections import defaultdict

def expand(line, have_successors, known):
    #print line
    known.append(line)
    for child in have_successors[line[-1]]:
        newline = line + [child]
        if line in known: known.remove(line)
        yield expand(newline, have_successors, known)

def trampoline(generator):
    stack = [generator]
    while stack:
        try:
            generator = stack.pop()
            child = next(generator)
            stack.append(generator)
            stack.append(child)
        except StopIteration:
            pass

def main(pairs):
    have_successors = defaultdict(lambda: set())
    links = set()
    for (start, end) in pairs:
        links.add(end)
        have_successors[start].add(end)
    known = []
    for node in set(have_successors.keys()):
        if node not in links:
            trampoline(expand([node], have_successors, known))
    for line in known:
        print line

if __name__ == '__main__':
    main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])

变更摘要：交换链接和 have_successors 以从头到尾创建列表添加了 if line inknown:known.remove(line) 进行扩展，以便仅保留完整的系列将行变量从字符串更改为列表，以便处理单个 ID 中的多个字符。

更新：所以我刚刚发现我遇到所有这些问题的原因首先是对我提供的 ID 列表中的循环引用进行的。现在循环引用已修复，任何一种方法都可以按预期工作。 - 再次感谢您的所有帮助。

原文

I have created a function that reads lists of ID pairs (i.e. [("A","B"),("B","C"),("C","D"),...] and sequences the ID's from start to finish including any branches.

Each list of ordered ID's is held in a class called an Alignment and this function uses recursion to handle branches by creating a new alignment starting at the ID at which the branch splits from the main list.

I have found that with certain inputs it is possible to reach the maximum recursion limit set by Python. I know I could just increase this limit using sys.setrecursionlimit(), but as I do not know how many combinations of branches are possible, I'd like to avoid this tactic.

I have been reading several articles about converting recursive functions to iterative functions, but I have not been able to determine the best way to handle this particular function because the recursion takes place in the middle of the function and can be exponential.

Could any of you offer any suggestions?

Thanks, Brian

Code is posted below:

def buildAlignments(alignment, alignmentList, endIDs):
    while alignment.start in endIDs:

        #If endID only has one preceding ID: add preceding ID to alignment
        if len(endIDs[alignment.start]) == 1:
            alignment.add(endIDs[alignment.start][0])

        else:

            #List to hold all branches that end at spanEnd
            branches = []

            for each in endIDs[alignment.start]:

                #New alignment for each branch
                al = Alignment(each)

                #Recursively process each new alignment
                buildAlignments(al, branches, endIDs)

                branches.append(al)
            count = len(branches)
            i = 0
            index = 0

            #Loop through branches by length
            for branch in branches:
                if i < count - 1:

                    #Create copy of original alignment and add branch to alignment
                    al = Alignment(alignment)
                    al += branch #branches[index]
                    alignmentList.append(al)
                    i += 1

                #Add single branch to existing original alignment
                else: alignment += branch #branches[index]
                index += 1

def main():
    IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]

    #Gather all startIDs with corresponding endIDs and vice versa
    startIDs = {}
    endIDs = {}
    for pair in IDs:
        if not pair[0] in startIDs: startIDs[pair[0]] = []
        startIDs[pair[0]].append(pair[1])
        if not pair[1] in endIDs: endIDs[pair[1]] = []
        endIDs[pair[1]].append(pair[0])

    #Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
    alignments = [Alignment(end) for end in endIDs if not end in startIDs]

    #Build build sequences in each original Alignment
    i = len(alignments)
    while i:
        buildAlignments(alignments[i-1], alignments, endIDs)
        i -= 1

EDIT: I should point out that the provided IDs are just a small sample I used for testing this algorithm. In actuality, the sequences of IDs may be several thousand long with many branches and branches off of branches.

RESOLUTION: Thanks to Andrew Cooke. The new method seems to be much simpler and much easier on the call stack. I did make some minor adjustments to his code to better suit my purposes. I have included the completed solution below:

from collections import defaultdict

def expand(line, have_successors, known):
    #print line
    known.append(line)
    for child in have_successors[line[-1]]:
        newline = line + [child]
        if line in known: known.remove(line)
        yield expand(newline, have_successors, known)

def trampoline(generator):
    stack = [generator]
    while stack:
        try:
            generator = stack.pop()
            child = next(generator)
            stack.append(generator)
            stack.append(child)
        except StopIteration:
            pass

def main(pairs):
    have_successors = defaultdict(lambda: set())
    links = set()
    for (start, end) in pairs:
        links.add(end)
        have_successors[start].add(end)
    known = []
    for node in set(have_successors.keys()):
        if node not in links:
            trampoline(expand([node], have_successors, known))
    for line in known:
        print line

if __name__ == '__main__':
    main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])

SUMMARY OF CHANGES:
swapped links and have_successors to create list from start to end
added if line in known: known.remove(line) to expand in order to retain only the complete series
changed line variable from string to list in order to handle multiple characters in a single ID.

UPDATE: So I just discovered the reason I was having an issue with all this in the first place is do to circular references in the list of IDs I was provided. Now that the circular reference is fixed, either method works as expected. - Thanks again for all your help.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

香橙ぽ 2024-12-06 18:35:35

你的代码是一个杂乱无章的混乱。我无法详细说明它应该做什么。如果你更小心（更整洁、更清晰），那么你可能也会发现重构更容易。

无论如何，这可能会做类似您想要的事情：

from collections import defaultdict

def expand(line, links, known):
    print 'expand'
    known.append(line)
    for child in links[line[-1]]:
        newline = line + child
        yield expand(newline, links, known)

def trampoline(generator):
    stack = [generator]
    while stack:
        try:
            generator = stack.pop()
            print 'next'
            child = next(generator)
            stack.append(generator)
            stack.append(child)
        except StopIteration:
            pass

def main(pairs):
    have_successors = set()
    links = defaultdict(lambda: set())
    for (start, end) in pairs:
        have_successors.add(start)
        links[end].add(start)
    known = []
    for node in set(links.keys()):
        if node not in have_successors:
            trampoline(expand(node, links, known))
    for line in known:
        print line

if __name__ == '__main__':
    main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])

我使用了 python2.7 - 对于早期版本，您可能需要将 next(foo) 替换为 foo.__next__() 或相似的。

关于编写更干净的代码

首先，我也是一个自学成才的程序员，最初是一名学者（天文学家），所以我对你表示同情。如果你继续前进，你可以赶上并超过许多“受过训练”的程序员。这并不像你想象的那么难......

其次，使用像defaultdict这样的“技巧”和“整洁”之间是有区别的，这只是经验/实践的问题。我不希望你知道 defaultdict - 这会随着时间的推移而出现。

但是您现在应该能够做的是编写干净、简单的代码：

我认为您有关于早期版本代码的注释。有人提到“最大长度”，但我没有看到任何长度计算。所以要么评论已经过时（在这种情况下为什么它在那里），要么它需要更清晰（为什么那些东西是最大长度的？）。一般来说，您应该尽可能少地发表评论，否则它就会过时。但同时，您应该在不清楚代码背后的“想法”的地方使用注释。代码应该不言而喻，所以不要说“我在这里添加两个数字”，但如果有一些“隐藏”逻辑，请说“这里的片段必须是最大长度，因为......”。
小心您使用的图片。由于某种原因，您的代码以已知的终端开始。所以你要从头开始构建东西。为什么？这是一种奇怪的看待问题的方式。从开始点开始而不是结束点不是更清楚吗？然后使用“startIDs”来增长它们？这样你就“向前走”。这看起来可能是一件小事，但它会让阅读代码变得混乱。
使用正确的工具来完成工作。你没有使用startID，那你为什么要构建地图呢？你所需要的只是一套。也许你不知道集合，在这种情况下没关系（但你现在知道了！:o）。但除此之外，这也令人困惑——阅读你的代码的人希望你做事是有原因的。因此，当您做的事情超出必要范围时，他们会想知道为什么。
避免在不需要的时候数东西。你有i和index和count。他们都需要吗？这类计数器是最容易出现错误的，因为它们可能会出现愚蠢的小逻辑错误。它们使代码变得不清楚。如果 i 则为count - 1： 真的是在说“这是最后一个分支”吗？如果是这样，最好写成ifbranch==branchs[-1]:，因为这样清楚你在想什么。
与 main 中的循环对齐类似。使用 i 只会让事情变得复杂。你正在处理每个对齐，所以只需说对于对齐中的每个对齐。如果由于对齐方式发生变化而导致错误，请制作一个不变的副本：对于列表中的每个对齐方式（对齐方式）。
如果不需要，请避免特殊情况。在 buildAlignment 中，您从一开始就有一个针对特殊情况的测试。但这真的需要吗？如果没有它你会得到同样的结果吗？通常，当您编写简单的代码时，结果表明您不需要特殊情况。在我的代码中，我不需要测试是否有一个或没有“链接”，因为它在所有这些情况下都可以正常工作。这可以减少代码量，减少需要担心的事情，并减少出现错误的机会。

比所有这些更重要的是 - 你必须非常整洁和有条理。你有很多想法，但与其尝试其中的一半然后跳到另一个，不如把它们写下来并一一解决。否则你最终会得到一团混乱和你不理解的代码。起初，你感觉自己在浪费时间，但你开始发现，结果你变得更快了，因为你花在困惑上的时间更少了......

在生成器上

[我稍微修改了代码以分离出newline 并在几个地方添加 print。]

首先，您运行了代码吗？它正在做你想要的事情吗？你能看出它和你之前的东西有什么联系吗？我的 expand 与您的 buildAlignment 类似（我希望如此）。

如果你运行它（最新版本），你会看到：

: python2.7 recurse.py
next
expand
next
expand
next
expand
next
expand
next
expand
next
expand
next
expand
next
next
...

这可能会提供有关正在发生的事情的线索。 “技巧”是yield 语句——Python 编译器看到这一点后，不再创建一个普通函数，而是创建一个生成器。

发电机是一个非常奇怪的东西。它基本上是您的函数（在本例中为 expand），“捆绑”在一起，以便可以分阶段运行。运行由 next() 完成，并且每次达到 yield 时函数都会再次停止。

所以 Trampoline 被传递给这个奇怪的包。它调用next()。这就是启动该函数的“神奇”函数。因此，当调用 next 时，该函数开始运行，直到达到 yield，并返回一个新包。，trampoline() 命令保存旧包并开始处理新包，调用 next() ，启动它......等等。

然后 “没有事情可做了”，它会引发 StopIteration。因此，当我们达到表达式无法再增长的程度时，我们就会在 trampoline() 中看到该异常。此时，我们返回到最后一个“旧”包（存储在我们的堆栈中）并再次调用next()。该捆绑包从原来的位置（就在 yield 之后）重新启动并继续，可能在 while 中执行另一个循环，直到它命中 再次yield（或者耗尽并引发StopIteration）。

所以最后，代码的作用与 yield 不存在一样！唯一的区别是我们不断制作这些捆绑包并将其退回。这似乎毫无意义。只是我们不再使用堆栈！因为捆绑包已返回，所以堆栈并未“用完”！这就是为什么我们需要管理自己的堆栈（列表stack） - 否则无法知道之前的调用是什么。

好吧，好吧，我根本不指望你能理解这一点。是的，这有点疯狂。现在你需要离开并谷歌搜索“python 生成器”。并编写一些您自己的代码来测试它。但希望这能指明道路。

哦，还有，我昨晚也在想。我怀疑如果你耗尽了堆栈，那实际上是因为你有循环，而不是因为链太长了。你考虑过循环吗？ A→B、B→C、C→A、……

your code is a disorganised muddle. i can't tell what it is supposed to be doing in detail. if you were more careful (neater, clearer) then you would probably also find it easier to refactor.

anyway, this may do something like what you want:

from collections import defaultdict

def expand(line, links, known):
    print 'expand'
    known.append(line)
    for child in links[line[-1]]:
        newline = line + child
        yield expand(newline, links, known)

def trampoline(generator):
    stack = [generator]
    while stack:
        try:
            generator = stack.pop()
            print 'next'
            child = next(generator)
            stack.append(generator)
            stack.append(child)
        except StopIteration:
            pass

def main(pairs):
    have_successors = set()
    links = defaultdict(lambda: set())
    for (start, end) in pairs:
        have_successors.add(start)
        links[end].add(start)
    known = []
    for node in set(links.keys()):
        if node not in have_successors:
            trampoline(expand(node, links, known))
    for line in known:
        print line

if __name__ == '__main__':
    main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])

i used python2.7 - with earlier versions you may need to replace next(foo) with foo.__next__() or similar.

on writing cleaner code

first, i too am a self-taught programmer who started out as an academic (an astronomer), so you have my sympathy. and if you keep going, you can catch up and pass many "taught" programmers. it's not as hard as you might think...

second there's a difference between using "tricks" like defaultdict, which is just a matter of experience / practice, and being "tidy". i don't expect you to know about defaultdict - that will come with time.

but what you should be able to do now is write clean, simple code:

i think that you have comments that are about earlier versions of the code. one mentions "max length" but i don't see any length calculations. so either the comment is out of date (in which case why is it there) or it needs to be clearer (why are those things of max length?). in general, you should comment as little as possible, because otherwise it does get out of date. but at the same time you should use comments where it's not clear what the "ideas" are behind the code. the code should speak for itself, so don't say "i am adding two numbers here", but do say "fragments here must be maximum length because ..." if there is some "hidden" logic.
be careful with the pictures you use. for some reason your code starts with known terminals. so you're building things from the end back towards the start. why? that's a strange way of looking at the problem. wouldn't it be clearer to start with points that are in start, but not in end? and then use "startIDs" to grow them? that way you are "walking forwards". it might seem like a little thing, but it makes reading the code confusing.
use the right tools for the job. you didn't use startIDs, so why are you building a map? all you needed there was a set. perhaps you didn't know about sets, in which case OK (but you do now! :o). but otherwise, that too is confusing - someone reading your code expects you to be doing things for a reason. so when you do more than is necessary they wonder why.
avoid counting things when you don't need to. you have i and index and count. are they all needed? these kinds of counters are the easiest way to have bugs, because they can have silly little logic errors. and they make the code unclear. is if i < count - 1: really saying "is this the last branch"? if so, it would be much nicer to write if branch == branches [-1]: because that's clear about what you're thinking.
similarly with looping over alignments in main. using i just complicates things. you are processing each alignment, so just say for each alignment in alignments. if that gives an error because alignments is changing, make an unchanging copy: for each alignment in list(alignments).
avoid special cases if they are not needed. in buildAlignment you have a test right at the start for a special case. but was it really needed? would you have got the same result without it? often, when you write the code simply, it turns out that you don't need special cases. in my code i don't need to test whether there is one or no "links" because it works ok in all those cases. this gives you less code and less things to worry about and less chance for bugs.

more than all these things - you have to be obsessively tidy and methodical. you have lots of ideas, but rather than try out half of one then jump to another, write them down and work through them one by one. otherwise you end up with a mess and code that you don't understand. at first it feels like you are wasting time, but you start to see that as a result you are becoming faster because you spend less time confused...

on generators

[i modified the code slightly to separate out newline and to add print in a couple of places.]

first, did you run the code? is it doing the kind of thing you want? can you see how it connects with what you had before? my expand is similar to your buildAlignment (i hope).

if you run it (latest version) you'll see:

: python2.7 recurse.py
next
expand
next
expand
next
expand
next
expand
next
expand
next
expand
next
expand
next
next
...

which may give a clue about what is happening. the "trick" is the yield statement - the python compiler sees that and, instead of making an ordinary function, makes a generator.

a generator is a very strange thing. it's basically your function (in this case, expand), "bundled up" so that it can be run in stages. the running is done by next() and the function stops again each time a yield is reached.

so trampoline is passed this strange bundle. and it calls next(). that's the "magic" function that starts the function. so when next is called the function starts running, until it reaches the yield, where it returns a new bundle. the trampoline() command then saves the old bundle and starts working on the new one, calling next() on that, starting it... etc etc.

when a generator "runs out of things to do" it raises StopIteration. so when we reach a point where the expression cannot grow any more then we see that exception in trampoline(). at that point we return to the last "old" bundle (stored in our stack) and call next() again on that. this bundle restarts from where it was (just after yield) and continues, probably doing another loop in the while, until it hits yield again (or runs out and raises StopIteration).

so in the end, the code does the same as if the yield was not there! the only difference is that we keep making these bundles, and returning them. which seems pointless. except that we are no longer using the stack! because the bundle is returned the stack is not being "used up"! that is why we need to manage out own stack (the list stack) - otherwise there's no way to know what the previous call was.

ok, ok, i don't expect you to understand this at all. yes, it's kind-of crazy. now you need to go away and google for "python generators". and write some code of your own to test it out. but hopefully this points the way.

oh, also, i was thinking last night. and i suspect that if you were exhausting the stack it was actually because you have loops, not because the chains are so long. have you considered loops? A->B, B->C, C->A, ....

回复收藏 0 原文

~没有更多了~