Python递归函数超出递归限制。我如何将其转换为迭代
我创建了一个读取 ID 对列表(即 [("A","B"),("B","C"),("C","D"),...] 和序列的函数ID 从头到尾包括任何分支。
每个有序 ID 列表都保存在一个称为 Alignment 的类中,该函数使用递归来处理分支,方法是从分支与主列表分离的 ID 开始创建新的对齐。
我发现通过某些输入是可能的 想避免这种策略。
为了达到Python设置的最大递归限制,我知道我可以使用sys.setrecursionlimit()来增加这个限制,但由于我不知道可能有多少种分支组合,所以我 一直在阅读几篇有关将递归函数转换为迭代函数的文章,但我无法确定处理此特定函数的最佳方法,因为递归发生在函数的中间,并且可能是指数级的
。建议?
谢谢,Brian
代码发布在下面:
def buildAlignments(alignment, alignmentList, endIDs):
while alignment.start in endIDs:
#If endID only has one preceding ID: add preceding ID to alignment
if len(endIDs[alignment.start]) == 1:
alignment.add(endIDs[alignment.start][0])
else:
#List to hold all branches that end at spanEnd
branches = []
for each in endIDs[alignment.start]:
#New alignment for each branch
al = Alignment(each)
#Recursively process each new alignment
buildAlignments(al, branches, endIDs)
branches.append(al)
count = len(branches)
i = 0
index = 0
#Loop through branches by length
for branch in branches:
if i < count - 1:
#Create copy of original alignment and add branch to alignment
al = Alignment(alignment)
al += branch #branches[index]
alignmentList.append(al)
i += 1
#Add single branch to existing original alignment
else: alignment += branch #branches[index]
index += 1
def main():
IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]
#Gather all startIDs with corresponding endIDs and vice versa
startIDs = {}
endIDs = {}
for pair in IDs:
if not pair[0] in startIDs: startIDs[pair[0]] = []
startIDs[pair[0]].append(pair[1])
if not pair[1] in endIDs: endIDs[pair[1]] = []
endIDs[pair[1]].append(pair[0])
#Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
alignments = [Alignment(end) for end in endIDs if not end in startIDs]
#Build build sequences in each original Alignment
i = len(alignments)
while i:
buildAlignments(alignments[i-1], alignments, endIDs)
i -= 1
编辑:我应该指出,提供的 ID 只是我用于测试该算法的一个小样本。实际上,ID序列可能有几千长,并且有许多分支和分支之间的分支。
解决方案:感谢安德鲁·库克。新方法在调用堆栈上似乎更简单、更容易。我确实对他的代码做了一些细微的调整,以更好地满足我的目的。我已将完整的解决方案包含在下面:
from collections import defaultdict
def expand(line, have_successors, known):
#print line
known.append(line)
for child in have_successors[line[-1]]:
newline = line + [child]
if line in known: known.remove(line)
yield expand(newline, have_successors, known)
def trampoline(generator):
stack = [generator]
while stack:
try:
generator = stack.pop()
child = next(generator)
stack.append(generator)
stack.append(child)
except StopIteration:
pass
def main(pairs):
have_successors = defaultdict(lambda: set())
links = set()
for (start, end) in pairs:
links.add(end)
have_successors[start].add(end)
known = []
for node in set(have_successors.keys()):
if node not in links:
trampoline(expand([node], have_successors, known))
for line in known:
print line
if __name__ == '__main__':
main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])
变更摘要: 交换链接和 have_successors 以从头到尾创建列表 添加了 if line inknown:known.remove(line)
进行扩展,以便仅保留完整的系列 将行变量从字符串更改为列表,以便处理单个 ID 中的多个字符。
更新:所以我刚刚发现我遇到所有这些问题的原因首先是对我提供的 ID 列表中的循环引用进行的。现在循环引用已修复,任何一种方法都可以按预期工作。 - 再次感谢您的所有帮助。
I have created a function that reads lists of ID pairs (i.e. [("A","B"),("B","C"),("C","D"),...] and sequences the ID's from start to finish including any branches.
Each list of ordered ID's is held in a class called an Alignment and this function uses recursion to handle branches by creating a new alignment starting at the ID at which the branch splits from the main list.
I have found that with certain inputs it is possible to reach the maximum recursion limit set by Python. I know I could just increase this limit using sys.setrecursionlimit(), but as I do not know how many combinations of branches are possible, I'd like to avoid this tactic.
I have been reading several articles about converting recursive functions to iterative functions, but I have not been able to determine the best way to handle this particular function because the recursion takes place in the middle of the function and can be exponential.
Could any of you offer any suggestions?
Thanks, Brian
Code is posted below:
def buildAlignments(alignment, alignmentList, endIDs):
while alignment.start in endIDs:
#If endID only has one preceding ID: add preceding ID to alignment
if len(endIDs[alignment.start]) == 1:
alignment.add(endIDs[alignment.start][0])
else:
#List to hold all branches that end at spanEnd
branches = []
for each in endIDs[alignment.start]:
#New alignment for each branch
al = Alignment(each)
#Recursively process each new alignment
buildAlignments(al, branches, endIDs)
branches.append(al)
count = len(branches)
i = 0
index = 0
#Loop through branches by length
for branch in branches:
if i < count - 1:
#Create copy of original alignment and add branch to alignment
al = Alignment(alignment)
al += branch #branches[index]
alignmentList.append(al)
i += 1
#Add single branch to existing original alignment
else: alignment += branch #branches[index]
index += 1
def main():
IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]
#Gather all startIDs with corresponding endIDs and vice versa
startIDs = {}
endIDs = {}
for pair in IDs:
if not pair[0] in startIDs: startIDs[pair[0]] = []
startIDs[pair[0]].append(pair[1])
if not pair[1] in endIDs: endIDs[pair[1]] = []
endIDs[pair[1]].append(pair[0])
#Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
alignments = [Alignment(end) for end in endIDs if not end in startIDs]
#Build build sequences in each original Alignment
i = len(alignments)
while i:
buildAlignments(alignments[i-1], alignments, endIDs)
i -= 1
EDIT: I should point out that the provided IDs are just a small sample I used for testing this algorithm. In actuality, the sequences of IDs may be several thousand long with many branches and branches off of branches.
RESOLUTION: Thanks to Andrew Cooke. The new method seems to be much simpler and much easier on the call stack. I did make some minor adjustments to his code to better suit my purposes. I have included the completed solution below:
from collections import defaultdict
def expand(line, have_successors, known):
#print line
known.append(line)
for child in have_successors[line[-1]]:
newline = line + [child]
if line in known: known.remove(line)
yield expand(newline, have_successors, known)
def trampoline(generator):
stack = [generator]
while stack:
try:
generator = stack.pop()
child = next(generator)
stack.append(generator)
stack.append(child)
except StopIteration:
pass
def main(pairs):
have_successors = defaultdict(lambda: set())
links = set()
for (start, end) in pairs:
links.add(end)
have_successors[start].add(end)
known = []
for node in set(have_successors.keys()):
if node not in links:
trampoline(expand([node], have_successors, known))
for line in known:
print line
if __name__ == '__main__':
main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])
SUMMARY OF CHANGES:
swapped links and have_successors to create list from start to end
added if line in known: known.remove(line)
to expand in order to retain only the complete series
changed line variable from string to list in order to handle multiple characters in a single ID.
UPDATE: So I just discovered the reason I was having an issue with all this in the first place is do to circular references in the list of IDs I was provided. Now that the circular reference is fixed, either method works as expected. - Thanks again for all your help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
你的代码是一个杂乱无章的混乱。我无法详细说明它应该做什么。如果你更小心(更整洁、更清晰),那么你可能也会发现重构更容易。
无论如何,这可能会做类似您想要的事情:
我使用了 python2.7 - 对于早期版本,您可能需要将
next(foo)
替换为foo.__next__()
或相似的。关于编写更干净的代码
首先,我也是一个自学成才的程序员,最初是一名学者(天文学家),所以我对你表示同情。如果你继续前进,你可以赶上并超过许多“受过训练”的程序员。这并不像你想象的那么难......
其次,使用像defaultdict这样的“技巧”和“整洁”之间是有区别的,这只是经验/实践的问题。我不希望你知道 defaultdict - 这会随着时间的推移而出现。
但是您现在应该能够做的是编写干净、简单的代码:
我认为您有关于早期版本代码的注释。有人提到“最大长度”,但我没有看到任何长度计算。所以要么评论已经过时(在这种情况下为什么它在那里),要么它需要更清晰(为什么那些东西是最大长度的?)。一般来说,您应该尽可能少地发表评论,否则它就会过时。但同时,您应该在不清楚代码背后的“想法”的地方使用注释。代码应该不言而喻,所以不要说“我在这里添加两个数字”,但如果有一些“隐藏”逻辑,请说“这里的片段必须是最大长度,因为......”。
小心您使用的图片。由于某种原因,您的代码以已知的终端开始。所以你要从头开始构建东西。为什么?这是一种奇怪的看待问题的方式。从开始点开始而不是结束点不是更清楚吗?然后使用“startIDs”来增长它们?这样你就“向前走”。这看起来可能是一件小事,但它会让阅读代码变得混乱。
使用正确的工具来完成工作。你没有使用startID,那你为什么要构建地图呢?你所需要的只是一套。也许你不知道集合,在这种情况下没关系(但你现在知道了!:o)。但除此之外,这也令人困惑——阅读你的代码的人希望你做事是有原因的。因此,当您做的事情超出必要范围时,他们会想知道为什么。
避免在不需要的时候数东西。你有
i
和index
和count
。他们都需要吗?这类计数器是最容易出现错误的,因为它们可能会出现愚蠢的小逻辑错误。它们使代码变得不清楚。如果 i则为count - 1:
真的是在说“这是最后一个分支”吗?如果是这样,最好写成ifbranch==branchs[-1]:
,因为这样清楚你在想什么。与 main 中的循环对齐类似。使用
i
只会让事情变得复杂。你正在处理每个对齐,所以只需说对于对齐中的每个对齐
。如果由于对齐方式发生变化而导致错误,请制作一个不变的副本:对于列表中的每个对齐方式(对齐方式)
。如果不需要,请避免特殊情况。在 buildAlignment 中,您从一开始就有一个针对特殊情况的测试。但这真的需要吗?如果没有它你会得到同样的结果吗?通常,当您编写简单的代码时,结果表明您不需要特殊情况。在我的代码中,我不需要测试是否有一个或没有“链接”,因为它在所有这些情况下都可以正常工作。这可以减少代码量,减少需要担心的事情,并减少出现错误的机会。
比所有这些更重要的是 - 你必须非常整洁和有条理。你有很多想法,但与其尝试其中的一半然后跳到另一个,不如把它们写下来并一一解决。否则你最终会得到一团混乱和你不理解的代码。起初,你感觉自己在浪费时间,但你开始发现,结果你变得更快了,因为你花在困惑上的时间更少了......
在生成器上
[我稍微修改了代码以分离出
newline
并在几个地方添加print
。]首先,您运行了代码吗?它正在做你想要的事情吗?你能看出它和你之前的东西有什么联系吗?我的
expand
与您的buildAlignment
类似(我希望如此)。如果你运行它(最新版本),你会看到:
这可能会提供有关正在发生的事情的线索。 “技巧”是yield 语句——Python 编译器看到这一点后,不再创建一个普通函数,而是创建一个生成器。
发电机是一个非常奇怪的东西。它基本上是您的函数(在本例中为
expand
),“捆绑”在一起,以便可以分阶段运行。运行由next()
完成,并且每次达到yield
时函数都会再次停止。所以 Trampoline 被传递给这个奇怪的包。它调用
next()
。这就是启动该函数的“神奇”函数。因此,当调用next
时,该函数开始运行,直到达到yield
,并返回一个新包。 ,trampoline()
命令保存旧包并开始处理新包,调用next()
,启动它......等等。然后 “没有事情可做了”,它会引发
StopIteration
。因此,当我们达到表达式无法再增长的程度时,我们就会在trampoline()
中看到该异常。此时,我们返回到最后一个“旧”包(存储在我们的堆栈
中)并再次调用next()
。该捆绑包从原来的位置(就在yield
之后)重新启动并继续,可能在while
中执行另一个循环,直到它命中再次yield
(或者耗尽并引发StopIteration
)。所以最后,代码的作用与
yield
不存在一样!唯一的区别是我们不断制作这些捆绑包并将其退回。这似乎毫无意义。只是我们不再使用堆栈!因为捆绑包已返回,所以堆栈并未“用完”!这就是为什么我们需要管理自己的堆栈(列表stack
) - 否则无法知道之前的调用是什么。好吧,好吧,我根本不指望你能理解这一点。是的,这有点疯狂。现在你需要离开并谷歌搜索“python 生成器”。并编写一些您自己的代码来测试它。但希望这能指明道路。
哦,还有,我昨晚也在想。我怀疑如果你耗尽了堆栈,那实际上是因为你有循环,而不是因为链太长了。你考虑过循环吗? A→B、B→C、C→A、……
your code is a disorganised muddle. i can't tell what it is supposed to be doing in detail. if you were more careful (neater, clearer) then you would probably also find it easier to refactor.
anyway, this may do something like what you want:
i used python2.7 - with earlier versions you may need to replace
next(foo)
withfoo.__next__()
or similar.on writing cleaner code
first, i too am a self-taught programmer who started out as an academic (an astronomer), so you have my sympathy. and if you keep going, you can catch up and pass many "taught" programmers. it's not as hard as you might think...
second there's a difference between using "tricks" like defaultdict, which is just a matter of experience / practice, and being "tidy". i don't expect you to know about defaultdict - that will come with time.
but what you should be able to do now is write clean, simple code:
i think that you have comments that are about earlier versions of the code. one mentions "max length" but i don't see any length calculations. so either the comment is out of date (in which case why is it there) or it needs to be clearer (why are those things of max length?). in general, you should comment as little as possible, because otherwise it does get out of date. but at the same time you should use comments where it's not clear what the "ideas" are behind the code. the code should speak for itself, so don't say "i am adding two numbers here", but do say "fragments here must be maximum length because ..." if there is some "hidden" logic.
be careful with the pictures you use. for some reason your code starts with known terminals. so you're building things from the end back towards the start. why? that's a strange way of looking at the problem. wouldn't it be clearer to start with points that are in start, but not in end? and then use "startIDs" to grow them? that way you are "walking forwards". it might seem like a little thing, but it makes reading the code confusing.
use the right tools for the job. you didn't use startIDs, so why are you building a map? all you needed there was a set. perhaps you didn't know about sets, in which case OK (but you do now! :o). but otherwise, that too is confusing - someone reading your code expects you to be doing things for a reason. so when you do more than is necessary they wonder why.
avoid counting things when you don't need to. you have
i
andindex
andcount
. are they all needed? these kinds of counters are the easiest way to have bugs, because they can have silly little logic errors. and they make the code unclear. isif i < count - 1:
really saying "is this the last branch"? if so, it would be much nicer to writeif branch == branches [-1]:
because that's clear about what you're thinking.similarly with looping over alignments in main. using
i
just complicates things. you are processing each alignment, so just sayfor each alignment in alignments
. if that gives an error because alignments is changing, make an unchanging copy:for each alignment in list(alignments)
.avoid special cases if they are not needed. in buildAlignment you have a test right at the start for a special case. but was it really needed? would you have got the same result without it? often, when you write the code simply, it turns out that you don't need special cases. in my code i don't need to test whether there is one or no "links" because it works ok in all those cases. this gives you less code and less things to worry about and less chance for bugs.
more than all these things - you have to be obsessively tidy and methodical. you have lots of ideas, but rather than try out half of one then jump to another, write them down and work through them one by one. otherwise you end up with a mess and code that you don't understand. at first it feels like you are wasting time, but you start to see that as a result you are becoming faster because you spend less time confused...
on generators
[i modified the code slightly to separate out
newline
and to addprint
in a couple of places.]first, did you run the code? is it doing the kind of thing you want? can you see how it connects with what you had before? my
expand
is similar to yourbuildAlignment
(i hope).if you run it (latest version) you'll see:
which may give a clue about what is happening. the "trick" is the yield statement - the python compiler sees that and, instead of making an ordinary function, makes a generator.
a generator is a very strange thing. it's basically your function (in this case,
expand
), "bundled up" so that it can be run in stages. the running is done bynext()
and the function stops again each time ayield
is reached.so
trampoline
is passed this strange bundle. and it callsnext()
. that's the "magic" function that starts the function. so whennext
is called the function starts running, until it reaches theyield
, where it returns a new bundle. thetrampoline()
command then saves the old bundle and starts working on the new one, callingnext()
on that, starting it... etc etc.when a generator "runs out of things to do" it raises
StopIteration
. so when we reach a point where the expression cannot grow any more then we see that exception intrampoline()
. at that point we return to the last "old" bundle (stored in ourstack
) and callnext()
again on that. this bundle restarts from where it was (just afteryield
) and continues, probably doing another loop in thewhile
, until it hitsyield
again (or runs out and raisesStopIteration
).so in the end, the code does the same as if the
yield
was not there! the only difference is that we keep making these bundles, and returning them. which seems pointless. except that we are no longer using the stack! because the bundle is returned the stack is not being "used up"! that is why we need to manage out own stack (the liststack
) - otherwise there's no way to know what the previous call was.ok, ok, i don't expect you to understand this at all. yes, it's kind-of crazy. now you need to go away and google for "python generators". and write some code of your own to test it out. but hopefully this points the way.
oh, also, i was thinking last night. and i suspect that if you were exhausting the stack it was actually because you have loops, not because the chains are so long. have you considered loops? A->B, B->C, C->A, ....