Avoiding code duplication in Python code
Consider the following Python snippet:
    af=open("a",'r')
    bf=open("b", 'w')
    for i, line in enumerate(af):
        if i < K:
            bf.write(line)
Now, suppose I want to handle the case where `K` is `None`, so the writing continues to the end of the file. I'm currently doing
    if K is None:
        for i, line in enumerate(af):
            bf.write(line)
    else:
        for i, line in enumerate(af):
            bf.write(line)
            if i==K:
                break
This clearly isn't the best way to handle this, as I'm duplicating the code. Is there some more integrated way I can handle this? The natural thing would be to have the `if`/`break` code only be present if `K` is not `None`, but this involves writing syntax on the fly à la Lisp macros, which Python can't really do. Just to be clear, I'm not concerned about the particular case (which I chose partly for its simplicity), so much as learning about general techniques I may not be familiar with.
UPDATE: After reading the answers people have posted, and doing more experimentation, here are some more comments.
As said above, I was looking for general techniques, and I think @Paul's answer, namely using `takewhile` from `itertools`, fits that best. As a bonus, it is also much faster than the naive method I listed above; I'm not sure why. I'm not really familiar with `itertools`, though I've looked at it a few times. From my perspective this is a case of functional programming For The Win! (Amusingly, the author of `itertools` once asked for feedback about dropping `takewhile`. See the thread beginning http://mail.python.org/pipermail/python-list/2007-December/522529.html.) I'd simplified my situation above; the actual situation is a bit messier, because I'm writing to two different files in the loop. So the code looks more like:
    for i, line in enumerate(af):
        if i < K:
            bf.write(line)
            cf.write(line.split(',')[0].strip('"')+'\n')
Given my posted example, @Jeff reasonably suggested that in the case when `K` was `None`, I just copy the file. Since in practice I am looping anyway, doing so is not such a clear choice. However, `takewhile` generalizes painlessly to this case. I also had another use case I did not mention here, and was able to use `takewhile` there too, which was nice. The second example looks like (verbatim)
    i=0
    for line in takewhile(illuminacond, af):
        line_split=line.split(',')
        pid=line_split[1][0:3]
        out = line_split[1] + ',' + line_split[2] + ',' + line_split[3][1] + line_split[3][3] + ',' \
            + line_split[15] + ',' + line_split[9] + ',' + line_split[10]
        if pid!='cnv' and pid!='hCV' and pid!='cnv':
            i = i+1
            of.write(out.strip('"')+'\n')
            tf.write(line)
Here I was able to use the condition
    if K is None:
        illuminacond = lambda x: x.split(',')[0] != '[Controls]'
    else:
        illuminacond = lambda x: x.split(',')[0] != '[Controls]' and i < K
per @Paul's original example. However, I'm not completely happy about the fact that I'm getting `i` from the outer scope, though the code works. Is there a better way of doing this? Or maybe it should be a separate question. Anyway, thanks to everyone who answered my question. Honorable mention to @Jeff, who made some nice suggestions.
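One way to avoid reading `i` from the enclosing scope, sketched here under assumptions and not taken from any of the answers, is to give each concern its own iterator stage: `takewhile` for the `[Controls]` sentinel, `filter` for the `pid` test, and `islice` for the limit of `K` accepted lines. The helper names `not_controls`, `wanted` and `selected_lines` are hypothetical, the sample data is made up, and the code assumes Python 3, where `filter` is lazy.

```python
import io
from itertools import islice, takewhile


def not_controls(line):
    # Same sentinel test as the lambda in the question.
    return line.split(',')[0] != '[Controls]'


def wanted(line):
    # Stand-in for the question's pid filter on the second field.
    return line.split(',')[1][0:3] not in ('cnv', 'hCV')


def selected_lines(lines, K=None):
    """Lines before '[Controls]' that pass `wanted`, at most K of them."""
    # islice(..., None) runs until the input is exhausted,
    # so K=None needs no separate branch and no shared counter.
    return islice(filter(wanted, takewhile(not_controls, lines)), K)


# Made-up sample data standing in for the real input file.
sample_text = (
    "x,rs1,a\n"
    "x,cnv2,b\n"
    "x,rs3,c\n"
    "[Controls],rs4,d\n"
    "x,rs5,e\n"
)

limited = list(selected_lines(io.StringIO(sample_text), K=1))
unlimited = list(selected_lines(io.StringIO(sample_text)))
```

The loop body would then iterate over `selected_lines(af, K)` and do its writes, with the counting handled entirely by `islice`.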
Comments (5)
If you must loop, how about this?
Or even this?
Why loop at all in the case that `K` is `None`?
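The snippets that went with this answer are not reproduced on this page. As a stand-in, and not necessarily what the answer showed, here is a minimal sketch of one way to keep a single loop: `itertools.islice` accepts `None` as its stop value and then simply runs to the end of the input. The function name `copy_first_lines` and the in-memory files are assumptions for illustration.

```python
import io
from itertools import islice


def copy_first_lines(src, dst, limit=None):
    """Copy at most `limit` lines from src to dst; limit=None copies everything."""
    # islice(iterable, None) iterates until the iterable is exhausted,
    # so the None case falls out of the same loop.
    for line in islice(src, limit):
        dst.write(line)


# In-memory stand-ins for the files "a" and "b" from the question.
source_text = "one\ntwo\nthree\n"

first_two = io.StringIO()
copy_first_lines(io.StringIO(source_text), first_two, limit=2)

everything = io.StringIO()
copy_first_lines(io.StringIO(source_text), everything)
```

If no per-line work is needed when `K` is `None`, copying the file in one call (for example with `shutil.copyfileobj`) avoids the loop altogether.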
I think you're in a situation where you are going to have to accept a trade-off between DRY principles and optimizations.
I would start by staying true to DRY principles and remove the duplicate code with a function like `write_until`. Then actually use the code and see if you really need to do optimizations. How much performance improvement will you honestly see from removing an `if False` check? If you really need that extra speed boost (which I doubt), then you'll just have to live with some code duplication.
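The body of `write_until` is not shown here, so the following is only a guess at its shape: one loop, with the `None` check folded into the stop test. The signature and the choice to stop after `limit` lines (matching the `i < K` version in the question) are assumptions.

```python
import io


def write_until(src, dst, limit=None):
    """Write lines from src to dst, stopping after `limit` lines if one is given."""
    for i, line in enumerate(src):
        # The `limit is not None` test is the small per-line cost
        # this answer argues is rarely worth optimizing away.
        if limit is not None and i >= limit:
            break
        dst.write(line)


# In-memory stand-ins for real files.
text = "a\nb\nc\nd\n"

capped = io.StringIO()
write_until(io.StringIO(text), capped, limit=3)

uncapped = io.StringIO()
write_until(io.StringIO(text), uncapped)
```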
`itertools.takewhile` will apply your condition, and then break out of the loop the first time the condition fails.
If `K` is `None`, then you don't want `takewhile` to ever stop, so the condition function should always return `True`. But if you are given a numeric value for `K`, then once the 0th element of the tuple passed to the condition is >= `K`, `takewhile` will stop.
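The code for this answer is missing from the page, but the description pins it down fairly closely. A minimal sketch, assuming the tuples come from `enumerate` and using hypothetical names `make_condition` and `copy_lines_takewhile`:

```python
import io
from itertools import takewhile


def make_condition(K):
    """Build the predicate that takewhile applies to each (index, line) pair."""
    if K is None:
        # Never stop: the predicate accepts every pair.
        return lambda pair: True
    # Stop as soon as the index (element 0 of the pair) reaches K.
    return lambda pair: pair[0] < K


def copy_lines_takewhile(src, dst, K=None):
    for i, line in takewhile(make_condition(K), enumerate(src)):
        dst.write(line)


# In-memory stand-ins for real files.
text = "a\nb\nc\n"

two = io.StringIO()
copy_lines_takewhile(io.StringIO(text), two, K=2)

full = io.StringIO()
copy_lines_takewhile(io.StringIO(text), full)
```

The loop body is written once; only the predicate changes with `K`.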
Whatever `K` is, it's always going to be less than infinity.
Or, setting `K = -1` works just as well, though it's less semantically correct. Ideally you would set `K` to the number of lines in `af`, but I presume that data is not cheaply available.
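As a sketch of the sentinel idea, assuming the missing code replaced `None` with `float("inf")`: no integer index ever reaches infinity, so the stop test can stay in the loop unconditionally. The name `copy_lines_inf` is hypothetical.

```python
import io


def copy_lines_inf(src, dst, K=None):
    # Replace None with a limit that no line index can ever reach.
    limit = float("inf") if K is None else K
    for i, line in enumerate(src):
        if i >= limit:  # never true when limit is infinity
            break
        dst.write(line)


# In-memory stand-ins for real files.
text = "a\nb\nc\n"

one = io.StringIO()
copy_lines_inf(io.StringIO(text), one, K=1)

all_lines = io.StringIO()
copy_lines_inf(io.StringIO(text), all_lines)
```

Note that the `K = -1` variant only works with an equality test such as `if i == K: break`, since an index is never `-1`; with a comparison like `i < K` it would write nothing.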