What's the benefit of using a generator in this case?
I'm learning about Python generators from these slides: http://www.dabeaz.com/generators/Generators.pdf
They include an example that can be described like this:
You have a log file called log.txt. Write a program that watches its contents and prints any new lines that are added to it. Two solutions:
1. With generator:

    import time

    def follow(thefile):
        # Yield each new line as it is appended to the file.
        while True:
            line = thefile.readline()
            if not line:
                time.sleep(0.1)  # nothing new yet; wait briefly and retry
                continue
            yield line

    logfile = open("log.txt")
    loglines = follow(logfile)
    for line in loglines:
        print line

2. Without generator:

    import time

    logfile = open("log.txt")
    while True:
        line = logfile.readline()
        if not line:
            time.sleep(0.1)  # nothing new yet; wait briefly and retry
            continue
        print line

What's the benefit of using a generator here?
Comments (5)
I'd almost like to answer this question with just the above quote. Just because you can does not mean you need to all the time.
But conceptually, the generator version separates functionality. The follow function encapsulates the continuous reading from a file while waiting for new input, which frees you to do anything you want with each new line in your loop. In the second version, the code that reads from the file and the code that prints are intermingled with the control loop. That might not really be an issue in this small example, but it is something you might want to think about.
One benefit is the ability to pass your generator around (say, to different functions) and iterate manually by calling .next() (next(gen) in Python 3). Here is a slightly modified version of your initial generator example:
First of all, I opened the file with a context manager (the with statement, which auto-closes the file when you're done with it, or on exception). Next, at the bottom, I've demonstrated using the .next() method, allowing you to manually step through. This can be useful sometimes if you need to break logic out from a simple for item in gen loop.
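The modified example this answer refers to did not survive on this page. As a rough sketch of what it describes (not the answerer's actual code), the following opens the file in a with block and steps the generator by hand. It is written for Python 3, so it uses next(gen) rather than .next(). The temporary file standing in for log.txt and the first_lines helper are assumptions added so the sketch can run on its own.

```python
import os
import tempfile
import time


def follow(thefile):
    """Yield lines from thefile, sleeping briefly whenever none are available."""
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line


def first_lines(lines, count):
    """Hypothetical helper: pull `count` items by stepping the generator manually."""
    return [next(lines) for _ in range(count)]


# Create a small stand-in for log.txt so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    log_path = tmp.name

with open(log_path) as logfile:      # closed automatically on exit or on error
    loglines = follow(logfile)
    head = next(loglines)            # one manual step, no for loop involved
    rest = first_lines(loglines, 2)  # the same generator handed to other code

os.remove(log_path)
print(head, rest)
```

Note that next() blocks inside follow until a line is available, so asking for more lines than the file contains would wait indefinitely.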
A generator function is defined like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return. Its main advantage is that it lets the code produce a series of values over time, rather than computing them all at once and sending them back as a list.
return sends a specified value back to its caller, whereas yield can produce a sequence of values. We should use yield when we want to iterate over a sequence but don't want to store the entire sequence in memory.
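To make the return/yield contrast concrete, here is a minimal sketch. The function names squares_list and squares_gen are made up for illustration. The first builds the whole list before returning it; the second hands values back one at a time and computes each only when asked.

```python
def squares_list(n):
    """Build the whole result up front and return it in one go."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_gen(n):
    """Yield one value at a time; no list is ever built."""
    for i in range(n):
        yield i * i


eager = squares_list(5)   # all five values exist in memory now
lazy = squares_gen(5)     # nothing has been computed yet
first = next(lazy)        # computes only the first value
remaining = list(lazy)    # drains whatever the generator has left
```

Calling sum(squares_gen(n)) for a large n would consume the values as they are produced, without ever holding the full sequence.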
Ideally most loops are roughly of the form:
However, sometimes (as in your example #2) the loop is actually more complex, because you sometimes get an element and sometimes don't. That means that in your example without the generator, the code for generating an element is mixed up with the code for processing it. It doesn't show too clearly in the example, because the code to generate the next value isn't very complex and the processing is just one line, but example #1 separates these two concepts more cleanly.
A better example might be one that processes variable-length paragraphs from a file, with blank lines separating each paragraph: try writing code for that with and without generators and you should see the benefit.
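One possible take on the paragraph exercise is sketched below. The paragraphs name and the sample input are assumptions for illustration. All the bookkeeping about blank lines lives inside the generator, so the consuming loop keeps the simple "for each element, process it" shape.

```python
def paragraphs(lines):
    """Group lines into paragraphs, treating blank lines as separators."""
    current = []
    for line in lines:
        if line.strip():
            current.append(line.rstrip("\n"))
        elif current:
            # A blank line ends the paragraph collected so far.
            yield " ".join(current)
            current = []
    if current:
        # The last paragraph may not be followed by a blank line.
        yield " ".join(current)


# Any iterable of lines works here, including an open file object.
sample = ["one\n", "two\n", "\n", "\n", "three\n"]
result = []
for para in paragraphs(sample):
    result.append(para)  # processing code knows nothing about blank lines
```

Written without a generator, the accumulate-and-flush logic would sit inside the same loop as the processing code, and the final flush would have to be repeated after the loop.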
While your example might be a bit too simple to take full advantage of generators, I prefer to use generators to encapsulate the generation of any sequence data where there is also some kind of filtering of the data. It keeps the 'what I'm doing with the data' code separated from the 'how I get the data' code.
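As a small illustration of that split, assuming a made-up error_lines filter and sample log lines, the generator owns the "how I get the data" part and the loop body holds only the "what I'm doing with the data" part:

```python
def error_lines(lines):
    """'How I get the data': keep only the lines that mention ERROR."""
    for line in lines:
        if "ERROR" in line:
            yield line.rstrip("\n")


sample_log = [
    "INFO start\n",
    "ERROR disk full\n",
    "INFO ok\n",
    "ERROR timeout\n",
]

shouted = []
for line in error_lines(sample_log):
    # 'What I'm doing with the data': no filtering logic in sight.
    shouted.append(line.upper())
```

Because error_lines accepts any iterable of lines, it could in principle be fed the output of the question's follow(logfile) generator instead of a list.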