Is there such a thing as "too many yield statements" in Python?
If doing a directory listing and reading the files within, at what point does the performance of yield start to deteriorate compared to returning a list of all the files in the directory?
Here I'm assuming there is enough RAM to return the (potentially huge) list.
PS: I'm having problems inlining code in a comment, so I'll put some examples here.

    import glob

    def list_dirs_list():
        # list version: builds the full list before returning
        return glob.glob('/some/path/*')

    def list_dirs_iter():
        # iterator version: returns a lazy iterator over the matches
        return glob.iglob('/some/path/*')

Behind the scenes both glob functions use os.listdir, so it would seem they are equivalent performance-wise. But this Python doc seems to imply glob.iglob is faster.
Comments (4)
There is no point at which further use of yield results in decreased performance. In fact, compared to assembling things in a list, yield looks better the more elements there are.

It depends on how you're doing the directory listing. Most mechanisms in Python pull the entire directory listing into a list; if you do it that way, then even a single yield is a waste. If you use opendir(3), then the break-even point is probably a random number, according to XKCD's definition of "random".
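
The contrast between pulling the whole listing into a list and reading it incrementally can be sketched as below. This is only an illustration, not something from the thread: it assumes Python 3.6 or later, where os.scandir reads directory entries lazily and can be used as a context manager, and the helper names list_files_eager and iter_files_lazy are made up for the example.

```python
import os
import tempfile


def list_files_eager(directory):
    """Return every entry name at once; os.listdir builds the full list first."""
    return os.listdir(directory)


def iter_files_lazy(directory):
    """Yield entry names one at a time as os.scandir reads the directory."""
    with os.scandir(directory) as entries:
        for entry in entries:
            yield entry.name


# Build a small throwaway directory so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(5):
        open(os.path.join(demo_dir, "file%d.txt" % i), "w").close()

    # Both approaches see the same entries; only the timing of the work differs.
    eager_names = sorted(list_files_eager(demo_dir))
    lazy_names = sorted(iter_files_lazy(demo_dir))
```

With the lazy version the caller can stop early without the rest of the directory having been collected, which is where a generator actually saves work.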

Using yield is functionally similar to writing a functor class, even from an implementation or performance perspective, except that resuming a generator is probably a little quicker than calling the __call__ method on a self-made class, because that is built in to the generator's C implementation.
To hammer this home: a generator and a hand-written class that stores the same state have the same use and roughly the same implementation.
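
The code that originally accompanied this answer is not preserved here, so the following is only a rough sketch of the kind of equivalence it describes. The names count_up_gen and CountUp are hypothetical, and the class uses __next__ (the iterator protocol's hook) rather than __call__ so that both objects can be consumed in exactly the same way.

```python
def count_up_gen(limit):
    """Generator version: the paused frame remembers `current` between values."""
    current = 0
    while current < limit:
        yield current
        current += 1


class CountUp:
    """Hand-written version: the same state is stored on the instance."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


# Both objects are driven the same way and produce the same sequence.
gen_values = list(count_up_gen(3))
class_values = list(CountUp(3))
```

The generator is shorter because the interpreter saves and restores the local state that the class has to keep in attributes by hand.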

In Python 2.7, the definition of glob is

    def glob(pathname):
        return list(iglob(pathname))

So at least for this version, glob can never be faster than iglob.
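
A quick way to see the relationship for yourself is sketched below. It assumes Python 3 and a temporary directory created just for the demonstration; it only checks that the two functions return the same matches and that iglob does not hand back a list, and says nothing about how any particular version implements them.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    # Create a few empty files to match against.
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(demo_dir, name), "w").close()

    pattern = os.path.join(demo_dir, "*.txt")

    # iglob returns an iterator; nothing is collected into a list yet.
    lazy_matches = glob.iglob(pattern)
    is_list = isinstance(lazy_matches, list)

    # Consuming the iterator yields the same paths that glob returns.
    from_iglob = sorted(lazy_matches)
    from_glob = sorted(glob.glob(pattern))
```

Since glob does everything iglob does and then also builds the list, any difference in speed can only come from that extra step.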