How do I slow down iteration of a generator object that fetches data from a web service?
I'm using the Freebase-Python module to iterate through hundreds of results. Using:
results = freebase.mqlreaditer(query, extended=True)
I get a Python generator that I can iterate through like so:
for r in results:
    # do stuff, like create a new object and save it to the datastore
mqlreaditer() fetches JSON results 100 at a time. One entry in that batch of 100 looks like:
result:: {u'type': u'/games/game', u'mid': u'/m/0dgf58f', u'key':
{u'namespace': u'/user/pak21/', u'value': u'42617'}}
I'm running into an error locally:
"WARNING 2011-01-29 15:59:48,383 recording.py:365]
Full proto too large to save, cleared variables."
Not sure what is happening, but I suspect it's just too much too fast, so I want to slow down the iteration OR break it out into chunks. I'm not sure how generators work or what my options are. Note this is running on Google App Engine, so Python dependencies and quirks of using the local app engine launcher apply.
1 Answer
A generator is just a function that looks like a sequence but retrieves the items one at a time for you, instead of having the whole list of data up-front, which often requires a lot more memory. It's a "just-in-time" iterable, if you like. But you have no guarantees about how much data it is reading or caching to do that. Sometimes it may well have the entire data set already - you just don't know without looking at the docs or the code.
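
To illustrate that "one item at a time" behaviour, here is a toy generator (generic Python, not the Freebase code):

    def numbers():
        # Each value is produced only when the consumer asks for the next one.
        for i in range(3):
            print("producing %d" % i)
            yield i

    for n in numbers():
        print("consuming %d" % n)

The "producing" and "consuming" lines interleave, showing that nothing is built up front: each item is fetched on demand.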
If it really is a question of speed, then doing import time and adding a call such as time.sleep(1.0) inside the loop will delay it by a second each iteration: but I suspect that is not actually what the problem is, nor what the solution should be. Perhaps your query is retrieving too much data, or the objects are too large?
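
Here is a minimal sketch of both options the question raises - throttling each iteration, and consuming the generator in fixed-size chunks. handle_result is a placeholder for the "create new object and save to datastore" step, and the chunk size and delay values are arbitrary assumptions:

    import itertools
    import time

    def handle_result(r):
        # Placeholder: create the new object and save it to the datastore here.
        pass

    def iterate_throttled(results, delay=1.0):
        # Process one result at a time, sleeping between items (the time.sleep idea above).
        for r in results:
            handle_result(r)
            time.sleep(delay)

    def iterate_in_chunks(results, size=100, delay=1.0):
        # Consume the generator in fixed-size batches, pausing between batches.
        it = iter(results)
        while True:
            batch = list(itertools.islice(it, size))
            if not batch:
                break
            for r in batch:
                handle_result(r)
            time.sleep(delay)

    # Usage, assuming query is defined as in the question:
    # iterate_throttled(freebase.mqlreaditer(query, extended=True))

Either way the generator itself is unchanged; you are only controlling how quickly its items are consumed.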