How can I slow down the iteration of a generator object that is fetching data from a web service?

Posted 2024-10-14 19:09:43


I'm using the Freebase-Python module to iterate through hundreds of results. Using:

results = freebase.mqlreaditer(query, extended=True) 

I get a Python generator that I can iterate through like so:

for r in results:
    # do stuff, like create new object and save to datastore

mqlreaditer() fetches JSON results 100 at a time. One entry in that batch of 100 is a short string like:

result:: {u'type': u'/games/game', u'mid': u'/m/0dgf58f', u'key': 
          {u'namespace': u'/user/pak21/', u'value': u'42617'}}

I'm running into an error locally:

"WARNING  2011-01-29 15:59:48,383 recording.py:365] 
 Full proto too large to save, cleared variables."

I'm not sure what is happening, but I suspect it's just too much too fast, so I want to slow down the iteration or break it into chunks. I'm not sure how generators work or what my options are. Note that this is running on Google App Engine, so Python dependencies and the quirks of using the local App Engine launcher apply.


Comments (1)

深府石板幽径 2024-10-21 19:09:43


A generator is just a function that behaves like a sequence, but it retrieves the items one at a time for you instead of holding the whole list of data up front, which often requires a lot more memory. It's a "just-in-time" iterable, if you like. However, you have no guarantees about how much data it reads or caches in order to do that. Sometimes it may well have the entire data set already; you just don't know without looking at the docs or the code.
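As a minimal sketch of the difference (fetch_all, fetch_lazily and expensive_lookup are made-up names for illustration, not part of freebase-python):

    def expensive_lookup(i):
        # Stand-in for a call to a web service.
        return {'id': i}

    # Eager: builds the entire list in memory before returning anything.
    def fetch_all(n):
        return [expensive_lookup(i) for i in range(n)]

    # Lazy: a generator that yields one item at a time, on demand.
    def fetch_lazily(n):
        for i in range(n):
            yield expensive_lookup(i)

    for item in fetch_lazily(300):
        pass  # each item is produced only when the loop reaches it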

If it really is a question of speed, then doing import time and adding a call such as time.sleep(1.0) inside the loop will delay it by a second each iteration. But I suspect that is not actually the problem, nor what the solution should be. Perhaps your query is retrieving too much data, or the objects are too large?
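If you do want to throttle the loop or process the generator in smaller batches, a rough sketch of both options might look like this (throttled, chunked and save_to_datastore are placeholder names for this example, not part of any library):

    import time
    from itertools import islice

    def throttled(iterable, delay=1.0):
        # Yield items from the generator, pausing between each one.
        for item in iterable:
            yield item
            time.sleep(delay)

    def chunked(iterable, size=25):
        # Yield lists of up to `size` items pulled from the generator.
        it = iter(iterable)
        while True:
            batch = list(islice(it, size))
            if not batch:
                break
            yield batch

    # Either slow the loop down...
    #     for r in throttled(results):
    #         save_to_datastore(r)
    # ...or consume the same generator in batches instead (pick one):
    #     for batch in chunked(results, 25):
    #         for r in batch:
    #             save_to_datastore(r)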
