"Out of Memory" error with mechanize
I was trying to scrape some information from a website page by page, basically here's what I did:
import mechanize

MechBrowser = mechanize.Browser()
Counter = 0
while Counter < 5000:
    Response = MechBrowser.open("http://example.com/page" + str(Counter))
    Html = Response.read()
    Response.close()
    OutputFile = open("Output.txt", "a")
    OutputFile.write(Html)
    OutputFile.close()
    Counter = Counter + 1
Well, the above code ended up throwing an "Out of Memory" error, and Task Manager shows the script using almost 1 GB of memory after several hours of running... how come?!
Can anybody tell me what went wrong?
This is not exactly a memory leak, but rather an undocumented feature. Basically, mechanize.Browser() is collectively storing all browser history in memory as it goes. If you add a call to MechBrowser.clear_history() after Response.close(), it should resolve the problem.
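For reference, here is a minimal sketch of the corrected loop. The only change is the clear_history() call described above; the URL pattern, page count, and output file name are just the placeholders from the question:

import mechanize

MechBrowser = mechanize.Browser()
Counter = 0
while Counter < 5000:
    Response = MechBrowser.open("http://example.com/page" + str(Counter))
    Html = Response.read()
    Response.close()
    # Drop the responses mechanize keeps around for back()/reload(),
    # so the history list does not grow by one page per iteration.
    MechBrowser.clear_history()
    OutputFile = open("Output.txt", "a")
    OutputFile.write(Html)
    OutputFile.close()
    Counter = Counter + 1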