BeautifulSoup randomly getting stuck in a loop
I’ve been using BeautifulSoup to extract multiple pages of reviews from websites. It has mostly worked wonders, but on large datasets it keeps getting stuck at seemingly random points.
My code is always along the lines of the following.
from bs4 import BeautifulSoup
import requests

for x in range(len(reviews)):
    reviewsoups.append(BeautifulSoup(requests.get(reviews[x]).text, 'html.parser'))
I’ve never gotten any errors or anything (except the random ConnectionReset error), but it just seems as though the loop gets stuck randomly to the point where I consistently have to interrupt the kernel (which often takes 10+ minutes to actually work) and restart the process from the index where the loop got stuck.
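One thing I’m wondering about (just a guess on my part): requests.get has no default timeout, so a single stalled connection could block the loop indefinitely without ever raising an error. A rough sketch of the same loop with a per-request timeout and basic error handling (the 30-second value and the skip-on-failure behaviour are assumptions, not something I’ve tested) would be:

for x in range(len(reviews)):
    try:
        # a timeout makes a stalled request raise instead of hanging the loop forever
        response = requests.get(reviews[x], timeout=30)
        reviewsoups.append(BeautifulSoup(response.text, 'html.parser'))
    except requests.exceptions.RequestException as e:
        # skip (or later retry) the failing URL instead of killing the whole run
        print(f'Request {x} failed: {e}')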
It also seems as though, in some cases, using my laptop at all while the code is running (like opening Chrome, etc.) aggravates the situation.
Can anyone help? It’s just incredibly irritating having to sit by my laptop waiting just in case something like this happens.
Thanks in advance.
1 Answer
I think I’ve found a solution.
So I was trying to ‘soup’ 9000 URLs. What I did was iteratively create variables using the globals() function, with the idea of having each variable store 100 soups, so that it would be 90 variables with 100 soups each rather than one list with 9000.
I noticed that the first few hundred were very quick and then things slowed down; however, running 100 at a time and not constantly growing an already massive list made a difference.
I also got no crashes.
Bear in mind I only tried this with the last 1000 or so after getting stuck at the 8000 mark, but it was much quicker, with no technical issues.
Next time I will initialise a for loop that works across each of the variables and appends, e.g., the 1056th soup to the 11th variable as its 56th element, if that makes sense.
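To make that concrete, here is a rough sketch of the batching idea, using a plain list of chunk lists instead of globals()-created variables (the chunk size of 100 and the i // 100 arithmetic follow what I described; reviews is the same URL list as in the question, and the timeout is just an assumption):

from bs4 import BeautifulSoup
import requests

CHUNK_SIZE = 100  # 9000 URLs -> 90 chunks of 100 soups each
# one small list per chunk instead of a single 9000-element list
chunks = [[] for _ in range((len(reviews) + CHUNK_SIZE - 1) // CHUNK_SIZE)]

for i, url in enumerate(reviews):
    response = requests.get(url, timeout=30)  # timeout is an assumption, see above
    # e.g. soup i = 1056 lands in chunks[10] (the 11th chunk) at position 56
    chunks[i // CHUNK_SIZE].append(BeautifulSoup(response.text, 'html.parser'))

The result is the same 90-by-100 layout, just addressable as chunks[10][56] rather than 90 separately named variables.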