BeautifulSoup 4 .find_all() suddenly stopped working
I am trying to create an automated scientific literature collector that uses Google Scholar.
All was going well and I was getting the results I wanted, but suddenly something broke: although the data still goes into the soup, everything comes back empty after the first .find_all(). Strangely enough, this does not happen when using a pre-downloaded .htm file.
My code:
import requests
from bs4 import BeautifulSoup as bs

site = requests.get(url)
site1 = site.text
soup = bs(site1, 'html.parser')
ri = soup.find_all("div", class_='gs_ri')
Previously, ri returned 10 pieces of HTML code from which further processing separated out everything I needed, but this morning, for reasons beyond my comprehension, it started returning empty, as did a previous version that I had not touched. I can follow the pipeline up to
soup = bs(site1, 'html.parser')
but not afterwards; 'soup' still returns everything in order.
Any help would be much appreciated, thanks in advance.
1 Answer
Use
requests.get(url)
to fetch the page, then pass the response object to BeautifulSoup using its content, not its text.
Edit: You want to use content because that passes the raw byte stream to BeautifulSoup and lets it handle the decoding itself.
page.text
only returns an already-decoded string, which can cause malfunctions.
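A minimal sketch of that advice, using an inline byte string in place of a live response (the div class gs_ri is the one from the question; in a real run you would pass requests.get(url).content instead):

```python
from bs4 import BeautifulSoup

# Stand-in for response.content: raw bytes, as requests would return them.
html_bytes = (
    b'<html><body>'
    b'<div class="gs_ri">Result 1</div>'
    b'<div class="gs_ri">Result 2</div>'
    b'</body></html>'
)

# Passing bytes (response.content) rather than a decoded string (response.text)
# lets BeautifulSoup detect the document's encoding itself.
soup = BeautifulSoup(html_bytes, 'html.parser')
ri = soup.find_all("div", class_="gs_ri")
print(len(ri))  # 2
```

Note that if Scholar itself has started serving a CAPTCHA or block page to your script, the bytes-vs-text switch alone will not bring the gs_ri divs back; printing the start of the response is a quick way to check what you actually received.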