soup.select() returns an empty list
I have an issue with .select(), which always returns an empty list, while practicing web scraping.
I am working on the following page: https://presse.ania.net/news/?page=1 using BeautifulSoup.
I am fetching and parsing the HTML as follows:
import requests
from bs4 import BeautifulSoup as bs

url = "https://presse.ania.net/news/?page=1"
# Browser-like User-Agent; the closing brace was missing in the original snippet
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36'}
mr = requests.get(url, headers=headers)
soupmp = bs(mr.content, "lxml")
I am trying to retrieve the URL of each article displayed on the page, under the class "title row-space-1" (I used Chrome's developer tools to find the class, with JavaScript disabled as suggested in other posts), and put them in a list called "news":
news = []
for link in soupmp.select("a.title.row-space-1[href]"):
    news.append(link.get('href'))
However, I keep getting an empty list when I print 'news':
[]
After searching on Stack Overflow, I tried:
- Disabling JavaScript on the website
- Adding a time.sleep() call to let the page download
- Using .find_all, .find and .select, first with CSS selectors and then with kwargs (all return an empty list or a NoneType object).
None of these worked and I am stuck. I think there is something specific I am misunderstanding about this HTML and about selecting a class with CSS, but I can't find what (partly because I successfully used this code on other websites earlier).
Could you please educate me on what I am missing?
I appreciate your help!
Comments (2)
Try this:
Output:
You could try this and insert the class of the wanted element.
I hope that helps you a bit.
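A minimal sketch of one way to apply that advice. It assumes, without confirmation from the actual page, that the classes "title row-space-1" sit on a wrapper element and not on the <a> tag itself. In that case "a.title.row-space-1[href]" matches nothing, while a descendant selector does. The sample markup and the helper name collect_hrefs are hypothetical and only serve as illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="title row-space-1"><a href="/news/article-1">First article</a></div>
<div class="title row-space-1"><a href="/news/article-2">Second article</a></div>
<div class="summary"><a href="/about">About</a></div>
"""


def collect_hrefs(html, selector):
    """Return the href of every element matched by a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["href"] for tag in soup.select(selector)]


# Requires the <a> itself to carry both classes: no match in this sample.
on_anchor = collect_hrefs(SAMPLE_HTML, "a.title.row-space-1[href]")

# Matches <a href> elements nested inside an element carrying both classes.
inside_wrapper = collect_hrefs(SAMPLE_HTML, ".title.row-space-1 a[href]")

print(on_anchor)
print(inside_wrapper)
```

Before changing the selector on the real page, it may also be worth checking mr.status_code and whether the string "row-space-1" appears in mr.text at all. If it does not, the article list is probably not part of the HTML that requests receives, and no selector will find it.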