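Scraping PDF links from a web page into a data frame with BeautifulSoup
I want to extract all the PDF links on the page, that is, the links that lead directly to where each PDF can be downloaded. I want to store these links in a data frame.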
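import requests
from bs4 import BeautifulSoup

url = "https://www.volvogroup.com/en/news-and-media/press-releases.html"
source = requests.get(url)
soup = BeautifulSoup(source.text, "html.parser")

# Print only the first anchor with the download-item class, to inspect its markup
news_check = soup.find_all("a", class_="articlelist__contentDownloadItem")
for i in news_check:
    print(i)
    break

data = set()
for i in soup.find_all("a"):
    # find_all("href") searches for <href> child tags, but href is an attribute
    # of <a>, so on ordinary HTML this inner loop never runs
    for j in i.find_all("href"):
        pdf_link = "https://www.volvogroup.com" + j.get(".pdf")
        data.add(j)
        print(pdf_link)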
Comments (1)
You can try the following code to get the PDF links:
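A minimal sketch of one way to collect the links, not necessarily the code originally posted with this answer. It assumes the PDF anchors are present in the HTML returned by the server and that their `href` values end in `.pdf`. The names `extract_pdf_links`, `sample_html`, and the `pdf_link` column are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.volvogroup.com"  # site root taken from the question


def extract_pdf_links(html, base_url=BASE_URL):
    """Return absolute URLs of all <a href> values ending in .pdf, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href is an attribute of <a>, so filter on it instead of searching for child tags
    for a in soup.find_all("a", href=True):
        href = a["href"].strip()
        if href.lower().endswith(".pdf"):
            # urljoin resolves relative paths and leaves absolute URLs unchanged
            full = urljoin(base_url, href)
            if full not in links:  # skip duplicates while keeping page order
                links.append(full)
    return links


# Hypothetical markup standing in for the real page, so the sketch runs offline.
sample_html = """
<a class="articlelist__contentDownloadItem" href="/content/dam/example/report-a.pdf">PDF</a>
<a href="/en/news-and-media.html">News</a>
<a href="https://www.volvogroup.com/content/dam/example/report-b.PDF">PDF</a>
<a href="/content/dam/example/report-a.pdf">PDF again</a>
"""

df = pd.DataFrame({"pdf_link": extract_pdf_links(sample_html)})
print(df)
```

To run this against the live page, pass `requests.get(url).text` as `html`. If the press-release list is rendered by JavaScript, the static HTML may contain no PDF anchors, in which case the function returns an empty list and the data frame has no rows.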