Python: download/scrape SSRN papers from a list of URLs
I have a bunch of links that are exactly the same except for the ID at the end. All I want to do is loop through each link and download the paper as a PDF using the "Download This Paper" button. In an ideal world, the filename would be the title of the paper, but if that isn't possible I can rename them later. Getting them all downloaded is more important. I have about 200 links, but I will provide 5 here as an example.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3860262
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2521007
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3146924
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2488552
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3330134
Is what I want to do possible? I have some familiarity with looping through URLs to scrape tables, but I have never tried to do anything with a download button.
I don't have example code because I don't know where to start here. But something like:

for url in urls:
    (go to each link)
    (download as PDF via the "Download This Paper" button)
    (save file as title of paper)
Try something like the sketch below. It rests on a few assumptions about the page: the paper title is exposed in SSRN's citation_title meta tag, the "Download This Paper" button is a link whose href points at SSRN's Delivery.cfm endpoint, and SSRN expects a browser-like User-Agent plus a Referer header on the download request. Inspect one of your pages and adjust the selectors if any of that doesn't hold. The loop prints each filename as it goes and saves the PDFs under the paper titles.
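import re
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

urls = [
    "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3860262",
    "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2521007",
    "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3146924",
    "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2488552",
    "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3330134",
]

# Browser-like User-Agent; SSRN tends to reject bare scripted requests (assumption).
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

session = requests.Session()

for url in urls:
    page = session.get(url, headers=headers)
    page.raise_for_status()
    soup = BeautifulSoup(page.text, "html.parser")

    # Paper title from the citation_title meta tag (assumed present);
    # fall back to the abstract ID if it is missing.
    meta = soup.find("meta", attrs={"name": "citation_title"})
    title = meta["content"] if meta else url.split("=")[-1]

    # Strip characters that are illegal in filenames.
    filename = re.sub(r'[\\/*?:"<>|]', "", title).strip() + ".pdf"

    # The "Download This Paper" button is assumed to be an <a> whose
    # href points at SSRN's Delivery.cfm download endpoint.
    link = soup.find("a", href=re.compile(r"Delivery\.cfm", re.I))
    if link is None:
        print(f"No download link found on {url}")
        continue

    pdf_url = urljoin(url, link["href"])
    # SSRN is assumed to check the Referer on the download request.
    pdf = session.get(pdf_url, headers={**headers, "Referer": url})
    pdf.raise_for_status()

    with open(filename, "wb") as f:
        f.write(pdf.content)
    print(f"Saved {filename}")

    time.sleep(2)  # be polite between the ~200 requests

If SSRN still blocks the plain-requests approach (for example, returning a CAPTCHA page instead of the PDF), the usual fallback is to drive a real browser with Selenium and click the button there; the loop, title lookup, and filename logic stay the same.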