Get elements (links) inside a Google Drive iframe
I am trying to programmatically download the two zip files on this page:
https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/siren
The two zip files are actually on separate pages, but the hrefs to those pages are inside this page. So, what I want to do is:
- get the links to the pages where the two zip files reside (they are on a public Google Drive)
- download the two zip files to my computer
(yes, I know I can download them manually, but there are more pages I need to download from, so I would like to automate this process)
Unfortunately, I can't even get the first step going. I start by loading the page into rvest and then try to get the element div.flip-entry-info, but this yields no results. I believe this is because it is part of an iframe inside this page. So, how do I access the elements that contain the hrefs that point to the actual location of these files?
For the second step, I need to find a way to download the data from Google Drive.
For example, one of these two zip files is available at: https://drive.google.com/file/d/1BFN_1n-5EZ3rLrqrqWsAsBR9exjXuUKF/view.
But I have absolutely no clue how to download the file from there. The 'inspect' option in Chrome doesn't work on this page, and selectorgadget doesn't reveal anything useful either.
Can anyone help me to download these files through R? I am totally stuck.
Comments (1)
We can get the links inside the iframe. You can refer to the tutorial here:
https://github.com/yusuzech/r-web-scraping-cheat-sheet/blob/master/README.md#rvest7.2
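As a rough sketch of the iframe idea (not taken from the linked tutorial): read the outer page, collect the src of each iframe, then read each of those URLs as its own document and pull the links out of it. The helper names iframe_urls and drive_links are made up for illustration, and selecting every a element and filtering on drive.google.com is an assumption about how the embedded page is laid out. If the iframe content is filled in by JavaScript, a static read like this may return nothing.

```r
library(rvest)
library(xml2)

# Hypothetical helper: absolute URLs of all iframes in a parsed page.
iframe_urls <- function(doc, base_url) {
  src <- doc %>% html_elements("iframe") %>% html_attr("src")
  # Drop iframes without a src, then resolve relative URLs against the page URL
  url_absolute(src[!is.na(src)], base_url)
}

# Hypothetical helper: Google Drive hrefs found in a parsed page.
# Assumption: the file links are plain <a href> elements pointing at drive.google.com.
drive_links <- function(doc) {
  href <- doc %>% html_elements("a") %>% html_attr("href")
  unique(grep("drive\\.google\\.com", href, value = TRUE))
}

# Usage (needs network access):
# page_url <- "https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/siren"
# frames <- iframe_urls(read_html(page_url), page_url)
# links <- unlist(lapply(frames, function(u) drive_links(read_html(u))))
```

Because the two helpers only take a parsed document, the same code can be applied in a loop to the other dataset pages.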
To download the files, we can use the googledrive library.
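A minimal sketch of the download step, assuming the files are shared publicly so no Google login is needed. download_drive_file is a made-up helper name; drive_deauth(), as_id(), drive_get() and drive_download() are googledrive functions. The file keeps whatever name it has on Drive.

```r
library(googledrive)

# Public files need no login; this tells googledrive to skip the OAuth flow.
drive_deauth()

# Hypothetical helper: download one shared Drive file into dest_dir.
download_drive_file <- function(url, dest_dir = ".") {
  # as_id() extracts the file ID from a Drive URL
  file <- drive_get(id = as_id(url))
  drive_download(file,
                 path = file.path(dest_dir, file$name),
                 overwrite = TRUE)
}

# Usage (needs network access):
# download_drive_file("https://drive.google.com/file/d/1BFN_1n-5EZ3rLrqrqWsAsBR9exjXuUKF/view")
```

To fetch several files, the helper can be called on each link with lapply().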