I would like to scrape all the information from "How did you contribute to OpenStreetMap?" (https://hdyc.neis-one.org/). You need to be logged in to OSM to view a user profile.
Since there are quite a lot of profiles to scrape, I want to scrape the list automatically using the rvest package (https://rvest.tidyverse.org/).
So far I have tried this:
> library(rvest)
> url <- "https://hdyc.neis-one.org/?mrsensible"
> pgsession <- session(url)
> pgsession
<session> https://hdyc.neis-one.org/?mrsensible
Status: 200
Type: text/html
Size: 4245
When I try to read my OSM record with read_html(url), this is what I get:
> read_html(url)
{html_document}
<html>
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n<meta ...
[2] <body onload="init();">\n <div class="copyright">Copyright © <a target="_bl ...
So it only returns the page skeleton, not the profile information.
Would it be possible to scrape this data using rvest?
Many thanks in advance!