Web scraping "How did you contribute to OpenStreetMap?" with rvest

Posted 2025-01-24 20:12:59

I would like to scrape all the information from "How did you contribute to OpenStreetMap?" (https://hdyc.neis-one.org/). You have to log in to OSM in order to view a user profile.

Since there are many profiles to scrape, I want to automate this for a list of users with the rvest package (https://rvest.tidyverse.org/).
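
A minimal sketch of such a batch run, assuming the profile URL is always the site root followed by `?<username>` (as in the single URL used in this question). The usernames and the helper names `profile_url` and `fetch_profile` are hypothetical placeholders, and the sketch does not handle the OSM login:

```r
library(rvest)

# Hypothetical usernames; replace with the real list of OSM users.
usernames <- c("mrsensible", "another_user")

# Assumption: each profile lives at the site root followed by "?<username>".
profile_url <- function(user) {
  paste0("https://hdyc.neis-one.org/?", utils::URLencode(user, reserved = TRUE))
}

# One URL per username, as a plain character vector.
urls <- vapply(usernames, profile_url, character(1), USE.NAMES = FALSE)

# Fetch one page; return NULL instead of stopping if the request fails.
fetch_profile <- function(u, pause = 1) {
  Sys.sleep(pause)  # wait between requests to avoid hammering the server
  tryCatch(read_html(u), error = function(e) NULL)
}

# Not run here, since it needs network access:
# pages <- lapply(urls, fetch_profile)
```

Building the URLs separately from fetching them makes it possible to check the list before any request is sent.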

So far I have tried this:

> library(rvest)

> url <- "https://hdyc.neis-one.org/?mrsensible"
> pgsession <- session(url)

> pgsession
<session> https://hdyc.neis-one.org/?mrsensible
  Status: 200
  Type:   text/html
  Size:   4245

When I try to read the information in my OSM record with read_html(url), this is what I get:

> read_html(url)

{html_document}
<html>
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n<meta ...
[2] <body onload="init();">\n        <div class="copyright">Copyright © <a target="_bl ...

So read_html() doesn't actually capture the profile information.
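
The `<body onload="init();">` in that output suggests the page may fill in its content with JavaScript after loading, which a static fetch with `read_html()` does not execute. A rough way to inspect what the static document actually contains is sketched below. The HTML string is a reduced stand-in built from the fragments visible in the output, not the real page:

```r
library(rvest)

# Stand-in for the fetched document, reduced to the parts shown in the output.
page <- read_html(
  '<html><head></head><body onload="init();"><div class="copyright">Copyright</div></body></html>'
)

body <- html_element(page, "body")

# A non-missing onload handler hints that content is added by script.
onload <- html_attr(body, "onload")

# Count the <script> elements present in the static HTML.
n_scripts <- length(html_elements(page, "script"))

# The text that is really present without running any JavaScript.
static_text <- html_text2(body)

print(onload)
print(n_scripts)
print(static_text)
```

If the static text holds only the page skeleton, the profile data is presumably loaded by a later request that this kind of fetch never makes.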

Is it possible to scrape this data using rvest?
Many thanks in advance!