How to scrape tables by class name with R?

Posted on 2025-02-01 16:24:31 · 598 characters · 3 views · 0 comments

I am trying to scrape several web pages, specifically some of the tables on those pages.
The problem is that the position of the tables changes from page to page.
Here is my code.

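library(rvest)
url <- "https://en.wikipedia.org/wiki/2011%E2%80%9312_Welsh_Premier_League"
webpage <- read_html(url)
# Position-based XPath: first table directly under the first div of the content area
j <- webpage %>% html_node(xpath = '//*[@id="mw-content-text"]/div[1]/table') %>% html_table(fill = TRUE)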

This code works fine, but I also want to scrape the other seasons, and the position of the table is different for every season.
I found that the tables I want all have the class "wikitable plainrowheaders", as shown below, so I would like to select them by class name.
How can I scrape all tables with the class "wikitable plainrowheaders" from a Wikipedia page?

<table class="wikitable plainrowheaders" style="text-align:center;font-size:100%;">

少女净妖师 2025-02-08 16:24:31

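Since you know the table's class name, change the XPath so that it selects on that class instead of on position. html_nodes() returns every matching table, so j is a list of data frames, one per table.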

library(rvest)

url <- "https://en.wikipedia.org/wiki/2011%E2%80%9312_Welsh_Premier_League"

webpage <- read_html(url)

# Select every table whose class attribute is exactly "wikitable plainrowheaders"
j <- webpage %>%
  html_nodes(xpath = "//table[@class='wikitable plainrowheaders']") %>%
  html_table(fill = TRUE)
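
One caveat with the XPath above: `@class='wikitable plainrowheaders'` compares the whole attribute string, so a table marked up as, say, `class="wikitable plainrowheaders sortable"` would be skipped. A CSS selector matches individual class tokens and may be more forgiving across season pages. The following is a minimal sketch, not a definitive implementation: the helper name `tables_by_class` and the inline demo page are made up for illustration, and whether the other season pages actually carry extra classes is an assumption.

```r
library(rvest)

# Hypothetical helper: return every table carrying both classes as a list
# of data frames. The CSS selector matches class tokens, so extra classes
# or a different class order do not matter.
tables_by_class <- function(page, css = "table.wikitable.plainrowheaders") {
  page %>%
    html_nodes(css) %>%
    html_table(fill = TRUE)
}

# Small inline page (invented for this example) with three tables:
# one exact class match, one with an extra class, one unrelated.
demo <- read_html('
  <table class="wikitable plainrowheaders"><tr><th>Team</th><th>Pts</th></tr>
    <tr><td>A</td><td>3</td></tr></table>
  <table class="wikitable plainrowheaders sortable"><tr><th>Team</th><th>Pts</th></tr>
    <tr><td>B</td><td>1</td></tr></table>
  <table class="infobox"><tr><th>x</th></tr><tr><td>y</td></tr></table>
')

# Exact attribute match: only tables whose class string is identical
exact <- html_nodes(demo, xpath = "//table[@class='wikitable plainrowheaders']")

# Class-token match: also picks up the table with the extra class
by_token <- tables_by_class(demo)

length(exact)
length(by_token)
```

To cover several seasons, the same helper could be applied to each page in turn, for example `lapply(urls, function(u) tables_by_class(read_html(u)))`, where `urls` is a character vector of season page URLs that you build yourself.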
