Web scraping an embedded table using R

Posted 2025-02-06 02:11:43 · 1149 characters · 2 views · 0 comments

I am currently working on a project to scrape the contents of the Performance Characteristics table on this website:

https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund

The value I want to extract from this table is the 12m trailing yield of 3.43%.

The code I wrote to do this is:

library(rvest)  # also provides the %>% pipe

url <- "https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund"
etf_Data <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="fundamentalsAndRisk"]/div') %>%
  html_table()
etf_Data <- etf_Data[[1]]

This returned an empty list, and indexing it failed with the error message 'Error in etf_Data[[1]] : subscript out of bounds'.

Using the browser's Inspect tool in Google Chrome, I have tried various XPaths, including reading the value with html_text:

url <- "https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund"
etf_Data <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="fundamentalsAndRisk"]/div/div[4]/span[2]') %>%
  html_text()
etf_Data <- etf_Data[[1]]

However, this did not work either.

Having gone through other Stack Overflow answers, I have not been able to solve my issue.

Would someone be able to assist?

Thank you
C


Comments (1)

电影里的梦 2025-02-13 02:11:43

A couple of things:

  1. The content you want is served from a different URI, the one you end up at after manually accepting certain conditions on the page.
  2. The data you want is not within a table.
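
As a rough illustration of why the original attempt ends in 'subscript out of bounds': when the returned HTML does not contain the element the XPath points at, the selection is empty, html_table() yields an empty list, and indexing with [[1]] fails. The sketch below uses a made-up stand-in page (gate_page is hypothetical, not the real iShares markup) and assumes rvest 1.0 or later:

```r
library(rvest)

# Hypothetical stand-in for the page served before the conditions are accepted:
# it contains no element with id "fundamentalsAndRisk".
gate_page <- minimal_html('<div id="terms"><p>Please accept the conditions</p></div>')

# The same XPath as in the question matches nothing here.
matches <- html_elements(gate_page, xpath = '//*[@id="fundamentalsAndRisk"]/div')
tables <- html_table(matches)

length(matches)  # 0: nothing was selected
length(tables)   # 0: so tables[[1]] would be out of bounds

# Guard before indexing instead of assuming a match exists.
first_table <- if (length(tables) > 0) tables[[1]] else NULL
```

Checking the length of the selection first makes it easier to tell "wrong page" apart from "wrong selector".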

You can add a query string with the siteEntryPassthrough=true parameter to reach the right URI, and then use :contains with an adjacent sibling combinator (+) to get the desired value:

library(rvest)
library(magrittr)

url <- "https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund?switchLocale=y&siteEntryPassthrough=true"
trailing_12m_yield <- url %>%
  read_html() %>%
  # find the caption with this label, then take the .data element right after it
  html_element('.caption:contains("12m Trailing Yield") + .data') %>%
  html_text2()
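
To see how the selector behaves without hitting the site, here is a minimal offline sketch. The HTML fragment is hypothetical: the class names mirror the selector above and the 3.43% figure comes from the question, but the actual page markup and the "Number of Holdings" row are assumptions made for illustration.

```r
library(rvest)
library(magrittr)

# Hypothetical fragment shaped like caption/value pairs (not the real page source).
fragment <- minimal_html(paste0(
  '<div id="fundamentalsAndRisk">',
  '<div><span class="caption">Number of Holdings</span><span class="data">100</span></div>',
  '<div><span class="caption">12m Trailing Yield</span><span class="data">3.43%</span></div>',
  '</div>'
))

# :contains picks the caption by its text; "+ .data" takes the element
# immediately following it, provided that element has class "data".
yield_text <- fragment %>%
  html_element('.caption:contains("12m Trailing Yield") + .data') %>%
  html_text2()

# Optional: strip the percent sign to get a number.
yield_pct <- as.numeric(sub("%", "", yield_text, fixed = TRUE))
```

Matching on the caption text rather than on a position such as div[4] should keep working if the site reorders the rows, although it will still break if the label itself changes.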