Web scraping an embedded table using R
I am currently working on a project to scrape the content of the Performance Characteristics table on this website:
https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund
The data I want to extract from this table is the 12m trailing yield of 3.43%.
The code I wrote to do this is:
library(rvest)  # provides read_html(), html_nodes(), html_table() and %>%

url <- "https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund"
etf_Data <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="fundamentalsAndRisk"]/div') %>%
  html_table()
etf_Data <- etf_Data[[1]]
This returned an empty list, and indexing it raised the error 'Error in etf_Data[[1]] : subscript out of bounds'.
Using the browser's Inspect tool, I have tried various XPaths, including reading the value with html_text():
url <- "https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund"
etf_Data <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="fundamentalsAndRisk"]/div/div[4]/span[2]') %>%
  html_text()
etf_Data <- etf_Data[[1]]
However, none of these worked.
Having gone through other Stack Overflow answers, I have not been able to solve my issue.
Would someone be able to assist?
Thank you
C
Comments (1)
A couple of things:
You can add a query string with the EntryPassthrough parameter set to true to get to the right URI, and then use :contains together with an adjacent sibling combinator to get the desired value.
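The selector idea above can be sketched roughly as follows. This is a minimal illustration, not the answerer's exact code: the HTML snippet is a hypothetical stand-in for the rendered page, the class names `caption` and `data` are assumptions about the markup, and the full parameter name `siteEntryPassthrough` is assumed (the answer only mentions "EntryPassthrough"). The sketch parses the inline snippet rather than fetching the live page.

```r
library(rvest)  # re-exports read_html() and %>%

base_url <- "https://www.ishares.com/uk/individual/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund"
# Assumed full parameter name; the answer only says "EntryPassthrough".
live_url <- paste0(base_url, "?siteEntryPassthrough=true")

# Hypothetical markup standing in for the page behind live_url.
# The "caption" and "data" class names are assumptions.
html <- '<div id="fundamentalsAndRisk">
  <div>
    <span class="caption">12m Trailing Yield</span>
    <span class="data"> 3.43% </span>
  </div>
  <div>
    <span class="caption">Number of Holdings</span>
    <span class="data"> 101 </span>
  </div>
</div>'

page <- read_html(html)  # for the live page this would be read_html(live_url)

# :contains() picks the label element by its text; "+" then selects the
# element that immediately follows it, which holds the value.
trailing_yield <- page %>%
  html_node('span.caption:contains("12m Trailing Yield") + span.data') %>%
  html_text() %>%
  trimws()

print(trailing_yield)
```

Selecting by the label text rather than by position (such as `div[4]/span[2]`) should keep working if rows in the table are reordered, though the selector would still need to be checked against the real page markup.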