Programmatically extract/scrape .csv tables embedded in a rendered HTML file using `datatable`
Is it possible to scrape the contents of a `datatable`, or the embedded `.csv` file, from a `.html` file produced using `knitr`? E.g. using `DT::datatable()` with the option `buttons = c('csv')` inside a `.Rmd` file.
My first approach is to use `rvest` (e.g. `rvest::read_html(x = 'example.html') %>% rvest::html_table()`), but it doesn't find any tables. I assume that's because `datatable` doesn't use the HTML table element to display data.
Below is a minimal example of an `.Rmd` file from which I knit an `.html` file that includes two datatables I'd like to scrape.
example.Rmd
---
title: "Example for scraping"
output: html_document
---
# Example 1
```{r}
tbl1 <- mtcars[1:20, 1:4]
DT::datatable(tbl1, extensions = 'Buttons', options = list(dom = 'Blfrtip', buttons = c('csv'), paging = FALSE))
```
# Example 2
```{r}
tbl2 <- mtcars[21:32, 7:11]
DT::datatable(tbl2, extensions = 'Buttons', options = list(dom = 'Blfrtip', buttons = c('csv'), paging = FALSE))
```
This can be knitted within RStudio, or using `rmarkdown::render(input = 'example.Rmd')`, resulting in the `example.html` I'd like to scrape from.
Thank you!
Comments (1)
The htmlwidgets that render into HTML tables store the data as JSON, which you can access and use to rebuild the data frames. This way you also get around any pagination settings. I've made an illustrative example of this approach below:
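A minimal sketch of the idea, not necessarily the exact code originally posted. It assumes the widget payload sits in a `<script type="application/json" data-for="...">` tag, that `x$data` holds the table column by column, and that `x$container` holds the header markup; check these against your own `example.html`. The `sample_html` string is a hand-written stand-in so the snippet runs on its own.

```r
library(rvest)     # read_html(), html_elements(), html_text()
library(jsonlite)  # fromJSON()

# Stand-in for example.html: a hand-written fragment shaped like the JSON
# payload a DT widget embeds (assumed layout; verify against your own file).
sample_html <- paste0(
  '<html><body>',
  '<div id="htmlwidget-1" class="datatables html-widget"></div>',
  '<script type="application/json" data-for="htmlwidget-1">',
  '{"x":{"container":"<table><thead><tr><th> <\\/th><th>mpg<\\/th>',
  '<th>cyl<\\/th><\\/tr><\\/thead><\\/table>",',
  '"data":[["Mazda RX4","Datsun 710"],[21,22.8],[6,4]]}}',
  '</script></body></html>'
)
path <- tempfile(fileext = ".html")
writeLines(sample_html, path)

page <- read_html(path)
# Widget payloads are looked up by script type plus the data-for attribute.
scripts <- html_elements(page, xpath = '//script[@type="application/json"][@data-for]')
widget <- fromJSON(html_text(scripts[[1]]), simplifyVector = FALSE)

# x$data is stored column-wise; JSON null arrives as NULL, so map it to NA.
cols <- lapply(widget$x$data, function(col) {
  unlist(lapply(col, function(v) if (is.null(v)) NA else v))
})

# Column labels come from the <th> cells of the header markup in x$container.
header <- html_text(html_elements(read_html(widget$x$container), xpath = "//th"))
header[!nzchar(trimws(header))] <- "rowname"  # blank cell heads the row names
names(cols) <- header

tbl1 <- data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
print(tbl1)
```

On the knitted file you would skip the stand-in and set `path <- 'example.html'`. Because the JSON holds every row, the result does not depend on which page the widget happens to display.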
Update.
A way to get all the tables using the same approach, though not the most beautiful code:
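One way this could look, again as a sketch under the same assumptions about the embedded JSON (`data-for` script tags, column-wise `x$data`, header markup in `x$container`). The helpers `widget_to_df()`, `extract_dt_tables()` and `make_widget()` are hypothetical names introduced here, and the two-widget HTML is a hand-written stand-in for the knitted file.

```r
library(rvest)     # read_html(), html_elements(), html_attr(), html_text()
library(jsonlite)  # fromJSON()

# Hypothetical helper: rebuild one data frame from a parsed widget payload.
widget_to_df <- function(widget) {
  # x$data is column-wise; JSON null arrives as NULL, so map it to NA.
  cols <- lapply(widget$x$data, function(col) {
    unlist(lapply(col, function(v) if (is.null(v)) NA else v))
  })
  header <- html_text(html_elements(read_html(widget$x$container), xpath = "//th"))
  if (length(header) == length(cols)) {
    header[!nzchar(trimws(header))] <- "rowname"  # blank cell heads the row names
  } else {
    header <- paste0("V", seq_along(cols))        # fall back to generic names
  }
  names(cols) <- header
  data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical helper: one data frame per DT widget, named by widget id.
extract_dt_tables <- function(path) {
  scripts <- html_elements(read_html(path),
                           xpath = '//script[@type="application/json"][@data-for]')
  payloads <- lapply(html_text(scripts), fromJSON, simplifyVector = FALSE)
  names(payloads) <- html_attr(scripts, "data-for")
  # Other htmlwidgets use the same script tag, so keep only payloads with x$data.
  is_dt <- vapply(payloads, function(p) {
    is.list(p) && is.list(p$x) && !is.null(p$x$data)
  }, logical(1))
  lapply(payloads[is_dt], widget_to_df)
}

# Stand-in for example.html with two hand-written widgets (assumed layout).
make_widget <- function(id, header_cells, data_json) {
  paste0('<div id="', id, '" class="datatables html-widget"></div>',
         '<script type="application/json" data-for="', id, '">',
         '{"x":{"container":"<table><thead><tr>', header_cells,
         '<\\/tr><\\/thead><\\/table>","data":', data_json, '}}</script>')
}
sample_html <- paste0(
  '<html><body>',
  make_widget("htmlwidget-1", '<th> <\\/th><th>mpg<\\/th>',
              '[["Mazda RX4","Datsun 710"],[21,22.8]]'),
  make_widget("htmlwidget-2", '<th> <\\/th><th>qsec<\\/th><th>vs<\\/th>',
              '[["Toyota Corona"],[20.01],[1]]'),
  '</body></html>'
)
path <- tempfile(fileext = ".html")
writeLines(sample_html, path)

tables <- extract_dt_tables(path)
str(tables)
```

Calling `extract_dt_tables('example.html')` on the knitted file should then return a list with one data frame per datatable, in document order.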