How to extract information from the gnomAD database using rvest

Posted on 2025-02-05 00:51:23

I am trying to extract some information from the gnomAD database using rvest. Despite following several tutorials, I am not getting the right information. The selector passed to html_nodes comes from the SelectorGadget extension for Chrome. Below is the code I used. I would appreciate any help pointing out what I am doing wrong.

library(rvest)
library(dplyr)
library(xml2)

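# Region page for the gnomad_r3 dataset
link <- "https://gnomad.broadinstitute.org/region/1-55516868-55516908?dataset=gnomad_r3"
page <- read_html(link)

# Selector suggested by SelectorGadget for the variant IDs
variants <- page %>% html_nodes(".cmvTB span") %>% html_text()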

The output should be a list of the variant IDs: 1-55516880-T-C, 1-55516902-T-G, 1-55516903-G-GC, 1-55516905-C-CT.

I also tried using the XPath of the whole table, but still no luck.

link <- "https://gnomad.broadinstitute.org/region/1-55516868-55516908?dataset=gnomad_r3"
page <- xml2::read_html(link)
# Start from the <body> node
dat <- rvest::html_nodes(page, "body")
# XPath for every <div> inside the element whose class contains "bxQxTC" (the table)
tab <- xml2::xml_find_all(dat, '//*[contains(concat( " ", @class, " " ), concat( " ", "bxQxTC", " " ))]//div')
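
One possible explanation, stated here as an assumption and not something confirmed in this post, is that the variant table is built by JavaScript in the browser, so the HTML that read_html() downloads never contains the .cmvTB nodes that SelectorGadget sees. The sketch below does not contact gnomAD. It uses a made-up static_html string as a stand-in for such a response and applies the same selector to it, which gives a quick way to check whether a selector can match anything in the downloaded markup.

```r
library(rvest)

# Hypothetical stand-in for the raw HTML of a JavaScript-rendered page:
# an empty root container and a script tag, with no table rows in the markup.
# (Assumption for illustration; this is not the actual gnomAD response.)
static_html <- '<html><body><div id="root"></div><script src="/js/bundle.js"></script></body></html>'
page <- read_html(static_html)

# Same selector as in the question; it finds no nodes in this markup.
variants <- page %>% html_nodes(".cmvTB span") %>% html_text()

# Number of matched nodes; zero means the selector matched nothing.
length(variants)
```

If the same check on the real page also returns zero matches, the selector itself may be fine and the data may simply be absent from the static HTML.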
