In R, how do I combine two sets of data and put each into its own column after parsing?
library(rvest)
library(dplyr)   # %>%, group_by, mutate, if_else
library(tidyr)   # pivot_wider

link1 <- "https://somon.tj/adv/7866644_5-komn-kvartira-3-etazh-79-m2-a-sino/"
link2 <- "https://somon.tj/adv/7985721_2-komn-dom-grandzavod/"
house_link <- c(link1, link2)

house_features <- lapply(house_link, function(link) {
  # Download the page; on failure keep the condition object instead of stopping
  page_data <- tryCatch(
    read_html(link),
    error = function(e) e,
    warning = function(w) w
  )
  if (!inherits(page_data, "error")) {
    # This data frame is built but never returned
    data.frame(
      link = link,
      parameters = page_data %>% html_nodes(".label") %>% html_text(trim = TRUE),
      values = page_data %>% html_nodes(".info") %>% html_text(trim = TRUE)
    )
    # The list is the last expression, so it is what lapply() collects
    list(
      pricing = page_data %>% html_nodes("h1") %>% html_text(trim = TRUE)
    )
  } else {
    NULL
  }
})
But when I use do.call(rbind), it produces an error:
do.call(rbind, house_features) %>%
  group_by(link, parameters) %>%
  mutate(parameters = if_else(row_number() > 1, paste(parameters, row_number()), parameters)) %>%
  pivot_wider(id_cols = link, names_from = parameters, values_from = values)
One of the links has 19 variables, while the second has only 5, so the two do not line up. How can I put every variable into its own column? Where a listing has no value for a variable (for the second link, that is the 14 extra variables), I want NA as the value. How should I accomplish this, peeps?
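To make the target shape concrete, here is a small sketch on made-up data. The parameter names and values are invented, and to_row is a hypothetical helper that is not part of the code above. It assumes dplyr is installed and relies on bind_rows() filling any column a row lacks with NA:

```r
library(dplyr)

# Made-up parameter/value pairs standing in for two parsed listings
listing_a <- c(Floor = "3", Floor = "9", Area = "79 m2")
listing_b <- c(Area = "60 m2")

# Hypothetical helper: turn one listing into a one-row data frame
to_row <- function(link, kv) {
  # Repeated names such as "Floor" become "Floor", "Floor 1", ...
  names(kv) <- make.unique(names(kv), sep = " ")
  as.data.frame(c(list(link = link), as.list(kv)),
                check.names = FALSE, stringsAsFactors = FALSE)
}

# Columns missing from a row are filled with NA
wanted <- bind_rows(to_row("a", listing_a), to_row("b", listing_b))
print(wanted)
```

The second row has NA in both Floor columns, which is the behaviour asked for, only with 5 versus 19 parameters on the real pages.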
Comments (2)
Try this approach: collect the house features into a list, rbind them using do.call, ensure that the parameter names are unique (they are not; for example, link1 has two parameters called Floor), and then pivot_wider.
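The code that accompanied this answer did not survive on this page, so what follows is only a minimal sketch of the steps it describes, not the answerer's original. It assumes rvest, dplyr and tidyr are installed, uses two tiny made-up HTML snippets in place of the live pages so that it runs offline, and introduces a hypothetical helper parse_listing. The de-duplicated name goes into a new column (name), since dplyr does not let mutate() overwrite a grouping column:

```r
library(rvest)
library(dplyr)
library(tidyr)

# Made-up HTML standing in for the two listing pages (assumption, not real data)
pages <- c(
  a = '<html><body><h1>5-room flat</h1>
       <span class="label">Floor</span><span class="info">3</span>
       <span class="label">Floor</span><span class="info">9</span>
       <span class="label">Area</span><span class="info">79 m2</span></body></html>',
  b = '<html><body><h1>2-room house</h1>
       <span class="label">Area</span><span class="info">60 m2</span></body></html>'
)

# Hypothetical helper: return ONE data frame per page, with pricing as a column
parse_listing <- function(id, html) {
  page_data <- tryCatch(read_html(html), error = function(e) e)
  if (inherits(page_data, "error")) return(NULL)
  data.frame(
    link = id,
    pricing = page_data %>% html_node("h1") %>% html_text(trim = TRUE),
    parameters = page_data %>% html_nodes(".label") %>% html_text(trim = TRUE),
    values = page_data %>% html_nodes(".info") %>% html_text(trim = TRUE),
    stringsAsFactors = FALSE
  )
}

# 1. Collect the house features into a list
house_features <- lapply(names(pages), function(id) parse_listing(id, pages[[id]]))

# 2. rbind them, 3. make repeated parameter names unique, 4. pivot to wide
wide <- do.call(rbind, house_features) %>%
  group_by(link, parameters) %>%
  mutate(name = if_else(row_number() > 1,
                        paste(parameters, row_number()),
                        parameters)) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(link, pricing), names_from = name, values_from = values)

print(wide)
```

With the real pages, the lapply call would iterate over house_link and pass each URL to read_html. Parameters that a listing does not have come out as NA after pivot_wider.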
What I found: although the pricing variable is repeated on every row and so adds some redundancy to the data frame, as you would see, the lapply function still runs surprisingly fast compared with a traditional for-loop! You've got the whole ball of wax, I mean. Thanks @langtang :)
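For anyone who wants to check the speed difference themselves, here is a rough timing sketch. It uses a hypothetical fake_parse function in place of real scraping, so no network is involved, and the numbers will vary by machine. The slowdown in loops like this often comes from growing the result with rbind() on every iteration rather than from the for-loop itself:

```r
# Hypothetical stand-in for parsing one page; no network access involved
fake_parse <- function(i) {
  data.frame(link = paste0("link", i),
             parameters = c("Floor", "Area", "Rooms"),
             values = as.character(i + 0:2),
             stringsAsFactors = FALSE)
}
ids <- 1:200

# for-loop that grows the result with rbind() on every iteration
time_loop <- system.time({
  res_loop <- NULL
  for (i in ids) res_loop <- rbind(res_loop, fake_parse(i))
})

# lapply() collects the pieces, then a single rbind() combines them
time_lapply <- system.time({
  res_lapply <- do.call(rbind, lapply(ids, fake_parse))
})

# Elapsed seconds for each variant
print(c(loop = time_loop[["elapsed"]], lapply = time_lapply[["elapsed"]]))
```

Both variants build the same data frame; only the way the pieces are combined differs.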