Applying a function to multiple lists


I am doing research on U.S. lobbying. The data is published through an open API that is poorly integrated and only seems to allow 250 observations to be downloaded at a time. I would like to compile the whole data set into one data table, but I am struggling with the last step. This is what I have so far:

library(httr)

n_pages  <- 10                                                            # set how many pages you want
base_url <- rep("https://lda.senate.gov/api/v1/contributions/?page=", n_pages)
numbers  <- 1:n_pages
pagesize <- rep("&page_size=250", n_pages)
pages <- data.frame(base_url, numbers, pagesize)
pages$numbers <- as.character(pages$numbers)
pages$url <- with(pages, paste0(base_url, numbers, pagesize))             # builds the vector of page URLs (pages$url)
for (i in 1:length(pages$url)) assign(pages$url[i], GET(pages$url[i]))    # downloads each page into its own object named after the URL

The last two things I need to do are extract the data tables from the created lists and then full join all of them. I know how to join them, but extracting the data frames is proving challenging. Basically, I need to apply fromJSON(rawToChar(list$content)) to each of the created lists. I have tried using lapply but have yet to figure it out. Any help would be greatly welcomed!

Comments (1)

花海 2025-02-18 13:13:36


When you assign GET(pages$url[i]) into your data frame, you coerce it to a character vector. Better to store it in a list and keep it as a response object:

library(httr)
library(jsonlite)
library(dplyr) # for bind_rows
page_content <- list()
for (i in seq_along(pages$url)) page_content[[i]] <- GET(pages$url[i]) # downloads each page as an httr response object
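
Since you mentioned lapply, the same download step can also be written without the for loop. A minimal sketch, assuming the pages$url vector built in your code above:

# equivalent download step with lapply: one GET request per URL,
# returning a list of httr response objects
page_content <- lapply(pages$url, GET)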

Then you can use the code you had already written, fromJSON(rawToChar()), to convert the raw bytes to characters and parse the JSON:

results_list <- lapply(
    page_content,
    \(page) fromJSON(rawToChar(page[["content"]]))[["results"]] # parse each response and keep its results data frame
)

results_table <- do.call(bind_rows, results_list)
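
As a side note, bind_rows() also accepts a list of data frames directly, so bind_rows(results_list) produces the same table as the do.call() call.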

dim(results_table) # 2500 27

names(results_table)
#  [1] "url"                          "filing_uuid"                  "filing_type"                  "filing_type_display"          "filing_year"
#  [6] "filing_period"                "filing_period_display"        "filing_document_url"          "filing_document_content_type" "filer_type"
# [11] "filer_type_display"           "dt_posted"                    "contact_name"                 "comments"                     "address_1"
# [16] "address_2"                    "city"                         "state"                        "state_display"                "zip"
# [21] "country"                      "country_display"              "registrant"                   "lobbyist"                     "no_contributions"
# [26] "pacs"                         "contribution_items"
