How to store values from .json in an R data frame?

Posted on 2025-01-17 21:24:30

I face the following problem:

test_vector <- c('https://data.sec.gov/submissions/CIK0000789019.json', 'https://data.sec.gov/submissions/CIK0001652044.json',
                  'https://data.sec.gov/submissions/CIK0001018724.json')
  
test_df <- lapply(test_vector, jsonlite::fromJSON, flatten= TRUE) %>%
  spread_all()

I am trying to pull some basic company information from the SEC website. The JSON object has a mixed structure: some variables are plain scalar fields ("single columns"), whereas others sit inside nested structures. There are two problems that I can't solve:

  1. When the value for tickers or exchanges is not a scalar (in other words, when a company has two tickers), I get NAs.

  2. I can't delete the last column, ..JSON, which turns out to be a list.

I tried several options, but none of them worked.

Follow-up question:

When I try to use this approach in a loop:

cik_df <- data.frame

for (i in cik_vector) {
  
  output <- lapply(cik_vector, jsonlite::fromJSON) %>%
    spread_all
  
  if (i > 1 & i %% 10 == 0) {
    Sys.sleep(1)
  }
  
  cik_df <- rbind (cik_df, output) 
}

I get the following error message:

Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < :
missing value where TRUE/FALSE needed
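
The error above is consistent with fromJSON() receiving NA instead of a URL or JSON string, so one plausible cause is a missing entry in cik_vector. The loop also parses the whole vector on every pass and starts from the function data.frame rather than an empty data frame. A minimal sketch of a loop that avoids these points follows; the inline JSON strings and the two extracted fields are made-up stand-ins so that it runs offline (fromJSON() accepts a JSON string as well as a URL).

```r
library(jsonlite)

# Hypothetical stand-ins for the SEC URLs; the NA mimics a missing entry.
cik_vector <- c('{"cik": "789019", "name": "MICROSOFT CORP"}',
                NA,
                '{"cik": "1018724", "name": "AMAZON COM INC"}')

results <- vector("list", length(cik_vector))

for (k in seq_along(cik_vector)) {
  src <- cik_vector[k]
  if (is.na(src)) next                 # skip missing entries instead of parsing them

  parsed <- fromJSON(src)              # parse one document per iteration
  results[[k]] <- data.frame(cik = parsed$cik, name = parsed$name,
                             stringsAsFactors = FALSE)

  if (k %% 10 == 0) Sys.sleep(1)       # pause after every 10th request
}

# Skipped entries are NULL and are ignored when the rows are bound together.
cik_df <- do.call(rbind, results)
```

Iterating over positions with seq_along() keeps the counter numeric, so the "every 10th request" check works regardless of what the vector contains.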


Comments (2)

电影里的梦 2025-01-24 21:24:30

First, load the packages you need and read the JSON files:

library(tidyr)
library(stringr)

test_vector <- c('https://data.sec.gov/submissions/CIK0000789019.json',
                 'https://data.sec.gov/submissions/CIK0001652044.json',
                 'https://data.sec.gov/submissions/CIK0001018724.json')

test_df <- lapply(test_vector, jsonlite::fromJSON, flatten= TRUE)

You can use unlist() to take each individual value out of each company's list and store it in a data frame. Each individual value has a variable name, and the company that it's associated with.

test_frame <- data.frame(name = c(rep("Microsoft", length(unlist(test_df[[1]]))), 
                                  rep("Alphabet", length(unlist(test_df[[2]]))),
                                  rep("Amazon", length(unlist(test_df[[3]])))),

                         variable_name = c(names(unlist(test_df[[1]])),
                                           names(unlist(test_df[[2]])),
                                           names(unlist(test_df[[3]]))),
                     
                         value = c(unlist(test_df[[1]]), 
                                   unlist(test_df[[2]]),
                                   unlist(test_df[[3]])))

Taking Alphabet as an example, it has two ticker names. One has the variable name "tickers1" and the other is "tickers2". The other two companies have only one ticker, simply called "tickers". We want to rename "tickers" (and other variable names) to "tickers1" so that all the variable names are standard between the three companies.

# Append "1" to every variable name that does not already end in a digit
needs_suffix <- !str_detect(test_frame$variable_name, "[0-9]$")
test_frame$variable_name[needs_suffix] <- paste0(test_frame$variable_name[needs_suffix], "1")
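
To see why the suffix is needed, here is a small base R sketch of how unlist() names its elements. The two lists are made-up stand-ins for parsed companies, and normalise_names() is a hypothetical helper that applies the same rule as the renaming step.

```r
# Made-up stand-ins for two parsed companies.
one_ticker  <- list(name = "Microsoft", tickers = "MSFT")
two_tickers <- list(name = "Alphabet", tickers = c("GOOGL", "GOOG"))

# unlist() numbers the elements of a multi-valued field
# and leaves single values unnumbered.
names_one <- names(unlist(one_ticker))    # "name" "tickers"
names_two <- names(unlist(two_tickers))   # "name" "tickers1" "tickers2"

# Hypothetical helper: append "1" to names that do not end in a digit.
normalise_names <- function(x) {
  no_digit <- !grepl("[0-9]$", x)
  x[no_digit] <- paste0(x[no_digit], "1")
  x
}

std_one <- normalise_names(names_one)     # "name1" "tickers1"
std_two <- normalise_names(names_two)     # "name1" "tickers1" "tickers2"
```

One caveat of this rule: a field whose own name already ends in a digit is left unchanged, so it is only standardized if it is spelled the same way for every company.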

Now we pivot this long data frame into a wide one with one row per company and one column per variable name, so that it's easy to search through.

df <- test_frame %>%
  pivot_wider(names_from = variable_name, values_from = value)
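
As a rough illustration of what the pivot produces, the sketch below runs pivot_wider() on a tiny made-up long table with the same three columns. A company without a second ticker simply gets NA in that column.

```r
library(tidyr)

# Made-up long table with the same layout as test_frame.
long <- data.frame(
  name          = c("Microsoft", "Alphabet", "Alphabet"),
  variable_name = c("tickers1", "tickers1", "tickers2"),
  value         = c("MSFT", "GOOGL", "GOOG"),
  stringsAsFactors = FALSE
)

# One row per company, one column per variable name.
wide <- pivot_wider(long, names_from = variable_name, values_from = value)
```

If the same company and variable name occur more than once, pivot_wider() warns and returns list-columns, which is one reason the names are standardized before pivoting.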
記憶穿過時間隧道 2025-01-24 21:24:30

Meanwhile, I found an acceptable solution:

library(dplyr)     # mutate(), select(), %>%
library(purrr)     # pluck()
library(tibble)    # tibble(), as_tibble()
library(tidyjson)  # enter_object(), spread_all()

# explore the structure of a single file first
raw_json <- jsonlite::read_json(cik_vector[1])
listviewer::jsonedit(raw_json, height = "1200px", mode = "view")


# scrape relevant data out of json files using a loop approach

company <- tibble()


for (i in cik_vector) {
  
  raw_json <- jsonlite::read_json(i)
    
  # scalar fields: one value per company
  master_data <- tibble(cik = pluck(raw_json, "cik"),
                        sic = pluck(raw_json, "sic"),
                        sic_desc = pluck(raw_json, "sicDescription"),
                        company_name = pluck(raw_json, "name"),
                        description = pluck(raw_json, "description"),
                        website = pluck(raw_json, "website"),
                        country1= pluck(raw_json, "stateOfIncorporation"),
                        country2= pluck(raw_json, "stateOfIncorporationDescription"))
  
  # multi-valued fields: collapse into one comma-separated string per company
  tickers <- tibble(symbol = pluck(raw_json, "tickers", .default= NA)) %>% 
    mutate(symbol= paste(unique(symbol), collapse= ",")) %>%
    unique()
  
  exchanges <- tibble(exchanges = pluck(raw_json, "exchanges")) %>% 
    mutate(exchanges= paste(unique(exchanges), collapse= ",")) %>%
    unique()
  
  former_names <- tibble(former_names = pluck(raw_json, "formerNames", .default= NA)) %>% 
    mutate(former_names= paste(unique(former_names), collapse= ",")) %>%
    unique()
  
  # nested business object: spread its fields and keep the first seven columns
  business <- raw_json %>%
    enter_object('business') %>%
    spread_all() %>%
    as_tibble() %>%
    select(1:7)
  
  output<- cbind(master_data, tickers, exchanges, former_names, business)
  
  # append the row for this company and drop exact duplicates
  company <- rbind(company, output) %>% unique()
  
}

The listviewer package was a very helpful resource for exploring the inner structure of the JSON file.
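
The core idea of this solution, collapsing a multi-valued field into a single string so that each company fits in one row, can be sketched with jsonlite and base R alone. The inline JSON strings below are made-up, trimmed stand-ins for the submissions files, and collapse_field() is a hypothetical helper, not part of any package.

```r
library(jsonlite)

# Made-up stand-ins for two submissions files, trimmed to a few fields.
docs <- c(
  '{"cik": "1652044", "name": "Alphabet Inc.", "tickers": ["GOOGL", "GOOG"], "exchanges": ["Nasdaq", "Nasdaq"]}',
  '{"cik": "789019", "name": "MICROSOFT CORP", "tickers": ["MSFT"], "exchanges": ["Nasdaq"]}'
)

# Hypothetical helper: collapse a possibly multi-valued field into one
# comma-separated string; a missing or empty field becomes NA.
collapse_field <- function(doc, field) {
  values <- unique(unlist(doc[[field]]))
  if (length(values) == 0) NA_character_ else paste(values, collapse = ",")
}

rows <- lapply(docs, function(txt) {
  doc <- fromJSON(txt, simplifyVector = FALSE)   # keep arrays as lists, like read_json()
  data.frame(cik       = doc$cik,
             name      = doc$name,
             tickers   = collapse_field(doc, "tickers"),
             exchanges = collapse_field(doc, "exchanges"),
             stringsAsFactors = FALSE)
})

# Bind the per-company rows once at the end instead of growing a data frame.
company <- do.call(rbind, rows)
```

Collecting the rows in a list and binding them once avoids repeated copying inside the loop, which matters more as the number of companies grows.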
