Is it possible to import data and metadata into R from a single CSV file?

Posted on 2025-02-11 17:44:27

I know how to import a simple CSV file using R. But is it possible to import a file into R that includes variable and value labels (similar to SPSS .sav files)?

Or should I instead have two CSV files, one for the data and the other for the metadata (variable and value labels)?

Something like this (resulting from two CSV files). But I think I have a problem with the syntax of the tuples in val_lab:

> data
# A tibble: 6 × 2
  se    ctr  
  <chr> <chr>
1 1     1    
2 1     2    
3 2     3    
4 2     2    
5 1     1    
6 2     3    
> metadata
# A tibble: 2 × 3
  var   var_label val_lab                        
  <chr> <chr>     <chr>                          
1 se    sex       (1,'Female'),(2,'Male')        
2 ctr   country   (1,'UK'),(2,'USA'),(3,'France')

Using dput:

> dput(head(data))
structure(list(se = c("1", "1", "2", "2", "1", "2"), ctr = c("1", 
"2", "3", "2", "1", "3")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))
> dput(metadata)
structure(list(var = c("se", "ctr"), var_label = c("sex", "country"
), val_lab = c("(1,'Female'),(2,'Male')", "(1,'UK'),(2,'USA'),(3,'France')"
)), row.names = c(NA, -2L), spec = structure(list(cols = list(
    var = structure(list(), class = c("collector_character", 
    "collector")), var_label = structure(list(), class = c("collector_character", 
    "collector")), val_lab = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ";"), class = "col_spec"), problems = <pointer: 0x00000149af86d620>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))


Comments (1)

岛歌少女 2025-02-18 17:44:27

In this case, you can do it like this:

for (each_var in metadata$var) {

    # Value-label string for this variable, e.g. "(1,'Female'),(2,'Male')"
    each_label <- metadata$val_lab[metadata$var == each_var]

    # Split into tuples, strip parentheses and quotes, then split value from label
    values_list <- strsplit(
        gsub("\\(|\\)|'", "", strsplit(each_label, "\\),\\(")[[1]]),
        ","
    )
    # One row per tuple: column 1 = value, column 2 = label
    values_df <- do.call(rbind, values_list) |> data.frame() |> setNames(c("values", "labels"))

    # Recode the column as a factor using the parsed values and labels
    data[[each_var]] <- factor(data[[each_var]], levels = values_df$values, labels = values_df$labels)

    # Set the variable label; RStudio's data viewer shows the "label" attribute
    attr(data[[each_var]], "label") <- metadata$var_label[metadata$var == each_var]

}

data
# A tibble: 6 x 2
#   se     ctr     
#   <fct>  <fct>   
# 1 Female UK      
# 2 Female USA     
# 3 Male   France  
# 4 Male   USA     
# 5 Female UK
# 6 Male   France

data is now a table of factors, which, as Gregor Thomas says, is how R deals with this type of data.
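To make the factor representation concrete, here is a minimal sketch (the vector se below is made up for illustration and is not the data from the question). It shows that a factor keeps the labels as its levels and stores integer codes underneath. Note that those codes are positions in the level set, not necessarily the original values from the CSV.

```r
# Illustrative vector only; mirrors the "se" column from the question
se <- factor(c("1", "1", "2"), levels = c("1", "2"), labels = c("Female", "Male"))

levels(se)        # the value labels, in level order
as.integer(se)    # underlying integer codes (positions within levels)
as.character(se)  # the labels as plain strings
```

If the original numeric codes matter (for example 0/1 coding), it may be worth keeping the raw column alongside the factor.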

Note that most of this code just parses the values and labels out of the tuples, which are stored as strings. The step that actually sets the levels is data[[each_var]] <- factor(data[[each_var]], levels = values_df$values, labels = values_df$labels), so if you can store the values and labels directly in a data frame rather than as tuples, it should be much more straightforward.
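As a rough sketch of that simpler layout, assume the value labels are kept in long format, one row per (variable, value, label). The names val_labels and var_labels are hypothetical and not from the question; in practice val_labels could be read from a second CSV file. The "label" attribute name is also an assumption about what the viewer displays.

```r
# Data as in the question (codes stored as character)
data <- data.frame(
  se  = c("1", "1", "2", "2", "1", "2"),
  ctr = c("1", "2", "3", "2", "1", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical long-format metadata: one row per value label
val_labels <- data.frame(
  var   = c("se", "se", "ctr", "ctr", "ctr"),
  value = c("1", "2", "1", "2", "3"),
  label = c("Female", "Male", "UK", "USA", "France"),
  stringsAsFactors = FALSE
)

# Hypothetical named vector of variable labels
var_labels <- c(se = "sex", ctr = "country")

for (each_var in unique(val_labels$var)) {
  rows <- val_labels[val_labels$var == each_var, ]
  # No string parsing needed: values and labels are already columns
  data[[each_var]] <- factor(data[[each_var]], levels = rows$value, labels = rows$label)
  # Assumed attribute name for the variable label
  attr(data[[each_var]], "label") <- var_labels[[each_var]]
}

str(data)
```

With this layout the loop body reduces to the factor() call plus the label assignment, and labels that contain commas or quotes no longer need special handling.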
