Is it possible to import data and metadata from a single CSV file into R?
I know how to import a simple csv file using R. But is it possible to import a file into R including variable and value labels (similar to SPSS sav files)?
Or should I instead have two csv files, one for the data and the other for the metadata (variable and value labels)?
Something similar to this (resulting from two csv files). But I think I have a problem with the syntax of the tuples for val_lab:
> data
# A tibble: 6 × 2
  se    ctr
  <chr> <chr>
1 1     1
2 1     2
3 2     3
4 2     2
5 1     1
6 2     3
> metadata
# A tibble: 2 × 3
  var   var_label val_lab
  <chr> <chr>     <chr>
1 se    sex       (1,'Female'),(2,'Male')
2 ctr   country   (1,'UK'),(2,'USA'),(3,'France')
Using dput:
> dput(head(data))
structure(list(se = c("1", "1", "2", "2", "1", "2"), ctr = c("1",
"2", "3", "2", "1", "3")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
> dput(metadata)
structure(list(var = c("se", "ctr"), var_label = c("sex", "country"
), val_lab = c("(1,'Female'),(2,'Male')", "(1,'UK'),(2,'USA'),(3,'France')"
)), row.names = c(NA, -2L), spec = structure(list(cols = list(
var = structure(list(), class = c("collector_character",
"collector")), var_label = structure(list(), class = c("collector_character",
"collector")), val_lab = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ";"), class = "col_spec"), problems = <pointer: 0x00000149af86d620>, class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
Comments (1)
You can do it like this in this case:
data is now a table of factors, which, as Gregor Thomas says, is how R deals with this type of data. Note that most of this code is actually getting the labels and levels out of the tuples, which are stored as strings. The actual setting of the levels is
data[[each_var]] <- factor(data[[each_var]], levels = values_df$values, labels = values_df$labels)
so if you can write the levels directly to a data frame rather than a tuple, it should be much more straightforward.
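A minimal base R sketch of the approach described above, assuming the data and metadata tables from the question (rebuilt here as plain data frames so the example is self-contained). The helper parse_val_lab is a hypothetical name introduced for illustration, not the answerer's code, and storing the variable label in a "label" attribute is an assumption that follows a convention some labelling packages use.

```r
# Data and metadata as shown in the question.
data <- data.frame(
  se  = c("1", "1", "2", "2", "1", "2"),
  ctr = c("1", "2", "3", "2", "1", "3"),
  stringsAsFactors = FALSE
)
metadata <- data.frame(
  var       = c("se", "ctr"),
  var_label = c("sex", "country"),
  val_lab   = c("(1,'Female'),(2,'Male')",
                "(1,'UK'),(2,'USA'),(3,'France')"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a string such as "(1,'Female'),(2,'Male')"
# into a data frame with one row per (value, label) pair.
parse_val_lab <- function(x) {
  pattern <- "\\(([^,]+),'([^']*)'\\)"
  pairs <- regmatches(x, gregexpr(pattern, x))[[1]]
  data.frame(
    values = sub(pattern, "\\1", pairs),
    labels = sub(pattern, "\\2", pairs),
    stringsAsFactors = FALSE
  )
}

for (each_var in metadata$var) {
  row <- metadata$var == each_var
  values_df <- parse_val_lab(metadata$val_lab[row])

  # Convert the column to a factor using the parsed values and labels.
  data[[each_var]] <- factor(data[[each_var]],
                             levels = values_df$values,
                             labels = values_df$labels)

  # Assumption: keep the variable label as a "label" attribute.
  attr(data[[each_var]], "label") <- metadata$var_label[row]
}

str(data)
```

If the metadata csv instead held one row per variable and value (for example columns var, value and label), the parsing helper would not be needed and the values and labels could be passed to factor() directly.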