Creating a "dictionary/reference" table
I have this dataset:
col_1 = as.factor(c("a", "a", "b", "c", "b", "a"))
col_2 = c(15, 346, 3564, 99, 10, 2)
col_3 = as.factor(c("bb", "a", "g", "f", "bb", "a"))
index = 1:6
sample_data = data.frame(index, col_1, col_2, col_3)
  index col_1 col_2 col_3
1     1     a    15    bb
2     2     a   346     a
3     3     b  3564     g
4     4     c    99     f
5     5     b    10    bb
6     6     a     2     a
In another question (Sequentially Replacing Factor Variables with Numerical Values), I learned how to enumerate all factor variables with numbers:
original_data <- sample_data  # keep an untransformed copy; the legend code below refers to it
indx <- vapply(sample_data, is.factor, logical(1))
vec <- interaction(stack(type.convert(sample_data[,indx], as.is = TRUE)))
sample_data[indx] <- match(vec, unique(vec))
  index col_1 col_2 col_3
1     1     1    15     4
2     2     1   346     5
3     3     2  3564     6
4     4     3    99     7
5     5     2    10     4
6     6     1     2     5
I want to try and create a "dictionary table" (i.e. a "legend") that shows the relationship between the original data and the transformed data. I figured out a way to do this manually:
library(plyr)
col_1_legend = unique(data.frame(original_data$col_1, sample_data$col_1))
col_3_legend = unique(data.frame(original_data$col_3, sample_data$col_3))
dictionary_data <- plyr::rbind.fill(col_1_legend, col_3_legend)
  original_data.col_1 sample_data.col_1 original_data.col_3 sample_data.col_3
1                   a                 1                <NA>                NA
2                   b                 2                <NA>                NA
3                   c                 3                <NA>                NA
4                <NA>                NA                  bb                 4
5                <NA>                NA                   a                 5
6                <NA>                NA                   g                 6
7                <NA>                NA                   f                 7
But this is a very messy and inefficient way to create the "dictionary table" (e.g. what if there were many columns with factor variables?). Can someone please suggest a more efficient way to do this?
Thank you!
Comments (1)
I would create it as follows: as a long-format data frame, built from the stack() and match() results that you already have:
Sample data
Build legend
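A minimal sketch of those two steps, assuming the legend is built from the same stack() and match() calls used in the question. The intermediate names stacked and codes and the legend column names variable, original and code are chosen here for illustration only.

```r
# Sample data (as in the question)
col_1 <- as.factor(c("a", "a", "b", "c", "b", "a"))
col_2 <- c(15, 346, 3564, 99, 10, 2)
col_3 <- as.factor(c("bb", "a", "g", "f", "bb", "a"))
sample_data <- data.frame(index = 1:6, col_1, col_2, col_3)

# Same steps as in the question, but keeping the stacked intermediate
indx    <- vapply(sample_data, is.factor, logical(1))
stacked <- stack(type.convert(sample_data[, indx, drop = FALSE], as.is = TRUE))
vec     <- interaction(stacked)      # one level per (value, column) pair
codes   <- match(vec, unique(vec))   # numbered in order of first appearance

# Build legend: one row per (column, original value, code), in long format
legend <- unique(data.frame(
  variable = as.character(stacked$ind),
  original = stacked$values,
  code     = codes
))
rownames(legend) <- NULL

# Apply the codes to the data, as in the question
sample_data[indx] <- codes

print(legend)
```

Because the legend is in long format, additional factor columns only add rows rather than new pairs of columns, and no rbind.fill() padding with NA is needed. The entries for a single column can be pulled out with, for example, legend[legend$variable == "col_3", ].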