创建“词典/参考”桌子

发布于 2025-02-12 07:13:18 字数 2157 浏览 3 评论 0原文

我有此数据集:

col_1 = as.factor(c("a", "a", "b", "c", "b", "a"))
col_2 = c(15, 346, 3564, 99, 10, 2)
col_3 = as.factor(c("bb", "a", "g", "f", "bb", "a"))
index = 1:6

sample_data = data.frame(index, col_1, col_2, col_3)

  index col_1 col_2 col_3
     1     1    15     4
     2     1   346     5
     3     2  3564     6
     4     3    99     7
     5     2    10     4
     6     1     2     5

在另一个问题中( sequentallylylylylylylylylylylylylylyllylylylylylylylyly innumerical actical variables具有数值),我学会了如何用数字列举所有因素变量:

indx <- vapply(sample_data, is.factor, logical(1))
vec <- interaction(stack(type.convert(sample_data[,indx], as.is = TRUE)))
sample_data[indx] <- match(vec, unique(vec))

  index col_1 col_2 col_3
1     1     1    15     4
2     2     1   346     5
3     3     2  3564     6
4     4     3    99     7
5     5     2    10     4
6     6     1     2     5

我想尝试创建一个“字典表”(即“传奇”)显示了原始数据与转换数据之间的关系。我想出了一种手动执行此操作的方法:

library(plyr)

col_1_legend = unique(data.frame(original_data$col_1, sample_data$col_1))
col_3_legend = unique(data.frame(original_data$col_3, sample_data$col_3))

dictionary_data<- plyr::rbind.fill(col_1_legend,col_3_legend)

 original_data.col_1 sample_data.col_1 original_data.col_3 sample_data.col_3
1                   a                 1                <NA>                NA
2                   b                 2                <NA>                NA
3                   c                 3                <NA>                NA
4                <NA>                NA                  bb                 4
5                <NA>                NA                   a                 5
6                <NA>                NA                   g                 6
7                <NA>                NA                   f                 7

但是,这是一种创建“字典表”的非常混乱且效率低下的方法(例如,有很多带有因子变量的列?)。有人可以建议一种更有效的方法吗?

谢谢你!

I have this dataset:

col_1 = as.factor(c("a", "a", "b", "c", "b", "a"))
col_2 = c(15, 346, 3564, 99, 10, 2)
col_3 = as.factor(c("bb", "a", "g", "f", "bb", "a"))
index = 1:6

sample_data = data.frame(index, col_1, col_2, col_3)

  index col_1 col_2 col_3
     1     1    15     4
     2     1   346     5
     3     2  3564     6
     4     3    99     7
     5     2    10     4
     6     1     2     5

In another question (Sequentially Replacing Factor Variables with Numerical Values), I learned how to enumerate all factor variables with numbers:

indx <- vapply(sample_data, is.factor, logical(1))
vec <- interaction(stack(type.convert(sample_data[,indx], as.is = TRUE)))
sample_data[indx] <- match(vec, unique(vec))

  index col_1 col_2 col_3
1     1     1    15     4
2     2     1   346     5
3     3     2  3564     6
4     4     3    99     7
5     5     2    10     4
6     6     1     2     5

I want to try and create a "dictionary table" (i.e. a "legend") that shows the relationship between the original data and the transformed data. I figured out a way to do this manually:

library(plyr)

col_1_legend = unique(data.frame(original_data$col_1, sample_data$col_1))
col_3_legend = unique(data.frame(original_data$col_3, sample_data$col_3))

dictionary_data<- plyr::rbind.fill(col_1_legend,col_3_legend)

 original_data.col_1 sample_data.col_1 original_data.col_3 sample_data.col_3
1                   a                 1                <NA>                NA
2                   b                 2                <NA>                NA
3                   c                 3                <NA>                NA
4                <NA>                NA                  bb                 4
5                <NA>                NA                   a                 5
6                <NA>                NA                   g                 6
7                <NA>                NA                   f                 7

But this is a very messy and inefficient way to create the "dictionary table" (e.g. what is there were many columns with factor variables?). Can someone please suggest a more efficient way to do this?

Thank you!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

陌伤ぢ 2025-02-19 07:13:18

我会如下创建它; )的数据

match (
# sample 1
col_1 = as.factor(c("a", "a", "b", "c", "b", "a"))
col_2 = c(15, 346, 3564, 99, 10, 2)
col_3 = as.factor(c("bb", "a", "g", "f", "bb", "a"))
index = 1:6

sample_df1 = data.frame(index, col_1, col_2, col_3)

# sample 2
indx <- vapply(sample_df1, is.factor, logical(1))

stacked <- stack(type.convert(sample_df1[,indx], as.is = TRUE))
vec <- interaction(stacked)
matched <- match(vec, unique(vec))

sample_df2 <- sample_df1
sample_df2[indx] <- matched
作为长格式数据框架,带有来自stack()
# create legend
legend <- cbind(matched, stacked) %>% distinct()
legend <- legend[c(3,2,1)]
colnames(legend) <- c('column', 'original_data', 'sample_data')
> legend
  column original_data sample_data
1  col_1             a           1
2  col_1             b           2
3  col_1             c           3
4  col_3            bb           4
5  col_3             a           5
6  col_3             g           6
7  col_3             f           7

I would create it as follows; as a long format data frame, with the data from stack() and match() that you already have:

Sample data
# sample 1
col_1 = as.factor(c("a", "a", "b", "c", "b", "a"))
col_2 = c(15, 346, 3564, 99, 10, 2)
col_3 = as.factor(c("bb", "a", "g", "f", "bb", "a"))
index = 1:6

sample_df1 = data.frame(index, col_1, col_2, col_3)

# sample 2
indx <- vapply(sample_df1, is.factor, logical(1))

stacked <- stack(type.convert(sample_df1[,indx], as.is = TRUE))
vec <- interaction(stacked)
matched <- match(vec, unique(vec))

sample_df2 <- sample_df1
sample_df2[indx] <- matched
Build legend
# create legend
legend <- cbind(matched, stacked) %>% distinct()
legend <- legend[c(3,2,1)]
colnames(legend) <- c('column', 'original_data', 'sample_data')
> legend
  column original_data sample_data
1  col_1             a           1
2  col_1             b           2
3  col_1             c           3
4  col_3            bb           4
5  col_3             a           5
6  col_3             g           6
7  col_3             f           7
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文