Run table() on all columns sequentially

Posted on 2025-01-19 15:51:51 (1245 words, 1 view, 0 comments)

If this is my dataset:

Id   Col_A_1   Col_A_2    Col_A_3    .....      Col_A_100
1    87        88         82                    88
2    88        82         82                    87
3    82        87         NA                    82
4    88        87         82                    88
5    87        87         87                    88

What is an efficient way to run the table() function on each of the columns Col_A_1 to Col_A_100? I am trying to avoid running table(df$Col_A_1, useNA = "ifany"), table(df$Col_A_2, useNA = "ifany"), ..., table(df$Col_A_100, useNA = "ifany") 100 times.

Also, if possible, I would like the output saved in a data frame.

Expected output:

    Column         82        85     87       88     Missing
    Col_A_1        1         0      2        2            0   
    Col_A_2        1         0      3        1            0
    Col_A_3        3         0      1        0            1
    .
    .
    .
    Col_A_100      1         0      1        3            0

Thanks in advance.

# example data
d <- read.table(text = "
Id   Col_A_1   Col_A_2    Col_A_3   Col_A_100
1    87        88         82        88
2    88        82         82        87
3    82        87         NA        82
4    88        87         82        88
5    87        87         87        88", header = TRUE)


Comments (2)

盗心人 2025-01-26 15:51:51

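Exclude the Id column, reshape from wide to long with stack(), use table() to count the values including NAs, transpose so that the original column names become rows, and finally convert the result to a data frame (rbind() turns the transposed table into a plain matrix, so data.frame() keeps the wide layout):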

data.frame(rbind(t(table(stack(d[, -1]), useNA = "always"))))
#           X82 X87 X88 NA.
# Col_A_1     1   2   2   0
# Col_A_2     1   3   1   0
# Col_A_3     3   1   0   1
# Col_A_100   1   1   3   0
# NA.         0   0   0   0
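The result above still has an all-zero `NA.` row (with `useNA = "always"` the column factor also gets an NA level) and labels the counts `X82`, `X87`, and so on. A minimal sketch, assuming the example data `d` from the question, of reshaping the same table into the layout the question asks for, with a `Column` column and a `Missing` column; the names `tab`, `m` and `out` are illustrative only:

```r
# Example data from the question
d <- read.table(text = "
Id   Col_A_1   Col_A_2    Col_A_3   Col_A_100
1    87        88         82        88
2    88        82         82        87
3    82        87         NA        82
4    88        87         82        88
5    87        87         87        88", header = TRUE)

# Counts per (column, value); useNA = "always" adds an NA level to both dimensions
tab <- t(table(stack(d[, -1]), useNA = "always"))

# Plain integer matrix; the NA labels are kept in the dimnames
m <- unclass(tab)

# Drop the all-zero row that comes from the NA level of the column factor
m <- m[!is.na(rownames(m)), , drop = FALSE]

# Label the NA count column "Missing"
colnames(m)[is.na(colnames(m))] <- "Missing"

# Data frame with the original column names as a regular "Column" column
out <- data.frame(Column = rownames(m), m, row.names = NULL, check.names = FALSE)
out
```

Values that never occur in any column (such as the 85 in the expected output) do not appear here, because table() only creates levels for observed values.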
以往的大感动 2025-01-26 15:51:51

I created a small tibble to illustrate the approach.

A tibble is essentially a list of columns, so lapply() works fine. Because the result can be cumbersome to work with, I store it in a tibble as a list column:

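library(dplyr)
# Small example: four columns of 100 random integers between 1 and 100
x = tibble(col1 = sample(100, replace = TRUE),
           col2 = sample(100, replace = TRUE),
           col3 = sample(100, replace = TRUE),
           col4 = sample(100, replace = TRUE))

# One row per column name; the frequency tables are stored in a list column
res = tibble(cols = colnames(x),
             tables = lapply(x, function(col) table(col, useNA = "ifany")))
res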

# A tibble: 4 x 2
# cols  tables      
# <chr> <named list>
# col1  <table [61]>
# col2  <table [69]>
# col3  <table [60]>
# col4  <table [62]>

EDIT: I did not notice the output format requirement. It can be done (perhaps a bit inelegantly) like this:

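# I assume all values are numeric
# Collect every value label that occurs in at least one table
unique_names = lapply(res$tables, names) %>% purrr::reduce(union)

# Sort the labels by value and append "NA" if any column had missing values
unique_names_sorted = c(sort(as.numeric(unique_names)), if (any(is.na(unique_names))) "NA")

# Zero-filled matrix: one row per column of x, one column per value label
mat = matrix(0, nrow = nrow(res), ncol = length(unique_names_sorted),
             dimnames = list(res$cols, unique_names_sorted))

# Populate row i with the counts of the i-th table
for (i in seq_len(nrow(mat))) {
  tmp = res$tables[[i]]
  # table() labels the missing-value cell NA; rename it to match the "NA" column
  if (any(is.na(names(tmp)))) names(tmp)[is.na(names(tmp))] = "NA"
  mat[i, names(tmp)] = tmp
}

# Convert to a data frame, keeping the original column names in a "Column" column
out = data.frame(Column = rownames(mat), mat, row.names = NULL, check.names = FALSE)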
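The lapply() approach returns tables of different lengths, which is why the matrix has to be filled row by row. A minimal alternative sketch, assuming the values to report are known in advance (here the 82, 85, 87 and 88 from the question's expected output, stored in an illustrative `lev`): converting every column to a factor with the same levels gives every table the same length, so vapply() can return the count matrix directly, including all-zero columns such as 85.

```r
# Example data from the question
d <- read.table(text = "
Id   Col_A_1   Col_A_2    Col_A_3   Col_A_100
1    87        88         82        88
2    88        82         82        87
3    82        87         NA        82
4    88        87         82        88
5    87        87         87        88", header = TRUE)

# Values to report (assumed known in advance; taken from the expected output)
lev <- c(82, 85, 87, 88)

# For each column: the count of every level in lev, followed by the NA count
counts <- vapply(d[, -1], function(col) {
  as.vector(table(factor(col, levels = lev), useNA = "always"))
}, integer(length(lev) + 1))

# vapply() returns one matrix column per input column; transpose so each is a row
m <- t(counts)
colnames(m) <- c(lev, "Missing")

out <- data.frame(Column = rownames(m), m, row.names = NULL, check.names = FALSE)
out
```

Any value not listed in `lev` becomes NA inside factor() and is therefore counted under Missing, so `lev` must cover every value that can occur; `sort(unique(unlist(d[, -1])))` is one way to build it when the reported values do not need to be fixed in advance.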