How do I create a contingency table (crosstab) in R for a subset of columns holding categorical data?

Posted 2024-09-15 04:12:06

I have a table whose header looks like this (I've simplified it):

id, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10

where each column, except for id, is a categorical variable. Let's name the categories A, B, C, D, E.

I would like to create a contingency table for some of the columns, such as below (for brevity, I have not put sample numbers in the cells). Getting the total column/row would be great, but not mandatory, I can calculate it myself later.

      a1  a2  a3  a4 Total
    ----------------------
    A|
    B|
    C|
    D|
    E|
Total|

Thus, the question is: how do I create a crosstab based on multiple columns in R? The examples I've seen with table() and xtabs() use only a single column. In my case, the columns are adjacent, so one crosstab would summarize columns a1..a4, another a5..a7, and so on. I hope there is an elegant way to do this.

I'm a programmer, but a newbie in R.

Thank you in advance.

挽心 2024-09-22 04:12:06

Your data is poorly formatted for this purpose. Here's one approach to appropriately reshaping the data with the reshape package.

library(reshape)
data.m <- melt(data, id = "id")

To compute a table for all levels, with margins, you could use

cast(data.m, value ~ variable, fun.aggregate = length, margins = TRUE)

For a subset, take the relevant subset of data.m.
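The same long-format idea can also be tried in base R without the reshape package. The following is only a minimal sketch: the data frame `toy` and the names `long`, `keep` and `tab` are made up for illustration, and the columns are assumed to be stored as character strings.

```r
# Toy data standing in for the real table (an assumption, not the asker's data)
toy <- data.frame(
  id = 1:3,
  a1 = c("A", "B", "A"),
  a2 = c("B", "B", "C"),
  a3 = c("C", "C", "C"),
  stringsAsFactors = FALSE
)

# Long format: one row per (column name, value) pair, similar to what melt() produces
long <- data.frame(
  variable = rep(names(toy)[-1], each = nrow(toy)),
  value    = unlist(toy[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only the columns of interest, then cross-tabulate and add margins
keep <- long[long$variable %in% c("a1", "a2"), ]
tab  <- addmargins(table(keep$value, keep$variable))
print(tab)
```

Changing the vector passed to `%in%` (for example to the names a5..a7 on the real data) should give the table for another block of columns.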

羁客 2024-09-22 04:12:06

Here's how to do it using base R commands. You don't need the for loop if every column has the same factor levels, but the loop would be a good fail-safe.

> set.seed(21)
> df <- data.frame(
+   id=1:20,
+   a1=sample(letters[1:4],20,TRUE),
+   a2=sample(letters[1:5],20,TRUE),
+   a3=sample(letters[2:5],20,TRUE),
+   a4=sample(letters[1:5],20,TRUE),
+   a5=sample(letters[1:5],20,TRUE),
+   a6=sample(letters[1:5],20,TRUE) )
> 
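> # Give every column the same five levels, so a category missing from a
> # column is counted as 0. Since R 4.0 data.frame() keeps strings as
> # character, so the columns are converted to factors explicitly.
> for(i in 2:NCOL(df)) {
+   df[[i]] <- factor(df[[i]], levels = letters[1:5])
+ }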
> 
> addmargins(mapply(table,df[,-1]))
    a1 a2 a3 a4 a5 a6 Sum
a    6  2  0  2  5  3  18
b    3  3  7  2  1  3  19
c    5  3  1  6  5  3  23
d    6  8  6  1  5  3  29
e    0  4  6  9  4  8  31
Sum 20 20 20 20 20 20 120
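
To get one table per block of adjacent columns, as the question asks, the same base R idea could be wrapped in a small helper. This is a rough sketch under assumptions: `crosstab_cols` is a hypothetical name, the data frame `toy` is invented for illustration, and the categories are assumed to be exactly A to E.

```r
# Hypothetical helper: one contingency table (with margins) for a set of columns
crosstab_cols <- function(data, cols, lv = c("A", "B", "C", "D", "E")) {
  # Coerce each selected column to a factor with the full level set,
  # so categories absent from a column still get a row with count 0
  counts <- sapply(data[cols], function(x) table(factor(x, levels = lv)))
  addmargins(counts)
}

# Toy data standing in for the real table (an assumption)
toy <- data.frame(
  id = 1:4,
  a1 = c("A", "B", "A", "C"),
  a2 = c("B", "B", "E", "D"),
  a3 = c("A", "A", "A", "A"),
  stringsAsFactors = FALSE
)

res <- crosstab_cols(toy, c("a1", "a2"))
print(res)
```

On the real data, calling the helper once with the names a1..a4 and once with a5..a7 would give the two tables described in the question.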
