How to create a contingency table (crosstab) in R for a subset of columns with categorical data?
I have a table whose header looks like this (I've simplified it):
id, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10
where each column, except for id, is a categorical variable. Let's name the categories A, B, C, D, E.
I would like to create a contingency table for some of the columns, like the one below (for brevity, I have not put sample numbers in the cells). Getting the total column/row would be great, but it is not mandatory; I can calculate it myself later.
a1 a2 a3 a4 Total
----------------------
A|
B|
C|
D|
E|
Total|
Thus, the question is: how do I create a crosstab based on multiple columns in R? The examples I've seen with table() and xtabs() use only a single column. In my case, the columns are adjacent, so one crosstab would summarize columns a1..a4, another a5..a7, and so on. I hope there is an elegant way to do this.
I'm a programmer, but a newbie in R.
Thank you in advance.
Comments (2)
Your data is poorly formatted for this purpose. Here's one approach: reshape the data appropriately with the reshape package.
Once the data is reshaped (call the result data.m), you can compute a table for all levels, with margins.
For a subset, take the relevant subset of data.m.
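The code that accompanied this answer did not survive on this page. What follows is only a rough sketch of the approach it describes, not the answer's original code. It assumes the reshape package is installed, and the sample data frame is hypothetical. The tabulation uses base R's table() and addmargins(), which may differ from the call the answer originally used.

```r
# Assumption: the reshape package is installed, e.g. install.packages("reshape")
library(reshape)

# Hypothetical sample data in the shape described in the question
set.seed(1)
cats <- c("A", "B", "C", "D", "E")
data <- data.frame(id = 1:20)
for (col in paste0("a", 1:10)) {
  data[[col]] <- sample(cats, 20, replace = TRUE)
}

# Long format: one row per (id, column) pair, with columns id, variable, value
data.m <- melt(data, id.vars = "id")
data.m$value <- factor(data.m$value, levels = cats)

# All columns: categories as rows, original columns as columns, plus "Sum" margins
tab_all <- addmargins(table(data.m$value, data.m$variable))

# Subset: keep only a1..a4 and drop the unused column levels before tabulating
data.sub <- subset(data.m, variable %in% paste0("a", 1:4))
data.sub$variable <- droplevels(data.sub$variable)
tab_sub <- addmargins(table(data.sub$value, data.sub$variable))
print(tab_sub)
```

The same subsetting step with a5..a7 would give the second crosstab mentioned in the question.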
Here's how to do it using base R commands. You don't need the for loop if every column has the same factor levels, but the loop is a good fail-safe.
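The original code for this answer is also missing here, so the following is a minimal sketch of one way the base R idea could look, not the answer's own code. The sample data and the names cats, cols and counts are assumptions for illustration. The for loop forces every selected column onto the same factor levels, so a category that never occurs in a column still gets a row with a count of 0.

```r
# Hypothetical sample data in the shape described in the question
set.seed(1)
cats <- c("A", "B", "C", "D", "E")
data <- data.frame(id = 1:20)
for (col in paste0("a", 1:10)) {
  data[[col]] <- sample(cats, 20, replace = TRUE)
}

# Columns to summarize in this crosstab
cols <- paste0("a", 1:4)

# Fail-safe: give every selected column the same factor levels
for (col in cols) {
  data[[col]] <- factor(data[[col]], levels = cats)
}

# Count each column separately; sapply binds the counts into a 5 x 4 matrix
counts <- sapply(data[cols], table)

# Add the Total column and the Total row
counts <- cbind(counts, Total = rowSums(counts))
counts <- rbind(counts, Total = colSums(counts))
print(counts)
```

Changing cols to paste0("a", 5:7) would produce the crosstab for the next group of columns.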