Crosstab with multiple items

Posted on 2024-08-25 10:14:59

In SPSS, it is (relatively) easy to create a cross tab with multiple variables, using the factors (or values) as the table heading. So, something like the following (made-up data, etc.). Q1, Q2, and Q3 each have a 1, a 2, or a 3 for each person. I just left these as numbers, but they could be factors; neither form seemed to help solve the problem.

                        1 (very Often)   2 (Rarely)   3 (Never)
   Q1. Likes it           12              15             13
   Q2. Recommends it      22              11             10
   Q3. Used it            22              12             9

In SPSS, one can even request row, column, or total percentages.

I've tried table(), ftable(), xtabs(), CrossTable() from gmodels, and CrossTable() from descr, and none of these can handle (afaik) multiple variables; they mostly seem to cross one variable with another, with a third variable creating layers.

Is there a package with some good cross tabbing/table examples that I could use to figure this out? I'm sure I'm missing something simple, so I appreciate you pointing out what I missed. Perhaps I have to generate each row as a separate list and then make a dataframe and print the dataframe?

UPDATE: I've now discovered ctab() in package catspec, which is also on the right track. It's interesting that R has no consistent equivalent to Ctables in SPSS, which is basically a "tabbing" tool à la the old tabulate tools used for survey research. ctab() is trying, and is an admirable first step... but you still can't make the table above with it.

何以畏孤独 2024-09-01 10:15:00

Modifying a previous example

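library(Hmisc)  # describe()
library(plyr)   # ldply()

# Fake data: 20 respondents, three questions scored 1-3
dd <- data.frame(q1 = sample(1:3, 20, replace = TRUE),
                 q2 = sample(1:3, 20, replace = TRUE),
                 q3 = sample(1:3, 20, replace = TRUE))

# Take the first row of each variable's describe() values and stack the rows;
# [-1] drops the .id column that ldply() adds.
# This assumes describe() keeps the counts in the first row of a `values` matrix,
# which may not hold in every Hmisc version.
cross <- ldply(describe(dd), function(x) x$values[1,])[-1]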

rownames(cross) <- c("Q1. Likes it","Q2. Recommends it","Q3. Used it")
names(cross) <- c("1 (very Often)","2 (Rarely)","3 (Never)")

Now cross looks like this

> cross
                  1 (very Often) 2 (Rarely) 3 (Never)
Q1. Likes it                   4         10         6
Q2. Recommends it              7          9         4
Q3. Used it                    6          4        10
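
If describe() does not expose its counts in the shape the code above expects, the same table can be assembled in base R. This is a minimal sketch and not part of the original answer; the seed and the helper name count_levels are illustrative assumptions.

```r
set.seed(42)  # assumption: any seed works, this only makes the fake data repeatable
dd <- data.frame(q1 = sample(1:3, 20, replace = TRUE),
                 q2 = sample(1:3, 20, replace = TRUE),
                 q3 = sample(1:3, 20, replace = TRUE))

# Hypothetical helper: count one question's answers over a fixed set of levels,
# so a level nobody chose still shows up as 0.
count_levels <- function(x, levels = 1:3) table(factor(x, levels = levels))

# sapply() returns one column per question; t() turns questions into rows.
cross <- t(sapply(dd, count_levels))
dimnames(cross) <- list(c("Q1. Likes it", "Q2. Recommends it", "Q3. Used it"),
                        c("1 (very Often)", "2 (Rarely)", "3 (Never)"))
cross
```

Fixing the levels up front keeps every row the same length, which is what lets the per-question counts line up as one matrix.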

迷路的信 2024-09-01 10:15:00

The underlying issue is that this data is not in tidy format. Crosstabbing multiple variables will be easier when the data is reshaped into "long" form. We can do that with gather from the tidyr package.

After reshaping, many crosstab functions will work; I'll use tabyl from the janitor package (since - full disclosure - I maintain that package and built the function for this purpose).

# Create reproducible sample data
set.seed(1)
possible_values <- c("1 (Very Often)", "2 (Rarely)", "3 (Never)")
some_values <- sample(possible_values, 100, replace = TRUE)
dat <- data.frame(Q1 = some_values[1:25], Q2 = some_values[26:50], 
                 Q3 = some_values[51:75], Q4 = some_values[76:100])

library(tidyr)
library(janitor)

dat %>%
  gather(question, response) %>% 
  tabyl(question, response)
#>   question 1 (Very Often) 2 (Rarely) 3 (Never)
#> 1       Q1              8          8         9
#> 2       Q2              4         11        10
#> 3       Q3              8         12         5
#> 4       Q4              7          7        11

From there, you can format with functions like janitor::adorn_percentages().
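
As a rough illustration of that formatting step, the sketch below chains the janitor adorn_* functions to get row percentages with the counts alongside. The tiny hand-made data frame is an assumption chosen so the result is easy to check by eye, and pivot_longer() is used here in place of gather(), which tidyr has since superseded.

```r
library(tidyr)
library(janitor)

# Assumption: small hand-made data, not the sample data from the answer
dat <- data.frame(
  Q1 = c("1 (Very Often)", "1 (Very Often)", "2 (Rarely)", "3 (Never)"),
  Q2 = c("2 (Rarely)", "2 (Rarely)", "2 (Rarely)", "3 (Never)")
)

# Counts: one row per question, one column per response
counts <- dat %>%
  pivot_longer(everything(), names_to = "question", values_to = "response") %>%
  tabyl(question, response)

pct <- counts %>%
  adorn_percentages("row") %>%          # divide each count by its row total
  adorn_pct_formatting(digits = 1) %>%  # turn proportions into percent strings
  adorn_ns()                            # append the raw counts in parentheses
pct
```

Passing "col" or "all" to adorn_percentages() instead of "row" gives column or total percentages, which covers the three options the question mentions.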

风吹雪碎 2024-09-01 10:15:00

Just check out Hadley Wickham's reshape package.
AFAICS, you need the cast function from that package.
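
A minimal sketch of that idea follows. It assumes reshape2 (the later rewrite of reshape, where cast became dcast/acast) rather than the original reshape package, and the fake data and seed are illustrative.

```r
library(reshape2)  # assumption: reshape2 rather than the original reshape package

set.seed(42)
dd <- data.frame(Q1 = sample(1:3, 20, replace = TRUE),
                 Q2 = sample(1:3, 20, replace = TRUE),
                 Q3 = sample(1:3, 20, replace = TRUE))

# Long form: one row per (question, answer) pair, in columns `variable` and `value`
long <- melt(dd, measure.vars = names(dd))

# One row per question, one column per answer value, cells are counts
wide <- dcast(long, variable ~ value, value.var = "value", fun.aggregate = length)
wide
```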

笔芯 2024-09-01 10:15:00

xtabs has a formula interface that can take some practice to get used to, but this can be done. If you have the data in a dataframe df and your variables are called ques and resp, you can use:

xtabs(~ques+resp,data=df)

For example:

> t1 <- rep(c("A","B","C"),5)
> t2 <- rpois(15,4)
> df <- data.frame(ques=t1,resp=t2)
> xtabs(~ques+resp,data=df)
    resp
ques 2 3 4 5 6 7 9
   A 1 0 2 1 0 0 1
   B 1 0 0 2 1 1 0
   C 1 2 0 1 0 1 0
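
To apply this to data laid out like the question's (one column per question), the columns first have to be stacked into a ques/resp pair. The sketch below does that with base R's stack() and then adds row percentages with prop.table(); the fake data and seed are assumptions for illustration.

```r
set.seed(42)
dd <- data.frame(Q1 = sample(1:3, 20, replace = TRUE),
                 Q2 = sample(1:3, 20, replace = TRUE),
                 Q3 = sample(1:3, 20, replace = TRUE))

# stack() puts all answers in `values` and the source column name in `ind`
long <- stack(dd)
names(long) <- c("resp", "ques")

counts <- xtabs(~ ques + resp, data = long)
counts

# Row percentages: each question's row sums to 100 before rounding
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```

Using margin = 2 gives column percentages, and omitting margin gives percentages of the grand total.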

乜一 2024-09-01 10:15:00

Check out tableStack() from the epiDisplay package. I think it might be what you are looking for.

記憶穿過時間隧道 2024-09-01 10:15:00

You could use a custom function to use rbind() on several tables, something like this:

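multitab <- function(...) {
  tabs <- list(...)                    # one vector per question
  tablist <- lapply(tabs, table)       # frequency table for each vector
  bigtab <- t(sapply(tablist, rbind))  # one row of counts per question
  bigtab
}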
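
A possible way to call it is sketched below. The three answer vectors are hand-made assumptions, and the function is repeated so the block runs on its own. It relies on every question using the same set of values; if one question never uses a value, the per-question tables have different lengths and no longer combine into a matrix.

```r
multitab <- function(...) {
  tabs <- list(...)
  tablist <- lapply(tabs, table)
  t(sapply(tablist, rbind))
}

# Assumption: hand-made answers in which every question uses all of 1, 2 and 3
q1 <- c(1, 1, 2, 3, 3, 3)
q2 <- c(1, 2, 2, 2, 3, 3)
q3 <- c(1, 2, 3, 3, 3, 3)

big <- multitab(q1, q2, q3)

# Label explicitly rather than relying on whatever names sapply() carries over
dimnames(big) <- list(c("Q1", "Q2", "Q3"), c("1", "2", "3"))
big
```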

抚你发端 2024-09-01 10:14:59

The Hmisc package has the summary.formula function that can do something along the lines you want. It is very flexible, so look at the help page for examples, but here is an application to your problem:

library(Hmisc)
dd <- data.frame(Q1 = sample(1:3, 20, replace = TRUE),
                 Q2 = sample(1:3, 20, replace = TRUE),
                 Q3 = sample(1:3, 20, replace = TRUE))  # fake data
summary(~Q1+Q2+Q3, data=dd, fun=table)

This gives the following result:

 Descriptive Statistics  (N=20)

 +------+-------+
 |      |       |
 +------+-------+
 |Q1 : 1|25% (5)|
 +------+-------+
 |    2 |45% (9)|
 +------+-------+
 |    3 |30% (6)|
 +------+-------+
 |Q2 : 1|30% (6)|
 +------+-------+
 |    2 |35% (7)|
 +------+-------+
 |    3 |35% (7)|
 +------+-------+
 |Q3 : 1|35% (7)|
 +------+-------+
 |    2 |30% (6)|
 +------+-------+
 |    3 |35% (7)|
 +------+-------+

The possible values are given in rows, because it has the flexibility of different sets of values for different variables. You might be able to play with the function parameters (like method and fun) to get the other direction.
