How can I create 2x2 tables from a multi-level categorical dataset to run a chi-squared test?

Posted on 2025-01-14 09:12:34

I have a dataset of race and outcome (Y or N), and I want to tabulate a 2x2 table for each race so that I can run a chi-squared test on it.

  Asian     584   24
  Black    1721   56
  Hispanic 2400   90
  White    8164  289

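For each 2x2 table, the first row would be one race (for example, Asian) and the second row would be everyone else (non-Asian), computed by subtracting the Asian counts from the column totals. Once I repeat that process for every race, I can run a chi-squared test on each table. Is there an easier way to run a chi-squared test for each race in the table above?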



不知所踪 2025-01-21 09:12:34

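If d is your data, then the following will produce your list of 2x2 tables:

# For each row i: row 1 is that race, row 2 is the column sums of all other races
tables <- lapply(seq_len(nrow(d)), function(i) rbind(d[i, ], colSums(d[-i, ])))
names(tables) <- rownames(d)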

You can then apply the chi-squared test to each table:

lapply(tables, chisq.test)

Input:

d <- matrix(c(584, 24, 1721, 56, 2400, 90, 8164, 289), nrow = 4, byrow = TRUE,
            dimnames = list(c("Asian", "Black", "Hispanic", "White")))
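
As a possible follow-up, not part of the original answer: running one test per race means four tests on overlapping data, so it can be convenient to collect the p-values in a single vector and adjust them for multiple comparisons. The sketch below rebuilds the tables and uses base R's p.adjust. The column labels Y and N and the choice of the Holm method are assumptions made for illustration.

```r
# Same counts as in the question; the column labels "Y"/"N" are assumed
d <- matrix(c(584, 24, 1721, 56, 2400, 90, 8164, 289),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Asian", "Black", "Hispanic", "White"),
                            c("Y", "N")))

# One-vs-rest 2x2 tables: row 1 is the race, row 2 is everyone else
tables <- lapply(seq_len(nrow(d)), function(i) {
  rbind(d[i, ], colSums(d[-i, , drop = FALSE]))
})
names(tables) <- rownames(d)

# Pull the p-value out of each test, giving a named numeric vector
p_raw <- sapply(tables, function(tb) chisq.test(tb)$p.value)

# Adjust for the four comparisons (Holm is one common choice)
p_adj <- p.adjust(p_raw, method = "holm")

print(round(cbind(raw = p_raw, holm = p_adj), 4))
```

Whether an adjustment is appropriate depends on how the results will be interpreted; the raw p-values remain available in p_raw.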


后eg是否自 2025-01-21 09:12:34

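Here is one option with map2, where the first row is an individual race and the second row is the sum of the other races. Each list element is then named after its race.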

library(tidyverse)

pull(df, V1) %>%
  map2(
    .,
    replicate(nrow(df), df, simplify = FALSE),
    .f = function(x, y)
      y %>%
      filter(V1 != x) %>%
      summarise(across(-V1, sum)) %>%
      bind_rows(filter(y, V1 == x) %>% dplyr::select(-V1), .)
  ) %>%
  set_names(., pull(df, V1))

Output

$Asian
     V2  V3
1   584  24
2 12285 435

$Black
     V2  V3
1  1721  56
2 11148 403

$Hispanic
     V2  V3
1  2400  90
2 10469 369

$White
    V2  V3
1 8164 289
2 4705 170

Data

df <- structure(list(V1 = c("Asian", "Black", "Hispanic", "White"), 
    V2 = c(584L, 1721L, 2400L, 8164L), V3 = c(24L, 56L, 90L, 
    289L)), class = "data.frame", row.names = c(NA, -4L))
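
The answer above stops at the list of tables. As a hedged sketch of the remaining step (not part of the original answer), the code below builds the same one-vs-rest tables with a plain map over the race names and then collects the test statistic and p-value for each race into one data frame. It assumes dplyr and purrr are installed; the names tables, results, statistic and p_value are illustrative.

```r
library(dplyr)
library(purrr)

df <- data.frame(V1 = c("Asian", "Black", "Hispanic", "White"),
                 V2 = c(584L, 1721L, 2400L, 8164L),
                 V3 = c(24L, 56L, 90L, 289L))

# One 2x2 data frame per race: row 1 is the race, row 2 is all others summed
tables <- df$V1 %>%
  set_names() %>%
  map(function(race) {
    this  <- df %>% filter(V1 == race) %>% dplyr::select(-V1)
    other <- df %>% filter(V1 != race) %>% summarise(across(-V1, sum))
    bind_rows(this, other)
  })

# Run the test on each table and stack the results, keyed by race
results <- tables %>%
  map(function(tb) {
    test <- chisq.test(as.matrix(tb))
    data.frame(statistic = unname(test$statistic), p_value = test$p.value)
  }) %>%
  bind_rows(.id = "race")

print(results)
```

Mapping over the race names directly avoids replicating df once per race, which map2 needed only to have a second argument to iterate over.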


依靠 2025-01-21 09:12:34

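Here is another approach. First set up the master table:

tbl <- as.matrix(df[, -1])
colnames(tbl) <- c("Yes", "No")   # label the outcome columns (named V2, V3 in df)
Sums <- matrix(colSums(tbl), nrow(tbl), 2, byrow=TRUE)
Tbl <- cbind(tbl, Sums-tbl)       # columns 3-4 hold the counts for all other races
row.names(Tbl) <- df[, 1]
Tbl
#           Yes  No   Yes  No
# Asian     584  24 12285 435
# Black    1721  56 11148 403
# Hispanic 2400  90 10469 369
# White    8164 289  4705 170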

Next, a function that creates a 2x2 table from one row of Tbl:

ChiSqTable <- function(row) {
    matrix(Tbl[row, ], 2, 2, byrow=TRUE, dimnames=list(Race=c(df[row, 1],
         paste("Not", df[row, 1])), Question=c("Yes", "No")))
}

Finally, create the chi-squared tables and run the tests:

Tables <- lapply(seq(nrow(Tbl)), ChiSqTable)
names(Tables) <- df[, 1]
ChiSqStats <- lapply(Tables, chisq.test)
names(ChiSqStats) <- df[, 1]

Tables[[1]]   # or Tables[["Asian"]]
#            Question
# Race          Yes  No
#   Asian       584  24
#   Not Asian 12285 435
ChiSqStats[[1]]
# 
#   Pearson's Chi-squared test with Yates' continuity correction
# 
# data:  X[[i]]
# X-squared = 0.33997, df = 1, p-value = 0.5598

Access the remaining tables and test results by index or by race name. All components of each chi-squared test result are saved, e.g.

ChiSqStats[[1]]$expected
#            Question
# Race               Yes        No
#   Asian       587.0612  20.93878
#   Not Asian 12281.9388 438.06122
ChiSqStats[[1]]$residuals
#            Question
# Race                Yes         No
#   Asian     -0.12634367  0.6689899
#   Not Asian  0.02762242 -0.1462607
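
A related option, offered here as a sketch rather than as part of the original answer: chisq.test also accepts the full 4x2 table, which gives a single overall test of whether outcome depends on race, and its standardized residuals indicate which cells depart most from independence. The dimnames labels below are assumptions for illustration.

```r
# Full race-by-outcome table; the "Yes"/"No" labels are assumed
d <- matrix(c(584, 24, 1721, 56, 2400, 90, 8164, 289),
            nrow = 4, byrow = TRUE,
            dimnames = list(Race = c("Asian", "Black", "Hispanic", "White"),
                            Outcome = c("Yes", "No")))

# One test of independence across all four races at once
overall <- chisq.test(d)

overall$parameter          # degrees of freedom: (4 - 1) * (2 - 1) = 3
overall$p.value            # overall p-value
round(overall$stdres, 2)   # standardized residuals, one per cell
```

The overall test answers a different question from the per-race tests, so it complements the one-vs-rest tables rather than replacing them.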
