R: Iterating Fisher's exact test over multiple rows of a large data frame to get output for each row
I have a large dataset with multiple categorical values that have different integer values (counts) in two different groups.
As an example
Element <- c("zinc", "calcium", "magnesium", "sodium", "carbon", "nitrogen")
no_A <- c(45, 143, 10, 35, 70, 40)
no_B <- c(10, 11, 1, 4, 40, 30)
elements_df <- data.frame(Element, no_A, no_B)
| Element | no_A | no_B |
|---|---|---|
| zinc | 45 | 10 |
| calcium | 143 | 11 |
| magnesium | 10 | 1 |
| sodium | 35 | 4 |
| carbon | 70 | 40 |
| nitrogen | 40 | 30 |
Previously I’ve just been using the code below and changing x manually to get the output values:
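library(dplyr)  # provides %>% and filter()
x <- "calcium"
# Counts for element x, and for all other elements, in group A
n1 <- (elements_df %>% filter(Element == x))$no_A
n2 <- sum(elements_df$no_A) - n1
# The same two counts in group B
n3 <- (elements_df %>% filter(Element == x))$no_B
n4 <- sum(elements_df$no_B) - n3
fisher.test(matrix(c(n1, n2, n3, n4), nrow = 2, ncol = 2, byrow = TRUE))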
But I have a very large dataset with 4000 rows and I’d like the most efficient way to iterate through all of them and see which have significant p values.
I imagined I’d need a for loop and function, although I’ve looked through a few previous similar questions (none that I felt I could use) and it seems using apply might be the way to go.
So, in short, can anyone help me with writing code that iterates over x in each row and prints out the corresponding p values and odds ratio for each element?
Comments (1)
You could get them all in a nice data frame like this:
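The answer's original code is not preserved here, so what follows is only a minimal base-R sketch of one way to do it, not necessarily the approach the answerer had in mind. It reuses `elements_df` from the question and builds the same 2x2 table for every row (this element vs. all other elements, group A vs. group B). The names `run_fisher`, `fisher_results`, `p_value`, `odds_ratio` and `p_adj` are made up for illustration, and the Benjamini-Hochberg adjustment at the end is an optional assumption on my part that you may want when running thousands of tests.

```r
# Example data from the question
elements_df <- data.frame(
  Element = c("zinc", "calcium", "magnesium", "sodium", "carbon", "nitrogen"),
  no_A = c(45, 143, 10, 35, 70, 40),
  no_B = c(10, 11, 1, 4, 40, 30)
)

# Column totals are the same for every row, so compute them once
total_A <- sum(elements_df$no_A)
total_B <- sum(elements_df$no_B)

# Hypothetical helper: one Fisher test for a single row's counts,
# using the same matrix layout as the manual code in the question
run_fisher <- function(a, b) {
  ft <- fisher.test(matrix(c(a, total_A - a, b, total_B - b),
                           nrow = 2, byrow = TRUE))
  c(p_value = ft$p.value, odds_ratio = unname(ft$estimate))
}

# mapply() returns a 2 x n matrix; transpose it to one row per element
res <- t(mapply(run_fisher, elements_df$no_A, elements_df$no_B))
fisher_results <- cbind(elements_df, res)

# Optional (an assumption, not part of the question): adjust for multiple testing
fisher_results$p_adj <- p.adjust(fisher_results$p_value, method = "BH")

print(fisher_results)

# Rows that remain significant after adjustment
fisher_results[fisher_results$p_adj < 0.05, ]
```

Because the totals are computed once and each row needs only a single `fisher.test()` call, this should be manageable for about 4000 rows. If you only want the raw p-values, drop the `p_adj` line and filter on `p_value` instead.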