Calculating a ratio matrix in R

Posted 2025-01-19 04:45:33

I was wondering if there is a simple way to calculate a ratio matrix for every element of a data frame. Example:

gene sample1 sample2 sample3 sample4 .....
aa     2       2       3      2
aa     1       5       2      1
aa     4       1       2      3
bb     1       2       1      2
bb     2       1       1      2

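I want the ratio for each element in sample1 through sample4, calculated within each column over the rows that share the same gene value. In other words, each value is divided by the sum of its column across all rows with the same gene. The calculation would look like this: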

gene sample1 sample2 sample3 sample4 .....
aa     2/7     2/8     3/7      2/6
aa     1/7     5/8     2/7      1/6
aa     4/7     1/8     2/7      3/6
bb     1/3     2/3     1/2      2/4
bb     2/3     1/3     1/2      2/4
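
The denominators in the table above are the per-gene column totals. As a quick check, here is a minimal base R sketch that computes them with `rowsum()`. It assumes the example data is stored in a data frame named `dat`, a name the question itself does not give.

```r
# Example data from the question; the name `dat` is an assumption.
dat <- data.frame(
  gene    = c("aa", "aa", "aa", "bb", "bb"),
  sample1 = c(2, 1, 4, 1, 2),
  sample2 = c(2, 5, 1, 2, 1),
  sample3 = c(3, 2, 2, 1, 1),
  sample4 = c(2, 1, 3, 2, 2)
)

# Sum every sample column within each gene; these are the denominators.
totals <- rowsum(dat[-1], group = dat$gene)
totals
#>    sample1 sample2 sample3 sample4
#> aa       7       8       7       6
#> bb       3       3       2       4
```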

The result would be like this -

gene  sample1  sample2  sample3  sample4 .....
aa     .28       .25       .42      .33
aa     .14       .62       .28      .16
aa     .57       .12       .28      .5
bb     .33       .66       .5       .5
bb     .66       .33       .5       .5

What I have tried in a loop is this -

tf <- dd %>%
        group_by(symbol) %>%
        summarise_if(is.numeric, mean)

but this collapses each group to a single summary row instead of calculating a value for each element, so it does not keep the dimensions of the initial data frame (here, dd). Any suggestions would be much appreciated.


红衣飘飘貌似仙 2025-01-26 04:45:33

You can do:

library(dplyr)

dat %>%
  group_by(gene) %>%
  mutate(across(everything(), proportions)) %>% 
  ungroup()

# A tibble: 5 x 5
  gene  sample1 sample2 sample3 sample4
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 aa      0.286   0.25    0.429   0.333
2 aa      0.143   0.625   0.286   0.167
3 aa      0.571   0.125   0.286   0.5  
4 bb      0.333   0.667   0.5     0.5  
5 bb      0.667   0.333   0.5     0.5

If you have missing values that you'd like to ignore, use:

dat %>%
  group_by(gene) %>%
  mutate(across(everything(),  ~ .x / sum(.x, na.rm = TRUE)))
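
To see why the second form matters, here is a small base R illustration on a made-up vector (not part of the original data). A single `NA` makes the plain sum `NA`, so every proportion in that group becomes `NA`, whereas `na.rm = TRUE` keeps the non-missing entries usable. `prop.table()` is used here as the older name for `proportions()`.

```r
x <- c(1, NA, 3)  # hypothetical group with one missing value

# Plain proportions: sum(x) is NA, so every element becomes NA.
plain <- prop.table(x)

# Ignoring NA in the denominator gives 1/4, NA, 3/4.
ignoring_na <- x / sum(x, na.rm = TRUE)
```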

Data:

dat <- structure(list(gene = c("aa", "aa", "aa", "bb", "bb"), sample1 = c(2, 
1, 4, 1, 2), sample2 = c(2, 5, 1, 2, 1), sample3 = c(3, 2, 2, 
1, 1), sample4 = c(2, 1, 3, 2, 2)), class = "data.frame", row.names = c(NA, 
-5L))
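
If dplyr is not available, roughly the same result can be obtained in base R. The following is only a sketch, not part of the original answer: it rebuilds `dat` so the block is self-contained and uses `ave()` to apply the within-gene division while keeping the original row order and dimensions. The names `sample_cols` and `res` are made up for illustration.

```r
dat <- data.frame(
  gene    = c("aa", "aa", "aa", "bb", "bb"),
  sample1 = c(2, 1, 4, 1, 2),
  sample2 = c(2, 5, 1, 2, 1),
  sample3 = c(3, 2, 2, 1, 1),
  sample4 = c(2, 1, 3, 2, 2)
)

# Every column except the grouping column is treated as a sample.
sample_cols <- setdiff(names(dat), "gene")

# ave() applies the function within each gene and returns a vector
# of the same length as its input, so the data frame keeps its shape.
res <- dat
res[sample_cols] <- lapply(
  dat[sample_cols],
  function(x) ave(x, dat$gene, FUN = function(v) v / sum(v, na.rm = TRUE))
)
```

Each sample column of `res` should then sum to 1 within every gene, which can be checked with `rowsum(res[sample_cols], res$gene)`.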


眼趣 2025-01-26 04:45:33

Here is an option with data.table

> library(data.table)

> setDT(dat)[,lapply(.SD,proportions),gene]
   gene   sample1   sample2   sample3   sample4
1:   aa 0.2857143 0.2500000 0.4285714 0.3333333
2:   aa 0.1428571 0.6250000 0.2857143 0.1666667
3:   aa 0.5714286 0.1250000 0.2857143 0.5000000
4:   bb 0.3333333 0.6666667 0.5000000 0.5000000
5:   bb 0.6666667 0.3333333 0.5000000 0.5000000
