Rollapply percentage on logical conditions (rolling rate in R)

Posted on 2025-01-14 02:12:26 · 1144 words · 0 views · 0 comments

I have a data frame in R with two columns of logical values that looks like this:

check1 = as.logical(c(rep("TRUE",3),rep("FALSE",2),rep("TRUE",3),rep("FALSE",2)))
check2 = as.logical(c(rep("TRUE",5),rep("FALSE",2),rep("TRUE",3)))
dat = cbind(check1,check2)

which results in:

    check1 check2
 [1,]   TRUE   TRUE
 [2,]   TRUE   TRUE
 [3,]   TRUE   TRUE
 [4,]  FALSE   TRUE
 [5,]  FALSE   TRUE
 [6,]   TRUE  FALSE
 [7,]   TRUE  FALSE
 [8,]   TRUE   TRUE
 [9,]  FALSE   TRUE
[10,]  FALSE   TRUE

I want to calculate the running percentage of TRUEs in each column, which ideally should look like this:

check1  check2
1/1     1/1
2/2     2/2
3/3     3/3
3/4     4/4
3/5     5/5
4/6     5/6
5/7     5/7
6/8     6/8
6/9     7/9
6/10    8/10

Maybe something like this?

dat %>%
  mutate(cumsum(check1) / seq_along(check1))

Any help?

Comments (2)

骑趴 2025-01-21 02:12:26

You are almost there; just use across to apply your function to both columns.

Alternatively, you can use dplyr::cummean to compute the running proportions.

A note about terminology: rolling usually refers to computing a statistic (such as the mean or the max) within a fixed-size window. Cumulative statistics, on the other hand, are computed in an ever-increasing window starting from index 1 (or the first row). See the vignette on window functions. Using the right term may help you search the documentation for the appropriate function.

library("tidyverse")

check1 <- as.logical(c(rep("TRUE", 3), rep("FALSE", 2), rep("TRUE", 3), rep("FALSE", 2)))
check2 <- as.logical(c(rep("TRUE", 5), rep("FALSE", 2), rep("TRUE", 3)))
dat <- cbind(check1, check2)

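# Option 1: running count of TRUEs divided by the row number
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), ~ cumsum(.) / row_number())
  )

# Option 2: the same result with dplyr::cummean
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), cummean)
  )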
cummeans
#> # A tibble: 10 × 2
#>    check1 check2
#>     <dbl>  <dbl>
#>  1  1      1    
#>  2  1      1    
#>  3  1      1    
#>  4  0.75   1    
#>  5  0.6    1    
#>  6  0.667  0.833
#>  7  0.714  0.714
#>  8  0.75   0.75 
#>  9  0.667  0.778
#> 10  0.6    0.8

# Plot the cumulative proportions on the y-axis, with one colored line for each check
cummeans %>%
  # The example data has no index column; will use the row ids instead
  rowid_to_column() %>%
  pivot_longer(
    c(check1, check2),
    names_to = "check",
    values_to = "cummean"
  ) %>%
  ggplot(
    aes(rowid, cummean, color = check)
  ) +
  geom_line() +
  # Proportions have a natural range from 0 to 1
  scale_y_continuous(
    limits = c(0, 1)
  )

Created on 2022-03-14 by the reprex package (v2.0.1)
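
As a small illustration of the terminology note in this answer, the sketch below computes both a cumulative mean and a fixed-window rolling mean for the check1 values, using base R only. The window width of 3 and the variable names are assumptions chosen for the example, not part of the original answer.

```r
# The check1 values from the question
x <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

# Cumulative mean: the window always starts at element 1 and grows by one each step
cum_mean <- cumsum(x) / seq_along(x)

# Rolling mean: the window has a fixed width (3 is an arbitrary choice here)
# and slides along the vector; positions without a full window are set to NA
width <- 3
roll_mean <- sapply(seq_along(x), function(i) {
  if (i < width) NA_real_ else mean(x[(i - width + 1):i])
})

data.frame(x, cum_mean, roll_mean)
```

The cumulative column matches the check1 output above, while the rolling column only reflects the three most recent values at each row.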

桃扇骨 2025-01-21 02:12:26

1) This gives the result as fractions.

library(zoo)

rollapplyr(dat, 1:nrow(dat), mean)
##          check1    check2
##  [1,] 1.0000000 1.0000000
##  [2,] 1.0000000 1.0000000
##  [3,] 1.0000000 1.0000000
##  [4,] 0.7500000 1.0000000
##  [5,] 0.6000000 1.0000000
##  [6,] 0.6666667 0.8333333
##  [7,] 0.7142857 0.7142857
##  [8,] 0.7500000 0.7500000
##  [9,] 0.6666667 0.7777778
## [10,] 0.6000000 0.8000000

1a) To get a percentage, multiply that by 100:

100 * rollapplyr(dat, 1:nrow(dat), mean)

2) Or, using only base R (row(dat) holds each cell's row index, so the division gives the running proportion):

apply(dat, 2, cumsum) / row(dat)

2a) Or as a percentage:

100 * apply(dat, 2, cumsum) / row(dat)
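
The approaches above return decimals or percentages. If the literal count/total form shown in the question is wanted instead, one possible base R sketch is below; the name frac is made up for this example, and the result is a character matrix, so it is only suitable for display.

```r
# Rebuild the example data from the question
check1 <- rep(c(TRUE, FALSE, TRUE, FALSE), times = c(3, 2, 3, 2))
check2 <- rep(c(TRUE, FALSE, TRUE), times = c(5, 2, 3))
dat <- cbind(check1, check2)

# For each column, paste the running count of TRUEs over the running row count
frac <- apply(dat, 2, function(col) paste0(cumsum(col), "/", seq_along(col)))
frac
```

Row 6 of frac, for example, reads "4/6" for check1 and "5/6" for check2, matching the table in the question.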