R: Transition from magrittr to the native pipe and translation of functions

Posted on 2025-02-11 17:13:08

Please have a look at the reprex at the end of the post.
For various reasons, I am transitioning from %>% to the native pipe.
I struggle a bit sometimes and I need some comments on a couple of functions.
In the first case (the complete_data() function, to be rewritten using |>), I do not understand why one approach works and another does not.

In the second case (the move_row() function), I have found a workaround, but it does not generalize well to other functions I have. With magrittr, I can create a series of pipes that contain nrow(.) to pass the number of rows of whatever tibble I have at that point to a function. How can I do the same with the native pipe?
Thanks a lot!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

## First look at these functions. They just try to discard incomplete rows in
## a tibble

complete_data <- function(data){

res <- data %>% filter(complete.cases(.))

return(res)

}

## By trial and error, I wrote this

complete_data_native <- function(data){

res <- data |>  (\(data) filter(data, complete.cases(data)))()

return(res)

}

## this was my first attempt, but why does it fail?

complete_data_native_wrong <- function(data){

res <- data |>  (\(x) filter(x, complete.cases(x)))()

return(res)

}


df <- structure(list(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))


df
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3    NA     3
#> 4     4     4

df |> complete_data()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     4     4

df |> complete_data_native()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     4     4

df |> complete_data_native_wrong() ### why does this fail
#> # A tibble: 3 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3     4     4


## Now another function. Given a tibble, it moves a row from ini_pos to fin_pos

move_row <- function(df, ini_pos, fin_pos){

    row_pick <- slice(df, ini_pos)

    if (fin_pos=="last"){

           res <- df   %>%
        slice(-ini_pos)  %>% 
        add_row(row_pick, .before = nrow(.))    
 
        
} else{
    
    res <- df   %>%
        slice(-ini_pos)  %>% 
        add_row(row_pick, .before = fin_pos)    
}

    return(res)
}


move_row_native_attempt <-  function(df, ini_pos, fin_pos){

    ll <- nrow(df) ## it gets the job done, but I do not want this

    
row_pick <- slice(df, ini_pos)

    if (fin_pos=="last"){

           res <- df  |>  
        slice(-ini_pos) |> 
            add_row(row_pick, .before = ll) ##I want to use the native pipe
        ## to write the equivalent of nrow(.)
        ## with magrittr placeholder but I cannot do that
 
        
} else{
    
    res <- df |>  
        slice(-ini_pos) |>  
        add_row(row_pick, .before = fin_pos)    
}

    return(res)
}



df
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3    NA     3
#> 4     4     4

df |> move_row(1,"last")
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2    NA
#> 2    NA     3
#> 3     1    NA
#> 4     4     4


df |> move_row_native_attempt(1,"last") ## gets the job done, but it is not what I want. See comments in the function definition
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2    NA
#> 2    NA     3
#> 3     4     4
#> 4     1    NA


print(sessionInfo())
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 11 (bullseye)
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.9
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.39       magrittr_2.0.3   tidyselect_1.1.2 R6_2.5.1        
#>  [5] rlang_1.0.2      fastmap_1.1.0    fansi_1.0.3      stringr_1.4.0   
#>  [9] highr_0.9        tools_4.2.1      xfun_0.31        utf8_1.2.2      
#> [13] DBI_1.1.3        cli_3.3.0        withr_2.5.0      htmltools_0.5.2 
#> [17] ellipsis_0.3.2   assertthat_0.2.1 yaml_2.3.5       digest_0.6.29   
#> [21] tibble_3.1.7     lifecycle_1.0.1  crayon_1.5.1     purrr_0.3.4     
#> [25] vctrs_0.4.1      fs_1.5.2         glue_1.6.2       evaluate_0.15   
#> [29] rmarkdown_2.14   reprex_2.0.1     stringi_1.7.6    compiler_4.2.1  
#> [33] pillar_1.7.0     generics_0.1.2   pkgconfig_2.0.3

Created on 2022-06-29 by the reprex package (v2.0.1)

Comments (2)

や莫失莫忘 2025-02-18 17:13:08

complete_data_native_wrong():

complete_data_native_wrong <- function(data){

res <- data |>  (\(x) filter(x, complete.cases(x)))()

return(res)

}

Data masking is the reason that this lovely function doesn't work as expected.

"So, what actually happens?", you ask.

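dplyr::filter() evaluates its conditions with data masking: it looks for x among the columns of the data frame first, finds a column with that name, and passes the contents of that column to complete.cases(). The same happens when you use y instead of x.

complete.cases() therefore ends up acting on a single vector instead of the whole data.frame, so only the rows where that one column is NA are dropped; hence the result.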
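
A minimal sketch of the masking described above, assuming the same df as in the question; the names masked and column_only are only illustrative. It compares the lambda from complete_data_native_wrong() with an explicit test on the column df$x:

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))

# The lambda argument is called x, but inside filter() the bare name x
# is resolved to the column df$x before the function argument is considered.
masked <- (\(x) filter(x, complete.cases(x)))(df)

# Explicitly testing only the column x for missing values.
column_only <- filter(df, complete.cases(df$x))

# Both keep rows 1, 2 and 4: only the row where column x is NA is dropped.
masked
column_only
```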

"But... How do I ensure dplyr::filter() doesn't act that way?", you enquire.

That's where the bang-bang operator !! comes in. And we can now have complete_data_native_right():

complete_data_native_right <- function(data){

res <- data |>  (\(x) filter(x, complete.cases(!!x)))()
# res <- data |>  (\(y) filter(y, complete.cases(!!y)))()

return(res)

}
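
If injecting the data frame with !! feels heavy, one possible alternative (a sketch, not something proposed in this thread) is to avoid referring to the whole data frame at all and test every column with if_all(), which needs dplyr 1.0.4 or later. The function name complete_data_if_all is hypothetical:

```r
library(dplyr)

# Hypothetical alternative: keep rows in which no column is NA.
# No lambda is needed, so there is no argument name that a column could mask.
complete_data_if_all <- function(data) {
  data |> filter(if_all(everything(), ~ !is.na(.x)))
}

df <- tibble(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))
df |> complete_data_if_all()
```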

move_row_native_attempt():

For this one you can use the shorthand function notation without any hiccups:

move_row_native_attempt <-  function(df, ini_pos, fin_pos){
  row_pick <- slice(df, ini_pos)
  
  if (fin_pos=="last"){
    res <- df |> 
      slice(-ini_pos) |> 
      (\(x) add_row(x, row_pick, .before = nrow(x)))()
    
  } else{
    res <- df |> 
      slice(-ini_pos) |> 
      add_row(row_pick, .before = fin_pos)
  }
  
  return(res)
}
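
One thing worth checking when you port this: inside the lambda, nrow(x) is the row count after slice(), just like nrow(.) was with magrittr, so the picked row lands before the last remaining row. The ll workaround in the question used nrow(df) from before the slice, which is why its output differed. A small sketch under that reading, using the question's df; before_last and at_end are illustrative names, and .after is the other position argument of add_row():

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))
row_pick <- slice(df, 1)

# Same as the magrittr version: insert before the last remaining row.
before_last <- df |>
  slice(-1) |>
  (\(d) add_row(d, row_pick, .before = nrow(d)))()

# If the row should really end up last, insert after the last remaining row.
at_end <- df |>
  slice(-1) |>
  (\(d) add_row(d, row_pick, .after = nrow(d)))()

before_last$x
at_end$x
```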

滿滿的愛 2025-02-18 17:13:08

I think it's simply because there is a column x in the data frame, and filter is using this x instead of the argument x to your in-line function. If you change the variable name from x to z in your function declaration, I think it works. Please see below.

Still, I think it's a strike against the base pipe that iris |> filter(complete.cases(_)) throws an error. Is the limitation that _ can only be used as a named argument to the piped function, and can't be used as a variable like . can?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

complete_data_native_wrong <- function(data){
  res <- data |>  (\(z) filter(z, complete.cases(z)))()  # change to z
  return(res)
}

df <- structure(
  list(x = c(1, 2, NA, 4), 
       y = c(NA, NA, 3, 4)), 
  class = c("tbl_df", 
            "tbl", "data.frame"), 
  row.names = c(NA, -4L)
)

df |> complete_data_native_wrong()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     4     4

Created on 2022-06-29 by the reprex package (v2.0.1)
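
On the question about _: as far as I know, in R 4.2 the placeholder has to appear exactly once and as a named argument of the right-hand-side call, so it cannot be nested inside another call the way . can with magrittr. A small base-R sketch of that rule and of the anonymous-function workaround (the object names are illustrative):

```r
df <- data.frame(x = c(1, 2, NA, 4), y = c(NA, NA, 3, 4))

# Allowed in R >= 4.2: `_` appears once, as a named argument of the piped call.
first_two <- df |> head(x = _, n = 2)

# Not allowed: `_` nested inside another call. The parser rejects this,
# so it is wrapped in parse() to capture the message instead of stopping.
nested_error <- tryCatch(
  {
    parse(text = "df |> subset(complete.cases(_))")
    NULL
  },
  error = function(e) conditionMessage(e)
)

# Workaround: an anonymous function can use its argument as often as needed.
complete_rows <- df |> (\(d) d[complete.cases(d), ])()
```

Here the lambda argument is named d, which is not a column of df, so the masking problem from the first answer does not come up either.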
