How do I pass arguments to srvyr inside a function?

Posted 2025-01-20 04:23:13

I'm using srvyr to calculate survey means of a variable (y) from a survey object, grouped by a categorical variable (x) from that same survey object. The basic code looks like this:

survey_means <- survey_object %>%
 filter(
  # remove NAs
 ) %>%
 group_by(x) %>%
 summarise(Mean = survey_mean(y))

Suppose I instead want to put this block of code inside a function that accepts the survey object and two variables as parameters. This is a simplified version of what I'm actually trying to do, which is a function that will handle up to 4 or so variables, but this is the base case:

SurveyMeanFunc <- function(survey_object, x, y) {

  survey_means <- survey_object %>%
    filter(
      # remove NAs
    ) %>%
    group_by(survey_object[["variables"]][[x]]) %>%
    summarise(Mean = survey_mean(survey_object[["variables"]][[y]]))

  return(survey_means)

}

Whenever I call this function I get an error message along the lines of:

! Assigned data `x` must be compatible with existing data.
x Existing data has n rows.
x Assigned data has m rows. (m > n)
i Only vectors of size 1 are recycled.

Even when I split up the pipes and verify that x has the same number of rows as y right before calling summarise(), I still get this message. What is summarise() doing that I don't understand?

[EDIT] Full Context with suggested changes:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1= NULL, categ2= NULL) {
  
  if (is.null(categ1) & is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
    NULL #fix
    
  }
  
  return(survey_estimate)
  
}

The remaining issue is that using quasiquotation to reference the survey variables works in the first branch of this if-else statement, but the function parameters are not recognised inside the following else if block, even though they are wrapped in {{ }} in the same way.

与风相奔跑 2025-01-27 04:23:13

You don't give an example of how you want to use the function, but if I understand correctly, you want to take your first block of code and run it with x replaced by the name of the variable passed in as the x argument, and y replaced by the name of the variable passed in as the y argument (with the 'remove NAs' line either deleted or fixed so that it does something).

That is, you want SurveyMeanFunc(my_design, species, height) to be

my_design %>%
 group_by(species) %>%
 summarise(Mean = survey_mean(height))

This is complicated because you don't want the value of x or the name x, you want the name species.

One way is quasiquotation, which used to require enquo and !! but can now be done more easily with the {{ }} operator:

library(srvyr)

SurveyMeanFunc <- function(survey_object, x, y) {
  # {{ }} forwards the caller's unquoted column names to srvyr
  survey_means <- survey_object %>%
    group_by({{ x }}) %>%
    summarise(Mean = survey_mean({{ y }}))
  survey_means
}

giving

> data(api, package = "survey")
> dstrata <- apistrat %>%
+   as_survey(strata = stype, weights = pw)
> 
> SurveyMeanFunc(dstrata, stype, api00)
# A tibble: 3 × 3
  stype  Mean Mean_se
  <fct> <dbl>   <dbl>
1 E      674.    12.5
2 H      626.    15.5
3 M      637.    16.6
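
As for the error message in the question, a plausible explanation (my reading, not something confirmed in the thread) is that filter() shrinks the data inside the pipeline to n rows, while survey_object[["variables"]][[x]] still pulls the full m-row column from the original, unfiltered object. The base R sketch below only illustrates that kind of length mismatch on a made-up data frame; it does not use srvyr and the names full and kept are hypothetical.

```r
# Hypothetical toy data standing in for the survey variables
full <- data.frame(x = c("a", "b", NA, "a", "b"),
                   y = c(1, 2, 3, NA, 5))

# What the filter step leaves in the pipeline: rows without NAs
kept <- full[!is.na(full$x) & !is.na(full$y), ]

n <- nrow(kept)           # rows that survive the filter
m <- length(full[["x"]])  # length of the column taken from the original object

# Attaching the unfiltered column to the filtered data fails,
# because the two no longer have the same number of rows
mismatch <- tryCatch({
  kept$group <- full[["x"]]
  FALSE
}, error = function(e) TRUE)
```

With {{ x }} the column is looked up in the data as it flows through the pipeline, so the lengths always agree.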

Update

You still don't give an example of how you want to use the function, but I think this works:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1, categ2) {
  
  if (missing(categ1) && missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ categ2 }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  }
  
  return(survey_estimate)
  
}

The issue is that you can't evaluate categ1 or categ2 in the if condition when they are supplied by the user. Calling is.null() on them forces R to evaluate them as ordinary objects rather than as columns of the survey object, and R doesn't know where to look. This is a problem because of the way the tidyverse uses unquoted variable names. If you supplied them as model formulas (as you would in survey) or as quoted strings, you'd be OK.
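
To make the difference concrete, here is a small base R sketch. The function names uses_is_null and uses_missing and the column name some_survey_column are made up for illustration, and the sketch assumes no object called some_survey_column exists in your session.

```r
# is.null() forces the argument to be evaluated in the caller's environment
uses_is_null <- function(categ1 = NULL) is.null(categ1)

# missing() only asks whether the argument was supplied; nothing is evaluated
uses_missing <- function(categ1) missing(categ1)

# An unquoted column name is not an object R can find, so is.null() errors;
# the error message is captured here as a string
null_result <- tryCatch(
  uses_is_null(some_survey_column),
  error = function(e) conditionMessage(e)
)

supplied_result <- uses_missing(some_survey_column)  # argument was supplied
omitted_result  <- uses_missing()                    # argument was left out
default_result  <- uses_is_null()                    # default NULL is used
```

This is why the first branch of your function worked (nothing was supplied, so is.null() saw the NULL default) while the else if branch failed as soon as a column name was passed.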

The missing function asks whether an argument was supplied, which in this case is what you want. There's a more flexible is_missing/maybe_missing setup in the rlang package, which you could look at as another option. But this seems to work:

> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide,comp.imp)
# A tibble: 4 × 5
# Groups:   comp.imp [2]
  comp.imp sch.wide  Mean Mean_low Mean_upp
  <fct>    <fct>    <dbl>    <dbl>    <dbl>
1 No       No       1013.     810.    1216.
2 No       Yes       525.     438.     611.
3 Yes      No        370.     207.     533.
4 Yes      Yes       521.     475.     566.
> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide)
# A tibble: 6 × 5
# Groups:   stype [3]
  stype sch.wide  Mean Mean_low Mean_upp
  <fct> <fct>    <dbl>    <dbl>    <dbl>
1 E     No        420.     340.     499.
2 E     Yes       417.     381.     452.
3 H     No       1520.    1209.    1830.
4 H     Yes      1137.     946.    1328.
5 M     No        967.     709.    1226.
6 M     Yes       775.     669.     881.
> SurveyMeanMedFunc(dstrata,stype,enroll)
# A tibble: 3 × 4
  stype  Mean Mean_low Mean_upp
  <fct> <dbl>    <dbl>    <dbl>
1 E      417.     384.     450.
2 H     1321.    1134.    1508.
3 M      832.     722.     943.
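
If you would rather keep the NULL defaults from your edit, one possible rlang-based variant is to capture the argument with enquo() and test the captured expression with quo_is_null(), which never evaluates the column name. This is only a sketch: SurveyMeanQuo is a hypothetical name, it handles a single optional grouping variable, and it assumes the srvyr, survey and rlang packages are installed.

```r
library(srvyr)
data(api, package = "survey")  # provides apistrat

# Hypothetical variant with one optional grouping variable
SurveyMeanQuo <- function(survey_obj, xvar, yvar, categ1 = NULL) {
  # Capture the expression without evaluating it
  categ1_quo <- rlang::enquo(categ1)

  filtered <- survey_obj %>%
    filter(!is.na({{ xvar }}), !is.na({{ yvar }}))

  if (rlang::quo_is_null(categ1_quo)) {
    # categ1 was left at its NULL default
    grouped <- filtered %>% group_by({{ xvar }})
  } else {
    # categ1 was supplied: unquote the captured column name
    grouped <- filtered %>% group_by({{ xvar }}, !!categ1_quo)
  }

  grouped %>%
    summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
}

dstrata <- apistrat %>%
  as_survey(strata = stype, weights = pw)

one_group  <- SurveyMeanQuo(dstrata, stype, enroll)
two_groups <- SurveyMeanQuo(dstrata, stype, enroll, sch.wide)
```

The two calls correspond to the last two calls in the output above, so they should produce one row per stype and one row per stype/sch.wide combination respectively.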
