Creating a function to identify missing values

Posted on 2025-01-09 10:10:25

I am trying to build a function as part of a larger function in R. Some of the pieces are working fine but others are not. Here is the piece of the code that is giving me issues.

This part of the function is designed to identify whether a variable in a data frame is missing, then generate a new variable that records whether each specific case is missing or present. I want the new variable to have the suffix .zero (q1 becomes q1.zero, q2 becomes q2.zero, etc.). I can generate the suffix without any issues. Creating the new variable is causing some problems. Any insight would be greatly appreciated.

function1 <- function (x, data) {
  # new variable name
  temp <- paste (x, .zero, sep="", collapse = NULL)
  temp
  
  # is variable missing
  # I don't know if I should use this method or ifelse()
  data$temp [is.na (data$x)]<- 0
  data$temp [!is.na (data$x)]<- 1
 return (data$temp)
  }

Comments (1)

第七度阳光i 2025-01-16 10:10:25

You've got a few issues:

  • .zero isn't defined, you want the quoted string ".zero"
  • You can't use $ with column names stored in strings. You need to use data[[temp]] not data$temp. Here's the related FAQ if you want to read more.
  • You probably want to return the whole modified data frame, not just the column you added (I'm assuming this since you passed the whole data frame in to the function).
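To make the second point concrete, here is a minimal sketch of how $ and [[ differ when the column name is held in a variable. The data frame df and the variable col are hypothetical names made up for illustration; they are not from the question.

```r
# Hypothetical example data; `df` and `col` are illustrative names only.
df <- data.frame(q1 = c(3, NA, 5))
col <- "q1"

# `$` treats what follows as a literal name, so this looks for a column
# actually called "col", which does not exist here.
dollar_result <- df$col
is.null(dollar_result)   # TRUE

# `[[` evaluates its argument, so the string stored in `col` is used
# and the q1 column is returned.
bracket_result <- df[[col]]
bracket_result
```

The same applies to assignment: data[[temp]] <- value creates a column named by the string in temp, whereas data$temp <- value creates a column literally named temp.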

We can also make some simplifications: paste0() is a shortcut for paste(sep = ""), and as.integer(!is.na(data[[x]])) is a cleaner and more efficient way to create your values.
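As a quick check of both simplifications, the sketch below uses a made-up vector v (an assumption for illustration) and compares the shortcut forms against the longer ones from the question.

```r
# `v` is a hypothetical vector used only for this illustration.
v <- c(2.5, NA, 7)

# paste0() gives the same string as paste() with sep = "".
same_name <- identical(paste0("q1", ".zero"), paste("q1", ".zero", sep = ""))
same_name   # TRUE

# !is.na(v) is a logical vector; as.integer() turns TRUE/FALSE into 1/0.
flags <- as.integer(!is.na(v))
flags       # 1 0 1

# The two-step subsetting approach, or ifelse(), yields the same values
# (as doubles rather than integers).
flags_ifelse <- ifelse(is.na(v), 0, 1)
all(flags == flags_ifelse)   # TRUE
```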

Putting this all together:

function1 <- function (x, data) {
  data[[paste0(x, ".zero")]] = as.integer(!is.na(data[[x]]))
  return(data)
}
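A short usage sketch may help here. The survey data frame below is invented for illustration; the function body is the one shown above, written with <- instead of = purely as a style choice.

```r
# Same function as above, repeated so this example runs on its own.
function1 <- function(x, data) {
  data[[paste0(x, ".zero")]] <- as.integer(!is.na(data[[x]]))
  return(data)
}

# Hypothetical data: q1 has one missing value.
survey <- data.frame(q1 = c(4, NA, 2), q2 = c(NA, NA, 1))

# The column name is passed as a string; the result must be assigned back,
# because R does not modify `survey` in place.
survey <- function1("q1", survey)

survey$q1.zero   # 1 0 1
```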

I'd add a little commentary to say that the .zero suffix is not particularly informative for whether or not a value is missing. A better suffix might be something like .present -- a 1 indicates the value is present, a 0 indicates it is not.

Similarly, function1 is an absolutely terrible name for a function. Use descriptive names. add_present_column would be a much better name. (It's often nice to give functions names that are verbs.)

Since I see Konrad editing the question, I'll also mention that return() isn't needed in R functions. The value of the last evaluated expression is returned, and stylistically many would prefer that the last line of the function just be data, not return(data).
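Applying the naming and style suggestions together, one possible version looks like the sketch below. The loop over several columns and the survey data are assumptions added for illustration, on the guess that the larger function processes more than one variable.

```r
# Renamed function with the .present suffix and an implicit return.
add_present_column <- function(x, data) {
  data[[paste0(x, ".present")]] <- as.integer(!is.na(data[[x]]))
  data
}

# Hypothetical data and column list.
survey <- data.frame(q1 = c(4, NA, 2), q2 = c(NA, NA, 1))

# Add one indicator column per variable of interest.
for (col in c("q1", "q2")) {
  survey <- add_present_column(col, survey)
}

names(survey)       # "q1" "q2" "q1.present" "q2.present"
survey$q2.present   # 0 0 1
```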
