Create a new conditional factor variable from two binary variables with missing values in R

Posted on 2025-01-15 19:32:43

I have two factor variables (T2ENNAT, P2ANYLNG), each with two levels:
0 = NO Multilingual and 1 = Multilingual.
Both have several missing values.

Now I want to create a new factor variable that combines the two with the following conditions:

  1. If at least one of the two variables is 1 (the other being 1, 0 or missing) -> the new variable should be 1.
  2. If both variables are 0 -> the new variable should be 0.
  3. If one of the two variables is 0 and the other is missing -> the new variable should be 0.
  4. If both are missing -> the new variable should be NA (missing).

I started step by step and tried the following code:

data$T2Multi = with(data,  
ifelse(T2ENNAT == "Multilingual" & P2ANYLNG == "Multilingual", 1,
ifelse(T2ENNAT == "NO Multilingual" & P2ANYLNG == "NO Multilingual", 0,
ifelse(T2ENNAT == "Multilingual" & P2ANYLNG == "NO Multilingual", 1,
ifelse(T2ENNAT == "NO Multilingual" & P2ANYLNG == "Multilingual", 1,
ifelse(is.na(T2ENNAT) & P2ANYLNG =="Multilingual",1,
       NA))))))

The first four conditions work. However, the last one does not: R assigns NA to the new variable when T2ENNAT is missing and P2ANYLNG = 1 (Multilingual).
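
The symptom follows from how R propagates missing values: comparing NA with anything gives NA, NA & TRUE is still NA, and ifelse() returns NA wherever its test is NA, so the value of the nested branch is never used for that row. The minimal sketch below, using a single hypothetical row (T2ENNAT missing, P2ANYLNG "Multilingual"), reproduces this and shows that is.na() itself behaves correctly.

```r
# One hypothetical row: T2ENNAT missing, P2ANYLNG "Multilingual"
lv <- c("NO Multilingual", "Multilingual")
t2 <- factor(NA, levels = lv)
p2 <- factor("Multilingual", levels = lv)

is.na(t2)                                 # TRUE: is.na() itself works
test1 <- t2 == "Multilingual" & p2 == "Multilingual"
test1                                     # NA, because NA & TRUE is NA

# ifelse() yields NA wherever its test is NA, so the nested ifelse()
# containing is.na() never contributes a value for this row
res <- ifelse(test1, 1, ifelse(is.na(t2) & p2 == "Multilingual", 1, NA))
res                                       # NA
```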

I do not understand the problem with this line. I suspect that is.na(variable) somehow does not work. Do you know how to address this problem?
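
One way to avoid NA tests, sketched here as an assumption rather than taken from the answers, is to compare with %in% instead of ==: x %in% "Multilingual" is FALSE (not NA) when x is missing, so every ifelse() test is TRUE or FALSE and only rows where both variables are missing fall through to NA. The small data frame is made up for illustration, and any_multi / any_no are hypothetical helper names.

```r
# Hypothetical toy data using the level labels from the question
lv <- c("NO Multilingual", "Multilingual")
data <- data.frame(
  T2ENNAT  = factor(c("Multilingual", NA, "NO Multilingual", NA, "NO Multilingual"), levels = lv),
  P2ANYLNG = factor(c(NA, "Multilingual", NA, NA, "NO Multilingual"), levels = lv)
)

# %in% returns TRUE/FALSE but never NA, so every test below is defined
any_multi <- with(data, T2ENNAT %in% "Multilingual" | P2ANYLNG %in% "Multilingual")
any_no    <- with(data, T2ENNAT %in% "NO Multilingual" | P2ANYLNG %in% "NO Multilingual")

# at least one 1 -> 1; otherwise at least one 0 -> 0; otherwise both missing -> NA
data$T2Multi <- ifelse(any_multi, 1, ifelse(any_no, 0, NA))
data
```

If a factor is required rather than a numeric 0/1 column, the result can be wrapped in factor() afterwards.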


烟燃烟灭 2025-01-22 19:32:43

Here is a vectorized way.

  1. data == "Multilingual" returns a logical matrix that is TRUE where the entries are "Multilingual", FALSE where they are "No Multilingual" and NA where they are missing;
  2. the row sums of this matrix are computed with na.rm = TRUE, so the NAs are ignored; if a row sum is greater than or equal to 1, the row contains at least one "Multilingual" and the new column is 1, otherwise it is 0;
  3. if the row sums of the logical matrix is.na(data[1:2]) equal 2, both values in that row are missing and the new column entry is set to NA.
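
To see what each of these steps produces, here is a minimal sketch on a hypothetical two-row data frame (one "Multilingual" next to an NA, and a row with two NAs). The intermediate names m, s and flag are only for illustration.

```r
# Hypothetical toy data: row 1 has one "Multilingual" and one NA, row 2 is all NA
lv <- c("No Multilingual", "Multilingual")
d <- data.frame(
  T2ENNAT  = factor(c("Multilingual", NA), levels = lv),
  P2ANYLNG = factor(c(NA, NA), levels = lv)
)

m <- d == "Multilingual"                 # logical matrix, NA where the data are NA
print(m)

s <- rowSums(m, na.rm = TRUE)            # NAs are dropped before summing: 1 and 0
print(s)

flag <- +(s >= 1L)                       # unary + turns TRUE/FALSE into 1/0
is.na(flag) <- rowSums(is.na(d)) == 2L   # `is.na<-` sets the selected entries to NA
print(flag)                              # 1 and NA
```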

Two lines of base R code solve the problem.

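# 1 if at least one column is "Multilingual", 0 otherwise (NAs are ignored)
data$T2Multi <- +(rowSums(data == "Multilingual", na.rm = TRUE) >= 1L)
# overwrite with NA where both input columns are missing
is.na(data$T2Multi) <- rowSums(is.na(data[1:2])) == 2L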
data
#>            T2ENNAT        P2ANYLNG T2Multi
#> 1     Multilingual    Multilingual       1
#> 2     Multilingual            <NA>       1
#> 3             <NA> No Multilingual       0
#> 4             <NA> No Multilingual       0
#> 5  No Multilingual No Multilingual       0
#> 6     Multilingual    Multilingual       1
#> 7  No Multilingual    Multilingual       1
#> 8             <NA>            <NA>      NA
#> 9  No Multilingual No Multilingual       0
#> 10    Multilingual    Multilingual       1
#> 11            <NA> No Multilingual       0
#> 12 No Multilingual    Multilingual       1
#> 13            <NA>            <NA>      NA
#> 14    Multilingual No Multilingual       1
#> 15    Multilingual No Multilingual       1
#> 16 No Multilingual            <NA>       0
#> 17    Multilingual    Multilingual       1
#> 18    Multilingual    Multilingual       1
#> 19    Multilingual    Multilingual       1
#> 20    Multilingual            <NA>       1

Created on 2022-03-21 by the reprex package (v2.0.1)
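
As a check against the four rules in the question, the same two lines can be applied to every combination of values. This sketch builds the nine combinations with base R's expand.grid(); the name combos is hypothetical.

```r
# All nine combinations of the two variables, including NA
lv <- c("No Multilingual", "Multilingual")
combos <- expand.grid(T2ENNAT = c(lv, NA), P2ANYLNG = c(lv, NA),
                      stringsAsFactors = FALSE)
combos[] <- lapply(combos, factor, levels = lv)

# The two lines from the answer
combos$T2Multi <- +(rowSums(combos == "Multilingual", na.rm = TRUE) >= 1L)
is.na(combos$T2Multi) <- rowSums(is.na(combos[1:2])) == 2L
combos
```

Every row with at least one "Multilingual" gets 1, rows with only "No Multilingual" and NA get 0, and only the row where both values are missing gets NA.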

Test data set

set.seed(2022)
n <- 20
# two random 0/1 factors with the labels used above
data <- data.frame(
  T2ENNAT = factor(rbinom(n, 1, 0.5), labels = c("No Multilingual", "Multilingual")),
  P2ANYLNG = factor(rbinom(n, 1, 0.5), labels = c("No Multilingual", "Multilingual"))
)
# set n/4 randomly chosen entries of each column to NA (\(x) needs R >= 4.1)
data[] <- lapply(data, \(x){
  is.na(x) <- sample(n, n/4)
  x
})


无尽的现实 2025-01-22 19:32:43

A tidyverse solution to the problem:

library(tidyverse)

# Data set of possible cases
d <- crossing(x = c(0:1, NA), 
              y = x) 

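d |> 
  mutate(z = +case_when(   # unary + converts TRUE/FALSE to 1/0
    x | y ~ x | y,         # at least one 1 -> TRUE
    !(x & y) ~ FALSE       # otherwise, at least one 0 -> FALSE
  ))                       # both NA: no condition is TRUE -> NA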
#> # A tibble: 9 × 3
#>       x     y     z
#>   <int> <int> <int>
#> 1     0     0     0
#> 2     0     1     1
#> 3     0    NA     0
#> 4     1     0     1
#> 5     1     1     1
#> 6     1    NA     1
#> 7    NA     0     0
#> 8    NA     1     1
#> 9    NA    NA    NA

Created on 2022-03-21 by the reprex package (v2.0.1)
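
A further alternative, not taken from either answer and offered only as a sketch: with 0/1 codes, the four rules amount to "the larger of the two values, ignoring missing ones", which base R's pmax() with na.rm = TRUE computes directly. The vectors x and y mirror the nine cases in the table above; converting the factors to 0/1 first assumes the level order c("No Multilingual", "Multilingual").

```r
# Same nine cases as the crossing() table above
x <- c(0, 0, 0, 1, 1, 1, NA, NA, NA)
y <- c(0, 1, NA, 0, 1, NA, 0, 1, NA)

z <- pmax(x, y, na.rm = TRUE)   # a lone NA is ignored; NA only if both are NA
cbind(x, y, z)

# Factors with levels c("No Multilingual", "Multilingual") map to 0/1 codes
f <- factor(c("Multilingual", NA, "No Multilingual"),
            levels = c("No Multilingual", "Multilingual"))
as.integer(f) - 1L              # 1 NA 0
```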
