Create a new conditional factor variable from 2 binary variables with missing values in R
I have two factor variables (T2ENNAT, P2ANYLNG), each with two levels:
0 = NO Multilingual and 1 = Multilingual.
Both have several missing values.
Now I want to create a new factor variable that combines the two under the following conditions:
- If one of the two variables is 1 and the other is either 0 or missing -> new variable should be 1
- If both variables are 0 -> new variable should be 0
- If one of the two variables is 0 and the other is missing -> new variable should be 0
- If both are missing -> new variable should be NA (missing)
I started step by step and tried the following code:
data$T2Multi <- with(data,
  ifelse(T2ENNAT == "Multilingual" & P2ANYLNG == "Multilingual", 1,
  ifelse(T2ENNAT == "NO Multilingual" & P2ANYLNG == "NO Multilingual", 0,
  ifelse(T2ENNAT == "Multilingual" & P2ANYLNG == "NO Multilingual", 1,
  ifelse(T2ENNAT == "NO Multilingual" & P2ANYLNG == "Multilingual", 1,
  ifelse(is.na(T2ENNAT) & P2ANYLNG == "Multilingual", 1,
         NA))))))
The first 4 conditions work. However, the last one does not: R assigns NA to the new variable if T2ENNAT is missing and P2ANYLNG = 1 (Multilingual).
I do not understand the problem with this line. I suspect that the is.na(variable) call somehow does not work. Do you know how to address this problem?
Here is a vectorized way.
data == "Multilingual" returns a logical matrix that is TRUE where the data entries are "Multilingual", FALSE where they are "No Multilingual", and NA where they are missing. If the row sums of that matrix, with missing entries ignored, are greater than 0, then at least one value in that row is "Multilingual" and the new column is a 1. If the row sums of is.na(data[1:2]) are equal to 2, then all values in that row are missing and the new column entry is NA.
Two base R code lines will solve the problem.
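The answer's original code did not survive on this page, so the following is only a sketch of what the two lines could look like, reconstructed from the description above. The small data frame is hypothetical example data; only the column names and level labels are taken from the question.

```r
# Hypothetical example data; column names and labels follow the question.
lv <- c("NO Multilingual", "Multilingual")
data <- data.frame(
  T2ENNAT  = factor(c("Multilingual", "NO Multilingual", NA, "NO Multilingual", NA),
                    levels = lv),
  P2ANYLNG = factor(c("NO Multilingual", "NO Multilingual", "Multilingual", NA, NA),
                    levels = lv)
)

# Line 1: 1 if at least one of the two columns is "Multilingual", else 0.
# na.rm = TRUE makes rowSums() ignore the NA comparisons from missing entries.
data$T2Multi <- as.integer(rowSums(data[1:2] == "Multilingual", na.rm = TRUE) > 0)

# Line 2: set the result to NA where both source columns are missing.
is.na(data$T2Multi) <- rowSums(is.na(data[1:2])) == 2

print(data)
```

For comparison, the nested ifelse() in the question returns NA for a row with a missing T2ENNAT because the very first test, T2ENNAT == "Multilingual" & P2ANYLNG == "Multilingual", already evaluates to NA there, and ifelse() yields NA wherever its test is NA, so the later is.na() branch is never reached. If a factor is needed, as.integer() can be followed by factor(data$T2Multi, levels = 0:1).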
Created on 2022-03-21 by the reprex package (v2.0.1)
Test data set
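The data set used in the answer is not preserved here. As a stand-in, a hypothetical test data set can enumerate all nine combinations of the two variables (both levels plus NA), together with the value the question's rules imply for each row; the name test_data and the expected column are assumptions made for illustration.

```r
# Hypothetical test data: every combination of the two levels and NA.
lv   <- c("NO Multilingual", "Multilingual")
vals <- c(lv, NA)

test_data <- data.frame(
  T2ENNAT  = factor(rep(vals, times = 3), levels = lv),  # cycles fastest
  P2ANYLNG = factor(rep(vals, each = 3), levels = lv),   # cycles slowest
  # Value required by the rules in the question, row by row:
  # any "Multilingual" -> 1, otherwise any "NO Multilingual" -> 0, both NA -> NA.
  expected = c(0L, 1L, 0L,
               1L, 1L, 1L,
               0L, 1L, NA)
)

print(test_data)
```

Running a candidate solution on test_data and comparing its result with the expected column checks every case at once, including the rows with missing values.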
A tidyverse solution to the problem:
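The code for this answer is also missing from the page. One way such a solution might be written, assuming the dplyr package is installed, is with case_when(), which treats an NA condition as "no match" and moves on to the next rule. The example data frame is hypothetical; only the column names and labels come from the question.

```r
library(dplyr)

# Hypothetical example data; column names and labels follow the question.
lv <- c("NO Multilingual", "Multilingual")
data <- data.frame(
  T2ENNAT  = factor(c("Multilingual", "NO Multilingual", NA, "NO Multilingual", NA),
                    levels = lv),
  P2ANYLNG = factor(c("NO Multilingual", "NO Multilingual", "Multilingual", NA, NA),
                    levels = lv)
)

result <- data %>%
  mutate(T2Multi = case_when(
    # At least one variable is "Multilingual" (NA | TRUE is TRUE).
    T2ENNAT == "Multilingual" | P2ANYLNG == "Multilingual" ~ 1L,
    # Otherwise at least one variable is "NO Multilingual".
    T2ENNAT == "NO Multilingual" | P2ANYLNG == "NO Multilingual" ~ 0L,
    # Remaining rows have both variables missing.
    TRUE ~ NA_integer_
  ))

print(result)
```

The order of the rules matters: the first matching condition wins, so the "Multilingual" rule has to come before the "NO Multilingual" rule.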
Created on 2022-03-21 by the reprex package (v2.0.1)