Create a new column based on values from different columns using mutate() and case_when() in R
I am a student relatively new to R and have learnt a lot from browsing here. I have recently been stuck on something that, after hours of trying, I still haven't been able to figure out. Consider the following data set:
ID Y1 Y2 Y3 Y4
1 0 0 1 1
2 0 0 0 0
3 NA NA NA NA
I want to create a new column that is filled according to the following conditions:
- If the row contains 1, return 1 regardless of NA or 0
- If it contains a mix of 0 and NA but not 1, return 0
- If it only contains NA, return NA
So using the example above I wanted to get the following:
ID Y1 Y2 Y3 Y4 Outcome
1 0 0 1 1 1
2 0 0 0 0 0
3 NA NA NA NA NA
However, the code I tried:
Data2 <- Data %>% mutate(Outcome = case_when(
Data$Y1 == "na" &
Data$Y2 == "na" &
Data$Y3 == "na" &
Data$Y4 == "na" ~ "na")) %>%
mutate(Outcome = case_when(Data$Y1 == 1 ~ "1",
Data$Y2 == 1 ~ "1",
Data$Y3 == 1 ~ "1",
Data$Y4 == 1 ~ "1",
TRUE ~ "No"))
will return with:
ID Y1 Y2 Y3 Y4 Outcome
1 0 0 1 1 1
2 0 0 0 0 0
3 NA NA NA NA 0
which seems to ignore condition 3 (if the row only contains NA, return NA).
Any pointers as to what I have done wrong would be greatly appreciated.
Please forgive the formatting. I'm not sure how to make it prettier, as this is the first time I have asked a question here.
Many thanks in advance!
[Edit] Thanks to Shah I noticed that there is potential for confusion, for which I apologise. To clarify, this is just a segment of the data set, shown to get the point across. I'm dealing with a big dataset that contains more columns, some of which also hold numeric values.
Comments (2)
Checking each column (Y1, Y2, Y3, etc.) is too tedious and not scalable. It becomes a big problem if you have 100 columns where you need this. As shown in the example, if you want to ignore the 1st column (ID) and include all other columns in the calculation, you can do the following. The -1 in the answer is to ignore the 1st column, ID. Also use is.na to compare the NA values.
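The steps described above can be sketched as follows. This is only an illustration under stated assumptions: the data frame name Data and the column names come from the question, while the helper object vals and the use of rowSums() are choices made for this sketch and are not necessarily what the answer's own code used. Note that for a missing value a comparison such as Data$Y1 == "na" evaluates to NA rather than TRUE, which is why is.na() is needed.

```r
library(dplyr)

# Example data from the question (the name `Data` follows the question's code)
Data <- data.frame(
  ID = 1:3,
  Y1 = c(0, 0, NA),
  Y2 = c(0, 0, NA),
  Y3 = c(1, 0, NA),
  Y4 = c(1, 0, NA)
)

# `vals` is a hypothetical helper: every column except the 1st one (ID)
vals <- Data[-1]

Data2 <- Data %>%
  mutate(Outcome = case_when(
    rowSums(vals == 1, na.rm = TRUE) > 0 ~ 1,         # at least one 1 in the row
    rowSums(is.na(vals)) == ncol(vals)   ~ NA_real_,  # every value in the row is NA
    TRUE                                 ~ 0          # only 0s, or 0s mixed with NA
  ))

print(Data2)
```

With the real dataset, which has further numeric columns, vals would need to select only the columns that should take part in the check rather than everything except ID.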
You can try this using the dplyr rowwise function, which treats each row separately.
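A minimal sketch of the row-wise approach, using the example data from the question. The use of c_across() (available in dplyr 1.0.0 and later) and the column range Y1:Y4 are assumptions made for this illustration, not necessarily the code the answer originally contained.

```r
library(dplyr)

# Example data from the question (the name `Data` follows the question's code)
Data <- data.frame(
  ID = 1:3,
  Y1 = c(0, 0, NA),
  Y2 = c(0, 0, NA),
  Y3 = c(1, 0, NA),
  Y4 = c(1, 0, NA)
)

Data2 <- Data %>%
  rowwise() %>%                      # evaluate the mutate() expression once per row
  mutate(Outcome = {
    v <- c_across(Y1:Y4)             # this row's values as a plain vector
    if (all(is.na(v))) {
      NA_real_                       # row contains only NA
    } else {
      as.numeric(any(v == 1, na.rm = TRUE))  # 1 if any value is 1, otherwise 0
    }
  }) %>%
  ungroup()                          # drop the row-wise grouping again

print(Data2)
```

Because rowwise() evaluates the expression one row at a time, this form may be noticeably slower on a large dataset than a vectorised alternative.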