Create a new column based on values from different columns, using mutate() and case_when in R

Posted on 2025-02-11 13:30:15

I am a student relatively new to R and have learnt a lot from browsing here. Recently I have been stuck on something that, after hours of trying, I still haven't been able to figure out. Consider the following data set:

ID Y1 Y2 Y3 Y4
 1  0  0  1  1
 2  0  0  0  0
 3 NA NA NA NA

I want to create a new column that is filled based on the following conditions:

  1. If the row contains 1, return 1 regardless of NA or 0
  2. If it contains a mix of 0 and NA but not 1, return 0
  3. If it only contains NA, return NA

So using the example above I wanted to get the following:

ID Y1 Y2 Y3 Y4 Outcome
 1  0  0  1  1       1
 2  0  0  0  0       0
 3 NA NA NA NA      NA

However, the code I tried:

Data2 <- Data %>% mutate(Outcome = case_when( 
                                Data$Y1 == "na" &
                                Data$Y2 == "na" &
                                Data$Y3 == "na" &
                                Data$Y4 == "na" ~ "na"))  %>%                                
          mutate(Outcome = case_when(Data$Y1 == 1 ~ "1", 
                                 Data$Y2 == 1 ~ "1", 
                                 Data$Y3 == 1 ~ "1",
                                 Data$Y4 == 1 ~ "1",
                                 TRUE ~ "No"))

returns:

ID Y1 Y2 Y3 Y4 Outcome
 1  0  0  1  1       1
 2  0  0  0  0       0
 3 NA NA NA NA       0

which seems to ignore condition 3: if the row only contains NA, return NA.

Any pointers as to what I have done wrong would be greatly appreciated.

Please forgive the formatting; I'm not sure how to make it prettier, as this is the first time I have asked a question here.

Many thanks in advance!

[Edit] Thanks to Shah I noticed that there is potential for confusion, for which I apologise. To clarify, this is just a segment of the data set, shown to get the point across. I'm dealing with a big data set which contains more columns, some of which also hold numeric values.

Comments (2)

挥剑断情 2025-02-18 13:30:15

Checking each column (Y1, Y2, Y3, etc.) individually is tedious and does not scale. It becomes a big problem if you have 100 columns that need this check.

As shown in the example, you want to ignore the first column (ID) and include all other columns in the calculation, which you can do as follows. The -1 in the answer drops the first column, ID.

Also, use is.na to test for NA values.
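A small illustration of why is.na matters here, as a sketch only (the vector x is made up for this example and is not part of the original data): comparing a missing value with == gives NA rather than TRUE, whether the right-hand side is the string "na" or NA itself.

```r
# Hypothetical vector with one missing value
x <- c(0L, 1L, NA)

# The string "na" is an ordinary string, not a missing value;
# the comparison is FALSE for 0 and 1, and NA for the missing entry
eq_string <- x == "na"

# Comparing with NA directly yields NA for every element
eq_na <- x == NA

# is.na() is the test that actually flags missing values
is_missing <- is.na(x)
```

So, for a genuinely missing value, a condition built from == "na" can only come out NA, never TRUE, and case_when does not select a branch whose condition is NA.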

# Count the non-NA values in each row; used below to set Outcome
# to NA for rows in which every value is NA
non_NA <- rowSums(!is.na(df[-1]))
# Assign 1 if the row sum is greater than 0 (with 0/1 values, at least one 1)
df$Outcome <- as.integer(rowSums(df[-1], na.rm = TRUE) > 0)
# Set Outcome to NA for rows in which every value is NA
df$Outcome[non_NA == 0] <- NA
df
#  ID Y1 Y2 Y3 Y4 Outcome
#1  1  0  0  1  1       1
#2  2  0  0  0  0       0
#3  3 NA NA NA NA      NA

Data

df <- structure(list(ID = 1:3, Y1 = c(0L, 0L, NA), Y2 = c(0L, 0L, NA
), Y3 = c(1L, 0L, NA), Y4 = c(1L, 0L, NA)), 
class = "data.frame", row.names = c(NA, -3L))
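Since the question's edit mentions further columns, some of them numeric, summing over df[-1] could pick up unrelated values. One possible adaptation, offered only as a sketch, is to name the indicator columns explicitly and test for 1 directly. The helper row_outcome, the data frame df2, and its Age column are hypothetical names invented for this illustration.

```r
# Hypothetical helper: `cols` names the 0/1/NA indicator columns to inspect
row_outcome <- function(data, cols) {
  flags <- data[cols]
  # TRUE where at least one selected column equals 1 (NAs ignored)
  has_one <- rowSums(flags == 1, na.rm = TRUE) > 0
  # TRUE where every selected column is NA
  all_missing <- rowSums(!is.na(flags)) == 0
  out <- as.integer(has_one)
  out[all_missing] <- NA_integer_
  out
}

# Made-up data: Age is an unrelated numeric column that must not be counted
df2 <- data.frame(
  ID  = 1:4,
  Age = c(34, 51, 29, 42),
  Y1  = c(0L, 0L, NA, NA),
  Y2  = c(0L, 0L, NA, 0L),
  Y3  = c(1L, 0L, NA, NA),
  Y4  = c(1L, 0L, NA, 0L)
)

df2$Outcome <- row_outcome(df2, c("Y1", "Y2", "Y3", "Y4"))
```

Row 4 mixes 0 and NA without any 1, so it falls under condition 2 of the question and gets 0; only row 3, which is entirely NA, gets NA.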

野の 2025-02-18 13:30:15

You can try this using dplyr's rowwise() function, which treats each row separately:

library(dplyr)

df |>
  rowwise() |>
  mutate(Outcome = case_when(
    any(c_across(Y1:Y4) == 1) ~ "1",
    all(is.na(c_across(Y1:Y4))) ~ NA_character_,
    TRUE ~ "0"
  ))

  • output
# A tibble: 3 × 6
# Rowwise: 
     ID    Y1    Y2    Y3    Y4 Outcome
  <int> <int> <int> <int> <int> <chr>  
1     1     0     0     1     1 1      
2     2     0     0     0     0 0      
3     3    NA    NA    NA    NA NA     
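On larger data, rowwise() can be slow because each row is processed on its own. A vectorised variant of the same three rules might look like the sketch below, assuming a dplyr version that provides if_any() and if_all() (1.0.4 or later). The extra fourth row is made up here to exercise the mixed 0/NA case, and the result is kept as an integer column rather than character.

```r
library(dplyr)

# Example data from the thread, plus a hypothetical row 4 mixing 0 and NA
df <- data.frame(
  ID = 1:4,
  Y1 = c(0L, 0L, NA, NA),
  Y2 = c(0L, 0L, NA, 0L),
  Y3 = c(1L, 0L, NA, NA),
  Y4 = c(1L, 0L, NA, 0L)
)

result <- df %>%
  mutate(Outcome = case_when(
    # Condition 1: any selected column equals 1
    if_any(Y1:Y4, ~ .x == 1) ~ 1L,
    # Condition 3: every selected column is NA
    if_all(Y1:Y4, is.na) ~ NA_integer_,
    # Condition 2: everything else (0s, possibly mixed with NA)
    TRUE ~ 0L
  ))
```

The Y1:Y4 selection can be swapped for any tidyselect expression, for example starts_with("Y"), if the indicator columns are not adjacent.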