Recoding data to binary across multiple columns

Posted on 2025-01-21 09:31:09

Hi, I have a data frame in which the values in 13 columns are coded from 1 to 5. I want to replace 1 and 2 with 0, replace 3 and 4 with 1, and drop 5. How can I make this change in my current data without mutating? There are 13 columns that need to be changed.

Comments (3)

伴我心暖 2025-01-28 09:31:09

You can try the code below (borrowing the data from @benson23, thanks!):

> df[] <- (df >= 3) * NA^(df == 5)

> df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1  1 NA NA  0  0  1  0 NA NA   0   0   0  NA
2  1  1  1  0  0 NA  0 NA NA   0   0   0  NA
3  0  0  1 NA  1  1  0  1  1   1   1   0   1
4  0  0  0  1  1  0  1  0 NA   0   1   0   0
5  1  1  1  0  0 NA  1  0  0  NA   1   1   1
  • df >= 3 yields a logical matrix of TRUE and FALSE values
  • NA^(df == 5) yields a matrix of NA and 1 values, since NA^0 = 1 (where df == 5 is FALSE) and NA^1 = NA (where it is TRUE); this matrix acts as a mask
  • The element-wise product of the two matrices keeps the non-NA entries and also converts the logical values to numeric 0/1
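The masking trick described in the bullets above can be checked on a small vector. This is a minimal sketch; the vector `x` and the names `mask`, `is_high` and `result` are made up for illustration and are not part of the original answer.

```r
# Hypothetical toy vector covering every code from 1 to 5
x <- c(1, 2, 3, 4, 5)

# Mask: NA where x is 5, 1 elsewhere (NA^TRUE is NA, NA^FALSE is 1)
mask <- NA^(x == 5)

# Logical threshold: TRUE for 3, 4 and 5
is_high <- x >= 3

# Multiplying coerces the logical to 0/1 and propagates NA from the mask
result <- is_high * mask
print(result)
```

The same arithmetic applies element-wise when `x` is replaced by a whole data frame, which is what the one-liner relies on.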
吻泪 2025-01-28 09:31:09

I would either perform the conversion as a two-step process (since there are two rules), or write a function that encapsulates your rules and apply that. I'll use mutate from the 'dplyr' package in the following, since that seems to be what you're using:

Here’s the two-step process:

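library(dplyr)

# Step 1: turn 5 into NA; step 2: compare with 3.
# The result is logical (TRUE/FALSE/NA); wrap the comparison in as.integer() if you need 0/1.
df |>
    mutate(across(everything(), ~ replace(.x, .x == 5L, NA))) |>
    mutate(across(everything(), ~ .x >= 3L))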

And here it is using a function:

myrule = function (x) {
    if_else(x == 5L, NA, x >= 3L)
}

df |> mutate(across(everything(), myrule))

When you do this, it is crucial to give the function (called myrule here) a descriptive name that fits your problem domain.
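As an illustration of the naming advice, here is a minimal sketch of the same rule under a more descriptive name, returning integer 0/1 instead of logical values. The name `recode_low_high` and the data frame `toy` are hypothetical and chosen only for this example; it assumes dplyr is installed and R is version 4.1 or later (for the `|>` pipe).

```r
library(dplyr)

# Hypothetical name chosen for illustration: recodes 1-2 to 0, 3-4 to 1, 5 to NA
recode_low_high <- function(x) {
  # as.integer() turns TRUE/FALSE into 1/0; NA_integer_ keeps both branches the same type
  if_else(x == 5L, NA_integer_, as.integer(x >= 3L))
}

# Made-up two-column data frame, not the asker's data
toy <- data.frame(a = c(1L, 3L, 5L), b = c(2L, 4L, 5L))

recoded <- toy |> mutate(across(everything(), recode_low_high))
print(recoded)
```

Keeping the rule in one named function means the 13 columns share a single definition, so a later change to the coding scheme only has to be made in one place.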

苦行僧 2025-01-28 09:31:09

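Let's say we have this simulated data frame:

set.seed(123)

# The exact draws may depend on the R version's sampling algorithm;
# the dput() under "Data" reproduces the values printed here.
df <- as.data.frame(matrix(sample(1:5, 5 * 13, replace = TRUE), ncol = 13))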

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1  2  1  5  5  5  4  5  3  1   1   1   2   4
2  4  3  3  2  4  3  5  4  3   2   3   1   1
3  3  5  4  1  4  3  4  2  3   3   4   4   2
4  5  3  3  2  5  2  4  2  2   2   1   5   2
5  5  3  1  5  4  1  1  2  1   5   3   2   5

Base R

We can first set the entries where df == 5 to NA, and then use a logical expression to test whether each value is greater than or equal to 3 (proposed by @danlooo in the comments).

The unary plus in +(df >= 3) converts the logical output of df >= 3 to integer.

df[df == 5] <- NA
df <- as.data.frame(+(df >= 3))
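To make the intermediate steps visible, here is a minimal sketch of the same two lines applied to a made-up 2 x 2 data frame. The names `toy`, `is_high` and `recoded` are assumptions for illustration only.

```r
# Made-up 2 x 2 data frame, for illustration only
toy <- data.frame(a = c(1L, 5L), b = c(4L, 2L))

# Step 1: every 5 becomes NA
toy[toy == 5] <- NA

# Step 2: comparing a data frame with a number gives a logical matrix; NA stays NA
is_high <- toy >= 3

# Step 3: unary plus turns the logical matrix into integers (0, 1 or NA)
recoded <- as.data.frame(+is_high)
print(recoded)
```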

dplyr

Alternatively, we can combine mutate with across from the dplyr package.

library(dplyr)

df <- df %>% mutate(across(everything(), ~case_when(.x %in% 1:2 ~ 0, 
                                                    .x %in% 3:4 ~ 1, 
                                                    TRUE ~ NA_real_)))

Output

df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1  0  0 NA NA NA  1 NA  1  0   0   0   0   1
2  1  1  1  0  1  1 NA  1  1   0   1   0   0
3  1 NA  1  0  1  1  1  0  1   1   1   1   0
4 NA  1  1  0 NA  0  1  0  0   0   0  NA   0
5 NA  1  0 NA  1  0  0  0  0  NA   1   0  NA
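As a sanity check, the base R and dplyr routes can be compared on a small input. This is a sketch on a made-up data frame `toy`, with hypothetical result names `base_res`, `dplyr_res` and `same`; it assumes dplyr is installed.

```r
library(dplyr)

# Made-up data frame, not the data from the answer
toy <- data.frame(a = c(1L, 3L, 5L), b = c(2L, 4L, 5L))

# Base R route: set 5 to NA, then threshold at 3
base_res <- toy
base_res[base_res == 5] <- NA
base_res <- as.data.frame(+(base_res >= 3))

# dplyr route: explicit recoding with case_when
dplyr_res <- toy %>%
  mutate(across(everything(), ~ case_when(.x %in% 1:2 ~ 0,
                                          .x %in% 3:4 ~ 1,
                                          TRUE ~ NA_real_)))

# Base R yields integer columns and case_when yields doubles,
# so compare the values column by column rather than the types
same <- all(mapply(function(u, v) isTRUE(all.equal(as.numeric(u), v)),
                   base_res, dplyr_res))
print(same)
```

The only difference between the two results is the column type (integer versus double), which matters only if later code is strict about types.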

Data

Here's the dput(df) for easier data loading.

structure(list(V1 = c(2L, 4L, 3L, 5L, 5L), V2 = c(1L, 3L, 5L, 
3L, 3L), V3 = c(5L, 3L, 4L, 3L, 1L), V4 = c(5L, 2L, 1L, 2L, 5L
), V5 = c(5L, 4L, 4L, 5L, 4L), V6 = c(4L, 3L, 3L, 2L, 1L), V7 = c(5L, 
5L, 4L, 4L, 1L), V8 = c(3L, 4L, 2L, 2L, 2L), V9 = c(1L, 3L, 3L, 
2L, 1L), V10 = c(1L, 2L, 3L, 2L, 5L), V11 = c(1L, 3L, 4L, 1L, 
3L), V12 = c(2L, 1L, 4L, 5L, 2L), V13 = c(4L, 1L, 2L, 2L, 5L)), class = "data.frame", row.names = c(NA, 
-5L))