
Binary data across multiple columns

Posted 2025-01-21 09:31:09

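Hi, I have a data frame. In 13 columns of my DF, I want to replace the values 1 and 2 with 0, replace 3 and 4 with 1, and remove 5 (i.e., treat it as missing). How can I do this, given that all 13 columns need to be changed?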


伴我心暖 2025-01-28 09:31:09

You can try the code below (data borrowed from @benson23, thanks!)

> df[] <- (df >= 3) * NA^(df == 5)

> df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1  1 NA NA  0  0  1  0 NA NA   0   0   0  NA
2  1  1  1  0  0 NA  0 NA NA   0   0   0  NA
3  0  0  1 NA  1  1  0  1  1   1   1   0   1
4  0  0  0  1  1  0  1  0 NA   0   1   0   0
5  1  1  1  0  0 NA  1  0  0  NA   1   1   1

df >= 3 yields a logical matrix of TRUE or FALSE values.
NA^(df == 5) yields a matrix of NA or 1, since NA^0 = 1 and NA^1 = NA (note that 1^NA is 1, not NA); this matrix acts as a mask.
The element-wise product of the two matrices keeps the entries where the mask is 1, turns the entries equal to 5 into NA, and also converts the logical values to numerics.
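A small base-R check of the power rules the explanation relies on, and of the mask applied to a hypothetical three-row data frame `toy` (made-up values, not the data from the answer):

```r
# The power rules behind the mask
print(c(NA^0, NA^1, 1^NA))  # 1 NA 1: only NA^1 is NA

# Hypothetical toy data (not the data used in the answer)
toy <- data.frame(a = c(1L, 3L, 5L), b = c(4L, 2L, 5L))

mask <- NA^(toy == 5)       # 1 where the value is not 5, NA where it is 5
toy[] <- (toy >= 3) * mask  # TRUE/FALSE becomes 1/0, then the 5s become NA
print(toy)
```

`df[] <-` (here `toy[] <-`) writes the matrix back into the existing columns, so the result stays a data frame with the original column names.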


吻泪 2025-01-28 09:31:09

I would either perform the conversion as a two-step process (since there are two rules), or write a function that encapsulates your rules and apply it. I’ll be using ‘dplyr’ mutate in the following, since that seems to be what you’re using:

Here’s the two-step process:

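library(dplyr)

# Step 1: turn 5 into NA; step 2: recode >= 3 as 1 and < 3 as 0 (NA stays NA)
df |>
    mutate(across(everything(), ~ replace(.x, .x == 5L, NA))) |>
    mutate(across(everything(), ~ as.integer(.x >= 3L)))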

And here it is using a function:

# 5 becomes NA; otherwise 1 if x >= 3, else 0
myrule = function (x) {
    if_else(x == 5L, NA_integer_, as.integer(x >= 3L))
}

df |> mutate(across(everything(), myrule))

Here it is crucial that you replace the placeholder name myrule with a descriptive name that fits your problem domain.
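The same "named rule" idea also works in base R. A minimal sketch, assuming the data frame also holds columns that must not be recoded: the function name `binarize_scale()`, the `id` column and the `cols` vector are hypothetical, and in practice `cols` would list the 13 target columns. Values other than 1 to 4 (including 5 and NA) become NA.

```r
# Hypothetical rule: 1-2 -> 0, 3-4 -> 1, anything else (including 5 and NA) -> NA
binarize_scale <- function(x) {
  out <- rep(NA_integer_, length(x))
  out[x %in% 1:2] <- 0L
  out[x %in% 3:4] <- 1L
  out
}

# Hypothetical data: an id column that must stay untouched plus two target columns
df <- data.frame(id = c("a", "b", "c"), V1 = c(1L, 4L, 5L), V2 = c(3L, 2L, 5L))
cols <- c("V1", "V2")  # in the question this would name all 13 columns

df[cols] <- lapply(df[cols], binarize_scale)
print(df)
```

In the dplyr version, the same column restriction can be written as `mutate(across(all_of(cols), myrule))`.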


苦行僧 2025-01-28 09:31:09

Let's say we have this simulated data frame:

set.seed(123)

# 5 rows x 13 columns of random values from 1 to 5 (no package needed)
df <- as.data.frame(matrix(sample(1:5, 5 * 13, replace = TRUE), ncol = 13))

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1  2  1  5  5  5  4  5  3  1   1   1   2   4
2  4  3  3  2  4  3  5  4  3   2   3   1   1
3  3  5  4  1  4  3  4  2  3   3   4   4   2
4  5  3  3  2  5  2  4  2  2   2   1   5   2
5  5  3  1  5  4  1  1  2  1   5   3   2   5

Base R

We can first set the values equal to 5 to NA, then use a logical expression to test whether the values are greater than or equal to 3 (as proposed by @danlooo in the comments).

The unary plus in +(df >= 3) converts the logical matrix returned by df >= 3 to integers: TRUE becomes 1, FALSE becomes 0, and NA stays NA.

df[df == 5] <- NA
df <- as.data.frame(+(df >= 3))
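A minimal sketch of the two base R steps on a hypothetical data frame `toy` (made-up values), showing that the unary plus yields integer 0/1 and that the NA introduced in the first step survives the comparison:

```r
# Hypothetical toy data (not the simulated data above)
toy <- data.frame(V1 = c(1L, 3L, 5L), V2 = c(4L, 5L, 2L))

toy[toy == 5] <- NA                # rule 1: 5 becomes missing
toy <- as.data.frame(+(toy >= 3))  # rule 2: TRUE/FALSE -> 1/0, NA stays NA
print(toy)
str(toy)                           # both columns are integer
```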

dplyr

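Alternatively, we can combine mutate() with across() from the dplyr package. This replaces the base R code rather than following it, so apply it to the original df: run after the base R step, it would map the existing 0s to NA and the 1s to 0.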

library(dplyr)

df <- df %>% mutate(across(everything(), ~case_when(.x %in% 1:2 ~ 0, 
                                                    .x %in% 3:4 ~ 1, 
                                                    TRUE ~ NA_real_)))
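Because the values are small positive integers, the case_when() mapping can also be expressed in base R by indexing a lookup vector with the original values. This is a swapped-in technique, not part of the answer, and it assumes every value is a whole number from 1 to 5: position k of the hypothetical `lookup` vector holds the code for value k, values above 5 index past the end and give NA, but a 0 or a negative value would break it.

```r
# Position k holds the code for original value k: 1,2 -> 0; 3,4 -> 1; 5 -> NA
lookup <- c(0L, 0L, 1L, 1L, NA)

# Hypothetical toy data (values assumed to be integers from 1 to 5)
toy <- data.frame(V1 = c(1L, 3L, 5L), V2 = c(4L, 2L, 1L))

toy[] <- lapply(toy, function(x) lookup[x])
print(toy)
```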

Output

df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1  0  0 NA NA NA  1 NA  1  0   0   0   0   1
2  1  1  1  0  1  1 NA  1  1   0   1   0   0
3  1 NA  1  0  1  1  1  0  1   1   1   1   0
4 NA  1  1  0 NA  0  1  0  0   0   0  NA   0
5 NA  1  0 NA  1  0  0  0  0  NA   1   0  NA

Data

Here's the dput(df) for easier data loading.

structure(list(V1 = c(2L, 4L, 3L, 5L, 5L), V2 = c(1L, 3L, 5L, 
3L, 3L), V3 = c(5L, 3L, 4L, 3L, 1L), V4 = c(5L, 2L, 1L, 2L, 5L
), V5 = c(5L, 4L, 4L, 5L, 4L), V6 = c(4L, 3L, 3L, 2L, 1L), V7 = c(5L, 
5L, 4L, 4L, 1L), V8 = c(3L, 4L, 2L, 2L, 2L), V9 = c(1L, 3L, 3L, 
2L, 1L), V10 = c(1L, 2L, 3L, 2L, 5L), V11 = c(1L, 3L, 4L, 1L, 
3L), V12 = c(2L, 1L, 4L, 5L, 2L), V13 = c(4L, 1L, 2L, 2L, 5L)), class = "data.frame", row.names = c(NA, 
-5L))
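As a quick check, the base R approach can be run on the dput() data above and compared with the Output table. This sketch compares only rows 1 and 5 of that table; `out`, `row1` and `row5` are names introduced here.

```r
# Data copied from the dput() output above
df <- structure(list(
  V1 = c(2L, 4L, 3L, 5L, 5L), V2 = c(1L, 3L, 5L, 3L, 3L),
  V3 = c(5L, 3L, 4L, 3L, 1L), V4 = c(5L, 2L, 1L, 2L, 5L),
  V5 = c(5L, 4L, 4L, 5L, 4L), V6 = c(4L, 3L, 3L, 2L, 1L),
  V7 = c(5L, 5L, 4L, 4L, 1L), V8 = c(3L, 4L, 2L, 2L, 2L),
  V9 = c(1L, 3L, 3L, 2L, 1L), V10 = c(1L, 2L, 3L, 2L, 5L),
  V11 = c(1L, 3L, 4L, 1L, 3L), V12 = c(2L, 1L, 4L, 5L, 2L),
  V13 = c(4L, 1L, 2L, 2L, 5L)),
  class = "data.frame", row.names = c(NA, -5L))

# Base R approach from the answer
df[df == 5] <- NA
out <- as.data.frame(+(df >= 3))

# Rows 1 and 5 as plain integer vectors, to compare with the Output table
row1 <- unname(unlist(out[1, ]))
row5 <- unname(unlist(out[5, ]))
print(row1)
print(row5)
```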

