How do I collapse categories or recode variables?

Posted on 2024-09-10 09:02:13

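In R, I have 600,000 categorical variables, each of which is coded as "0", "1", or "2".

I would like to collapse "1" and "2" together and leave "0" by itself, so that after recoding "0" = "0", "1" = "1", and "2" = "1". In the end I only want "0" and "1" as the categories for each variable.

Also, if possible, I would rather not create 600,000 new variables. Replacing the existing variables with the new values would be great!

What would be the best way to do this?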

赠我空喜 2024-09-17 09:02:13

I find that indexing a vector of new levels by the factor itself, factor(new.levels[x]), is even more generic:

> x <- factor(sample(c("0","1","2"), 10, replace=TRUE)) 
> x
 [1] 0 2 2 2 1 2 2 0 2 1
Levels: 0 1 2
> new.levels<-c(0,1,1)
> x <- factor(new.levels[x])
> x
 [1] 0 1 1 1 1 1 1 0 1 1
Levels: 0 1

The new levels vector must be the same length as the number of levels in x, so you can also do more complicated recodes, using strings and NAs for example (applied here to the original three-level x):

> x <- factor(c("old", "new", NA)[x])
> x
 [1] old  <NA> <NA> <NA> new  <NA> <NA> old 
 [9] <NA> new 
Levels: new old
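
The positional trick above relies on every variable having all three levels in the same order. As a minimal sketch of a variant that does not depend on that, the mapping can be keyed by label instead of by position. The example data and the name lookup are hypothetical, not part of the original answer.

```r
# Hypothetical example: a factor in which level "0" happens to be absent.
x <- factor(c("1", "2", "2", "1"))

# Positional indexing uses the integer codes (1, 2), so "1" is wrongly mapped to 0.
positional <- c(0, 1, 1)[x]

# Named lookup: names are the old labels, values are the new labels.
lookup <- c("0" = "0", "1" = "1", "2" = "1")

# Index by label (as.character), not by code, and fix the output levels explicitly.
x_new <- factor(unname(lookup[as.character(x)]), levels = c("0", "1"))
x_new
```

Fixing levels = c("0", "1") also keeps the level set identical across variables, even when one of the values never occurs in a given column.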

风渺 2024-09-17 09:02:13

recode() is a little overkill for this. The right approach depends on how the variable is currently stored. Let's say your variable is x.

If it's numeric:

x <- ifelse(x>1, 1, x)

If it's character:

x <- ifelse(x=='2', '1', x)

If it's a factor with levels 0, 1, 2 (in that order):

levels(x) <- c(0,1,1)

Any of those can be applied in place to the variable x of a data frame dta. For example:

 dta$x <- ifelse(dta$x > 1, 1, dta$x)

Or, for multiple columns of a data frame (lapply() returns a list of columns, which assigns back cleanly):

 df[, c('col1','col2')] <- lapply(df[, c('col1','col2')], function(x) ifelse(x == 0, x, 1))
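
For the factor case, the same in-place idea can be extended to every factor column at once. The following is only a sketch under stated assumptions: the data frame dta and its columns a, b, and id are made up for illustration, and each factor is assumed to have exactly the levels "0", "1", "2" in that order.

```r
# Hypothetical data frame standing in for the real one.
dta <- data.frame(
  a  = factor(c("0", "1", "2", "2")),
  b  = factor(c("2", "0", "1", "0")),
  id = 1:4
)

# Select the factor columns only, so other columns are left untouched.
is_fac <- vapply(dta, is.factor, logical(1))

# Overwrite each factor column with its collapsed version (no new variables).
dta[is_fac] <- lapply(dta[is_fac], function(x) {
  levels(x) <- c("0", "1", "1")  # duplicate labels merge levels "1" and "2"
  x
})

str(dta)
```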

榆西 2024-09-17 09:02:13

There is a function recode() in the car package (Companion to Applied Regression):

library(car)
recode(x, "c('1','2')='1'; else='0'")

or for your case in plain R:

> x <- factor(sample(c("0","1","2"), 10, replace=TRUE))
> x
 [1] 1 1 1 0 1 0 2 0 1 0
Levels: 0 1 2
> factor(pmin(as.numeric(x), 2), labels=c("0","1"))
 [1] 1 1 1 0 1 0 1 0 1 0
Levels: 0 1

Update: To recode all categorical columns of a data frame tmp, you can use the following:

recode_fun <- function(x) factor(pmin(as.numeric(x), 2), labels=c("0","1"))
library(plyr)
catcolwise(recode_fun)(tmp)   # applies recode_fun to the categorical columns only
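
The pmin() approach works because as.numeric() on a factor returns the internal integer codes (1, 2, 3), not the labels (0, 1, 2). A small sketch with made-up data shows the difference. It assumes the factor has the levels "0", "1", "2" and that at least one value other than "0" occurs; otherwise factor() sees a single distinct value and the two labels no longer match.

```r
# Hypothetical example factor with levels "0", "1", "2".
x <- factor(c("0", "2", "1", "0"))

codes  <- as.numeric(x)                # integer codes of the levels
labels <- as.numeric(as.character(x))  # the labels converted to numbers

# Capping the codes at 2 makes codes 2 and 3 (labels "1" and "2") coincide.
collapsed <- factor(pmin(codes, 2), labels = c("0", "1"))
collapsed
```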

好菇凉咱不稀罕他 2024-09-17 09:02:13

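I like the recode() function in dplyr, which can quickly recode values.

 library(dplyr)
 df$x <- recode(df$x, old = "new")   # general form: old value = "new value"
 df$x <- recode(df$x, "2" = "1")     # this question, if x is character or a factor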

Hope this helps :)

千紇 2024-09-17 09:02:13

Note that if you just want the results to be 0-1 binary variables, you can forgo factors altogether:

f <- sapply(your.data.frame, is.factor)
your.data.frame[f] <- lapply(your.data.frame[f], function(x) x != "0")

The second line can also be written more succinctly (but possibly more cryptically) as

your.data.frame[f] <- lapply(your.data.frame[f], `!=`, "0")

This turns your factors into a series of logical variables, with "0" mapping to FALSE and anything else mapping to TRUE. Most code treats FALSE and TRUE as 0 and 1, which in turn should give essentially the same result in an analysis as using a factor with levels "0" and "1". In fact, if it doesn't give the same result, that would cast doubt on the correctness of the analysis.
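
If literal 0/1 integers are preferred over logicals, the comparison can be wrapped in as.integer(). This is a small sketch, not part of the original answer; the columns g1 and g2 are hypothetical.

```r
# Hypothetical data frame with two three-level factors.
your.data.frame <- data.frame(
  g1 = factor(c("0", "1", "2")),
  g2 = factor(c("2", "0", "0"))
)

f <- sapply(your.data.frame, is.factor)

# x != "0" gives a logical vector; as.integer() turns FALSE/TRUE into 0/1.
your.data.frame[f] <- lapply(your.data.frame[f], function(x) as.integer(x != "0"))

colSums(your.data.frame)  # number of non-"0" observations per variable
```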

聽兲甴掵 2024-09-17 09:02:13

You could use the rec() function of the sjmisc package, which can recode a complete data frame at once (provided all variables share the same recode values).

library(sjmisc)
mydf <- data.frame(a = sample(0:2, 10, TRUE),
                   b = sample(0:2, 10, TRUE),
                   c = sample(0:2, 10, TRUE))

> mydf
   a b c
1  1 1 0
2  1 0 1
3  0 2 0
4  0 1 0
5  1 0 0
6  2 1 1
7  0 1 1
8  2 1 2
9  1 1 2
10 2 0 1

mydf <- rec(mydf, "0=0; 1,2=1")

   a b c
1  1 1 0
2  1 0 1
3  0 1 0
4  0 1 0
5  1 0 0
6  1 1 1
7  0 1 1
8  1 1 1
9  1 1 1
10 1 0 1
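
For a data frame whose columns are all numeric, as in this example, the same recode can also be sketched in base R without any package, using a logical index over the whole data frame. The small mydf below is made up for illustration and is not the sampled one above.

```r
# Hypothetical numeric data frame with values 0, 1, 2.
mydf <- data.frame(a = c(1L, 0L, 2L),
                   b = c(2L, 1L, 0L),
                   c = c(0L, 2L, 2L))

# mydf == 2 is a logical matrix; every TRUE cell is overwritten in place.
mydf[mydf == 2L] <- 1L

mydf
```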

影子的影子 2024-09-17 09:02:13

A solution using the forcats package from the tidyverse:

library(forcats)

> x <- factor(sample(c("0","1","2"), 10, replace=TRUE))
> x
[1] 1 1 1 0 1 0 2 0 1 0
Levels: 0 1 2
    
> fct_collapse(x, "1" = c("1", "2"))
[1] 1 1 1 0 1 0 1 0 1 0
Levels: 0 1
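
For comparison, base R offers a similar "new level = old levels" notation through the list form of levels<-. This is a sketch with made-up data rather than part of the original answer.

```r
# Hypothetical example factor.
x <- factor(c("1", "0", "2", "0", "1"))

# Each list element names a new level and lists the old levels it absorbs.
levels(x) <- list("0" = "0", "1" = c("1", "2"))

x
```

Because old levels are matched by label, this form does not depend on the order in which the levels are stored.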
