Convert data.frame columns from factors to characters

Posted on 2024-09-02 01:27:58

I have a data frame. Let's call him bob:

> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-

I'd like to concatenate the rows of this data frame (this will be another question). But look:

> class(bob$phenotype)
[1] "factor"

Bob's columns are factors. So, for example:

> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)"       "c(3, 3, 3, 3, 3, 3)"      
[3] "c(29, 29, 29, 30, 30, 30)"

I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.

Strangely I can go through the columns of bob by hand, and do

bob$phenotype <- as.character(bob$phenotype)

which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?

Bonus question: why does the manual approach work?

Comments (18)

江南月 2024-09-09 01:27:58

Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:

bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

This will convert all variables to class "character"; if you want to convert only factors, see Marek's solution below.

As @hadley points out, the following is more concise.

bob[] <- lapply(bob, as.character)

In both cases, lapply outputs a list. In the second case, however, assigning into bob[] replaces the columns of the existing object rather than the object itself, so bob keeps its data.frame class and there is no need to convert the list back with data.frame (or as.data.frame) and stringsAsFactors = FALSE.
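
To make the difference concrete, here is a minimal sketch on a made-up data frame (toy stands in for bob; its values and row names are assumptions for illustration only). It shows that lapply on its own returns a plain list, while assigning into toy[] keeps the data.frame class and the row names.

```r
# Toy stand-in for bob; the values are invented for illustration.
toy <- data.frame(
  phenotype = factor(c("3- 4- 8- 25-", "3- 4- 8- 25+")),
  exclusion = factor(c("11b- 11c-", "11b- 11c-")),
  row.names = c("GSM399350", "GSM399353")
)

as_list <- lapply(toy, as.character)   # a plain list, not a data frame
class(as_list)

toy[] <- lapply(toy, as.character)     # columns replaced in place
class(toy)                             # still a data.frame
sapply(toy, class)                     # every column is now character
rownames(toy)                          # row names are untouched
```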

离笑几人歌 2024-09-09 01:27:58

To replace only factors:

i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)
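
As a quick check of the base R version, the sketch below applies it to an invented data frame with one factor, one numeric and one logical column (the data and column names are assumptions, not taken from the question). Only the factor column should change type.

```r
# Invented example data: one factor column, two non-factor columns.
df <- data.frame(
  id    = factor(c("a", "b", "c")),
  score = c(1.5, 2.5, 3.5),
  ok    = c(TRUE, FALSE, TRUE)
)

i <- sapply(df, is.factor)             # logical mask of the factor columns
df[i] <- lapply(df[i], as.character)   # convert just those columns

sapply(df, class)                      # id is character; score and ok are unchanged
```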

Version 0.5.0 of the dplyr package introduced the function mutate_if:

library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob

...which was superseded by across in version 1.0.0:

library(dplyr)
bob %>% mutate(across(where(is.factor), as.character)) -> bob

Package purrr from RStudio gives another alternative:

library(purrr)
bob %>% modify_if(is.factor, as.character) -> bob
南城旧梦 2024-09-09 01:27:58

The global option

stringsAsFactors:
The default setting for arguments of data.frame and read.table.

may be something you want to set to FALSE in your startup files (e.g. ~/.Rprofile). Please see help(options).
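
For context, a small sketch of what the stringsAsFactors argument does when a data frame is created. The argument is passed explicitly here because its default has differed between R versions (it has been FALSE since R 4.0.0); the variable and column names are made up.

```r
# Same input, two settings of stringsAsFactors.
with_factors <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)
with_strings <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)

class(with_factors$x)   # "factor"
class(with_strings$x)   # "character"
```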

故人的歌 2024-09-09 01:27:58

If you understand how factors are stored, you can avoid using apply-based functions to accomplish this, which is not to imply that the apply solutions don't work well.

Factors are stored as integer codes that index into a character vector of 'levels'. This can be seen if you convert a factor to numeric. So:

> fact <- as.factor(c("a","b","a","d"))
> fact
[1] a b a d
Levels: a b d

> as.numeric(fact)
[1] 1 2 1 3

The numbers returned in the last line correspond to the levels of the factor.

> levels(fact)
[1] "a" "b" "d"

Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:

> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"

This also works for numeric values, provided you wrap your expression in as.numeric().

> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4
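
This also hints at why as.character works on a single column: for a factor it returns the level labels, whereas as.numeric returns the underlying integer codes. A short sketch with made-up values shows the difference, together with the as.numeric(levels(f))[f] idiom recommended in R's help page for factor:

```r
f <- factor(c("10", "20", "10", "40"))

as.character(f)            # the labels: "10" "20" "10" "40"
as.numeric(f)              # the integer codes: 1 2 1 3
as.numeric(levels(f))[f]   # the original values: 10 20 10 40
```
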
童话里做英雄 2024-09-09 01:27:58

If you want a new data frame bobc where every factor vector in bobf is converted to a character vector, try this:

bobc <- rapply(bobf, as.character, classes="factor", how="replace")

If you then want to convert it back, you can create a logical vector marking which columns are factors, and use that to selectively apply factor:

f <- sapply(bobf, is.factor)
bobc[f] <- lapply(bobc[f], factor)
メ斷腸人バ 2024-09-09 01:27:58

I typically make this function a part of all my projects. Quick and easy.

unfactorize <- function(df){
  # is.factor() also catches ordered factors, whose class() has two elements
  for(i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
  return(df)
}
桜花祭 2024-09-09 01:27:58

Another way is to convert it using apply

bob2 <- apply(bob,2,as.character)

And a better one (the previous result is of class 'matrix'):

bob2 <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)
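
One caveat worth checking before using the matrix route: a matrix holds a single type, so any numeric columns come back as character too. A minimal sketch with invented data:

```r
# Invented data: one factor column and one numeric column.
df <- data.frame(
  name  = factor(c("a", "b")),
  value = c(1.5, 2.5)
)

via_matrix <- as.data.frame(as.matrix(df), stringsAsFactors = FALSE)
sapply(via_matrix, class)   # both columns are now character
```
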
作死小能手 2024-09-09 01:27:58

Update: the suggestion below does not actually work. I thought it would, but the stringsAsFactors option only affects character strings; it leaves existing factors alone.

The original suggestion was:

bob2 <- data.frame(bob, stringsAsFactors = FALSE)

Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting).
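
The behaviour described in the update is easy to confirm. In this sketch (bob_toy and its values are invented), wrapping an existing data frame in data.frame(..., stringsAsFactors = FALSE) leaves its factor column as a factor:

```r
# Invented stand-in for bob with a single factor column.
bob_toy <- data.frame(phenotype = factor(c("x", "y")))

bob2 <- data.frame(bob_toy, stringsAsFactors = FALSE)
class(bob2$phenotype)   # still "factor"
```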

谷夏 2024-09-09 01:27:58

Or you can try transform:

newbob <- transform(bob, phenotype = as.character(phenotype))

Just be sure to list every factor you'd like to convert to character.

Or you can do something like this and kill all the pests with one blow:

newbob_char <- as.data.frame(lapply(bob[sapply(bob, is.factor)], as.character), stringsAsFactors = FALSE)
newbob_rest <- bob[!(sapply(bob, is.factor))]
newbob <- cbind(newbob_char, newbob_rest)

It's not a good idea to cram everything into one expression like this; I could do the sapply part separately (actually, it's much easier that way), but you get the point... I haven't checked the code, 'cause I'm not at home, so I hope it works! =)

This approach, however, has a downside: you must reorder the columns afterwards, while with transform you can do whatever you like, but at the cost of "pedestrian-style" code writing...

So there... =)

蓝梦月影 2024-09-09 01:27:58

When you create your data frame, include stringsAsFactors = FALSE to avoid the problem altogether.

记忆で 2024-09-09 01:27:58

If you use the data.table package instead of data.frame, the problem does not arise.

library(data.table)
dt = data.table(col1 = c("a","b","c"), col2 = 1:3)
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 

If you already have factor columns in your dataset and want to convert them to character, you can do the following.

library(data.table)
dt = data.table(col1 = factor(c("a","b","c")), col2 = 1:3)
sapply(dt, class)
#     col1      col2 
# "factor" "integer" 
upd.cols = sapply(dt, is.factor)
dt[, names(dt)[upd.cols] := lapply(.SD, as.character), .SDcols = upd.cols]
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 
夏末染殇 2024-09-09 01:27:58

This works for me. I finally figured out a one-liner:

df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
久光 2024-09-09 01:27:58

The function across was introduced in dplyr version 1.0.0. It supersedes the scoped variants (_if, _at, _all); see the official dplyr documentation.

library(dplyr)
bob <- bob %>% 
       mutate(across(where(is.factor), as.character))
短暂陪伴 2024-09-09 01:27:58

You should use convert in hablar which gives readable syntax compatible with tidyverse pipes:

library(dplyr)
library(hablar)

df <- tibble(a = factor(c(1, 2, 3, 4)),
             b = factor(c(5, 6, 7, 8)))

df %>% convert(chr(a:b))

which gives you:

  a     b    
  <chr> <chr>
1 1     5    
2 2     6    
3 3     7    
4 4     8   
无风消散 2024-09-09 01:27:58

With the dplyr package loaded, use

bob <- bob %>% mutate_at("phenotype", as.character)

if you only want to change the phenotype column specifically.

日暮斜阳 2024-09-09 01:27:58

This function does the trick

df <- stacomirtools::killfactor(df)
半窗疏影 2024-09-09 01:27:58

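Maybe a newer option?

library("tidyverse")

# group_by_if() converts the factor columns but also groups by them, so ungroup afterwards
bob <- bob %>% group_by_if(is.factor, as.character) %>% ungroup()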
留一抹残留的笑 2024-09-09 01:27:58

This works by converting everything to character and then converting the numeric-looking columns back to numeric:

makenumcols<-function(df){
  df<-as.data.frame(df)
  df[] <- lapply(df, as.character)
  cond <- apply(df, 2, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  })
  numeric_cols <- names(df)[cond]
  df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)
  return(df)
}

Adapted from: Get column types of excel sheet automatically
