How to concatenate factors without them being converted to integer level?

Posted on 2024-09-13 18:17:52

I was surprised to see that R coerces factors to their integer codes when concatenating vectors. This happens even when the levels are the same. For example:

> facs <- as.factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))
> facs
[1] i       want    to      be      a       factor  not     an      integer
Levels: a an be factor i integer not to want
> c(facs[1 : 3], facs[4 : 5])
[1] 5 9 8 3 1

What is the idiomatic way to do this in R (in my case these vectors can be pretty large)? Thank you.

Comments (9)

不羁少年 2024-09-20 18:17:53

From the R Mailing list:

unlist(list(facs[1 : 3], facs[4 : 5]))

To 'cbind' factors (the pieces must have the same length, so the second slice is widened to three elements here), do

data.frame(facs[1 : 3], facs[4 : 6])
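
The unlist() approach should also cope with factors whose level sets differ, because base R's unlist() returns a factor carrying the union of the levels when every list element is a factor. A minimal sketch, where x and y are made-up example factors rather than data from the question:

```r
# Illustrative factors with different level sets (not from the question)
x <- factor(c("c", "z"))
y <- factor(c("a", "b", "z"))

# unlist() on a list of factors returns a factor, not integer codes
combined <- unlist(list(x, y))

is.factor(combined)      # TRUE
as.character(combined)   # "c" "z" "a" "b" "z"
levels(combined)         # "c" "z" "a" "b"
```

The levels come out in the order they are first encountered across the inputs, not sorted alphabetically.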
浅沫记忆 2024-09-20 18:17:53

An alternate workaround is to convert the factor to a character vector, then convert back when you are finished concatenating.

cfacs <- as.character(facs)
x <- c(cfacs[1:3], cfacs[4:5]) 

# Now choose between
factor(x)
# and
factor(x, levels = levels(facs))
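
To make the choice above concrete, here is a small sketch of how the two calls differ. It reuses facs from the question; the names observed_only and all_levels are introduced only for illustration:

```r
facs <- factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))
cfacs <- as.character(facs)
x <- c(cfacs[1:3], cfacs[4:5])

# factor(x) keeps only the levels that actually occur in x
observed_only <- factor(x)
levels(observed_only)   # "a" "be" "i" "to" "want"

# Passing the original levels keeps all nine, including the unused ones
all_levels <- factor(x, levels = levels(facs))
nlevels(all_levels)     # 9
```

Keeping the full level set is usually what you want if the result will later be compared or combined with the original factor.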
木槿暧夏七纪年 2024-09-20 18:17:53

Use fct_c from the forcats package (part of the tidyverse).

> library(forcats)
> facs <- as.factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))
> fct_c(facs[1:3], facs[4:5])
[1] i    want to   be   a
Levels: a an be factor i integer not to want

fct_c isn't fooled by concatenations of factors with discrepant numerical codings:

> x <- as.factor(c('c', 'z'))
> x
[1] c z
Levels: c z
> y <- as.factor(c('a', 'b', 'z'))
> y
[1] a b z
Levels: a b z
> c(x, y)
[1] 1 2 1 2 3
> fct_c(x, y)
[1] c z a b z
Levels: c z a b
> as.numeric(fct_c(x, y))
[1] 1 2 3 4 2
后知后觉 2024-09-20 18:17:53

Wow, I never realized it did that. Here is a work-around:

x <- c(facs[1 : 3], facs[4 : 5]) 
x <- factor(x, levels=1:nlevels(facs), labels=levels(facs))
x

With the output:

[1] i    want to   be   a   
Levels: a an be factor i integer not to want

It will only work if the two vectors have the same levels as here.
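
A rough sketch of why matching levels matter for this work-around. The factors x and y are made-up examples, and as.integer() is used to reproduce the integer codes that c() returned before R 4.1.0, so the snippet behaves the same on any R version:

```r
x <- factor(c("c", "z"))        # codes 1 2, levels c z
y <- factor(c("a", "b", "z"))   # codes 1 2 3, levels a b z

# Combine the raw integer codes, as c() did on older R versions
codes <- c(as.integer(x), as.integer(y))   # 1 2 1 2 3

# Relabelling with x's levels silently mislabels the values that came from y
wrong <- factor(codes, levels = seq_len(nlevels(x)), labels = levels(x))
as.character(wrong)   # "c" "z" "c" "z" NA
```

The values "a" and "b" from y come back as "c" and "z", and the third code has no label at all, so the result is wrong without any warning.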

心在旅行 2024-09-20 18:17:53

This is a really bad R gotcha. Along those lines, here's one that just swallowed several hours of my time.

x <- factor(c("Yes","Yes","No", "No", "Yes", "No"))
y <- c("Yes", x)

> y
[1] "Yes" "2"   "2"   "1"   "1"   "2"   "1"  
> is.factor(y)
[1] FALSE

It appears to me the better fix is Richie's, which coerces to character.

> y <- c("Yes", as.character(x))
> y
[1] "Yes" "Yes" "Yes" "No"  "No"  "Yes" "No" 
> y <- as.factor(y)
> y
[1] Yes Yes Yes No  No  Yes No 
Levels: No Yes

As long as you get the levels set properly, as Richie mentions.

过去的过去 2024-09-20 18:17:53

Based on the other answers that convert to character, I use the following function to concatenate factors:

concat.factor <- function(...){
  as.factor(do.call(c, lapply(list(...), as.character)))
}

You can use this function just as you would use c.
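
A short usage sketch for the function above, with x and y as made-up example factors. Note that as.factor() rebuilds the levels from the combined strings, so they end up sorted and any unused levels of the inputs are dropped:

```r
concat.factor <- function(...) {
  # Convert every argument to character, concatenate, then rebuild a factor
  as.factor(do.call(c, lapply(list(...), as.character)))
}

x <- factor(c("c", "z"))
y <- factor(c("a", "b", "z"))

combined <- concat.factor(x, y)
as.character(combined)   # "c" "z" "a" "b" "z"
levels(combined)         # "a" "b" "c" "z"
```

If the original level order matters, it would have to be passed explicitly through factor(..., levels = ...) instead of relying on as.factor().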

和我恋爱吧 2024-09-20 18:17:53

For this reason I prefer to work with factors inside data.frames:

df <- data.frame(facs = as.factor(
      c("i", "want", "to", "be", "a", "factor", "not", "an", "integer") ))

and subset it using subset() or dplyr::filter() etc. rather than row indexes. Because I don't have meaningful subset criteria in this case, I will just use head() and tail():

df1 <- head(df, 4)
df2 <- tail(df, 2)

Then you can manipulate them quite easily, e.g.:

dfc <- rbind(df1, df2)
dfc$facs
#[1] i       want    to      be      an      integer
#Levels: a an be factor i integer not to want
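
As a further sketch of the same idea, rbind() on data frames should also merge factor columns whose level sets differ. The data frames dfx and dfy and the column name f are hypothetical, chosen only for this illustration:

```r
# Two data frames whose factor columns have different levels
dfx <- data.frame(f = factor(c("c", "z")))
dfy <- data.frame(f = factor(c("a", "b", "z")))

# rbind() keeps the column a factor and expands its levels as needed
both <- rbind(dfx, dfy)

is.factor(both$f)      # TRUE
as.character(both$f)   # "c" "z" "a" "b" "z"
levels(both$f)         # "c" "z" "a" "b"
```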
千と千尋 2024-09-20 18:17:53

Just a quick note to point out that as of R 4.1.0, this is directly addressed in base R. You can now just intuitively do

c(facs[1 : 3], facs[4 : 5])
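
A version-aware sketch of that behaviour, using made-up factors x and y with different level sets. It assumes only base R and checks the running version with getRversion():

```r
x <- factor(c("c", "z"))
y <- factor(c("a", "b", "z"))

combined <- c(x, y)

if (getRversion() >= "4.1.0") {
  # c() dispatches to a factor method: the result is a factor whose
  # levels are the union of the inputs' levels
  print(levels(combined))   # "c" "z" "a" "b"
} else {
  # Older versions return the underlying integer codes instead
  print(combined)           # 1 2 1 2 3
}
```

Code that must run on both old and new R versions is safer with one of the explicit approaches from the other answers.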

Here's another way to add to a factor variable when the setup is slightly different:

facs <- factor(1:3, levels=1:9,
               labels=c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))
facs
# [1] i    want to
# Levels: i want to be a factor not an integer
facs[4:6] <- levels(facs)[4:6]
facs
# [1] i      want   to     be     a      factor
# Levels: i want to be a factor not an integer