How to concatenate factors without them being converted to integer codes?
I was surprised to see that R coerces factors to numbers when concatenating vectors. This happens even when the levels are the same. For example:
> facs <- as.factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))
> facs
[1] i want to be a factor not an integer
Levels: a an be factor i integer not to want
> c(facs[1 : 3], facs[4 : 5])
[1] 5 9 8 3 1
What is the idiomatic way to do this in R (in my case these vectors can be pretty large)? Thank you.
Comments (9)
From the R mailing list:
To 'cbind' factors, do
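A minimal sketch of what this can look like in base R, assuming `unlist()` on a list of factors for concatenation and `data.frame()` for column-binding. The variable names `joined` and `bound` are illustrative only, and this is not necessarily the exact code from the mailing list.

```r
facs <- factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))

# unlist() on a list made up only of factors returns a factor whose
# levels are the union of the input levels.
joined <- unlist(list(facs[1:3], facs[4:5]))
print(joined)

# For column-binding, data.frame() keeps each factor as a factor column,
# whereas cbind() on bare factors yields a matrix of integer codes.
bound <- data.frame(first = facs[1:3], second = facs[4:6])
str(bound)
```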
An alternative workaround is to convert each factor to a character vector, then convert back once you are finished concatenating.
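The round trip through character can be sketched as follows. Passing `levels = levels(facs)` is an assumption made here to preserve the original level set and order; without it, `factor()` would keep only the values present.

```r
facs <- factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))

# Concatenate the labels, not the integer codes.
pieces <- c(as.character(facs[1:3]), as.character(facs[4:5]))

# Convert back to a factor, reusing the original levels.
joined <- factor(pieces, levels = levels(facs))
print(joined)
```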
Use fct_c from the forcats package (part of the tidyverse). fct_c isn't fooled by concatenations of factors with discrepant numerical codings:
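A small sketch of the situation described, assuming the forcats package is installed (the guard simply skips the demo otherwise). The factors `fa`, `fb` and `fab` are made up for illustration.

```r
# Assumption: forcats is available; nothing runs if it is not installed.
if (requireNamespace("forcats", quietly = TRUE)) {
  fa  <- factor("a")
  fb  <- factor("b")
  fab <- factor(c("a", "b"))

  # fa and fb both store their single value as integer code 1,
  # so combining by codes would conflate "a" and "b".
  combined <- forcats::fct_c(fa, fb, fab)
  print(combined)
}
```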
Wow, I never realized it did that. Here is a workaround:
With the output:
It will only work if the two vectors have the same levels, as they do here.
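One way such a workaround can be written is to combine the integer codes and then relabel them with the shared levels. This is a sketch rather than the answer's original code; it calls `as.integer()` explicitly so that it behaves the same regardless of how `c()` treats factors in a given R version.

```r
facs <- factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))

# Combine the underlying integer codes of both pieces.
codes <- c(as.integer(facs[1:3]), as.integer(facs[4:5]))

# Map the codes back to labels. This is only valid because both pieces
# share exactly the same levels in the same order.
joined <- factor(codes, levels = seq_along(levels(facs)), labels = levels(facs))
print(joined)
```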
This is a really bad R gotcha. Along those lines, here's one that just swallowed several hours of my time.
It appears to me that the better fix is Richie's, which coerces to character.
That works as long as you get the levels set properly, as Richie mentions.
Based on the other answers that convert to character, I'm using the following function to concatenate factors:
You can use this function just as you would use c.
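A helper along these lines might look like the sketch below. The name `concat_factors` is hypothetical, and the resulting levels are simply the sorted unique values of the combined input, which may differ from the original level order.

```r
# Hypothetical helper: concatenate any number of factors via character.
concat_factors <- function(...) {
  pieces <- lapply(list(...), as.character)
  factor(unlist(pieces, use.names = FALSE))
}

facs <- factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))
joined <- concat_factors(facs[1:3], facs[4:5])
print(joined)
```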
For this reason I prefer to work with factors inside data.frames:
and subset them using subset() or dplyr::filter() etc. rather than row indexes. Because I don't have meaningful subset criteria in this case, I will just use head() and tail():
Then you can manipulate them quite easily, e.g.:
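The data.frame route can be sketched in base R as below. The names `df`, `df1` and `df2` are illustrative, and `rbind()` stands in for whatever row-binding function the answer originally used.

```r
# Keep the factor as a column of a data.frame.
df <- data.frame(
  facs = factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))
)

# Take two pieces without relying on explicit row indexes.
df1 <- head(df, 4)
df2 <- tail(df, 2)

# rbind() on data.frames combines factor columns as factors.
combined <- rbind(df1, df2)
print(combined$facs)
```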
Just a quick note to point out that as of R 4.1.0, this is directly addressed in base R. You can now just intuitively do
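For example, the call from the question now returns a factor. The version guard in this sketch is an addition so that the snippet does nothing on older installations.

```r
facs <- factor(c("i", "want", "to", "be", "a", "factor", "not", "an", "integer"))

# From R 4.1.0 on, c() has a method for factors that combines labels
# and takes the union of the levels.
if (getRversion() >= "4.1.0") {
  joined <- c(facs[1:3], facs[4:5])
  print(joined)
}
```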
Here's another way to add to a factor variable when the setup is slightly different:
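One possible reading of that setup, sketched under the assumption that the factor already declares every level it will ever need: new values can then be appended by assignment. The level set below is made up for illustration.

```r
# Assumption: all levels are declared up front, even those not yet used.
facs <- factor(c("i", "want", "to"),
               levels = c("i", "want", "to", "be", "a", "factor"))

# Assigning past the end extends the factor; the values must be existing
# levels, otherwise R warns and inserts NA.
facs[4:5] <- c("be", "a")
print(facs)
```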