Unsplit a list, merging factors
I have the following data frame in R:
  c1 c2
1 10  a
2 20  a
3 30  b
4 40  b
I then split it as follows:
z = lapply(split(test$c1, test$c2), function(x) {cut(x,2)})
z is then:
$a
[1] (9.99,15] (15,20]
Levels: (9.99,15] (15,20]
$b
[1] (30,35] (35,40]
Levels: (30,35] (35,40]
I would then like to merge the factors back by unsplitting the list:
unsplit(z, test$c2)
This generates a warning:
[1] (9.99,15] (15,20] <NA> <NA>
Levels: (9.99,15] (15,20]
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 1:2) :
invalid factor level, NAs generated
I would like to take a union of all the factor levels and then unsplit so that this error does not happen:
z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
unsplit(z, test$c2)
[1] (9.99,15] (15,20] (30,35] (35,40]
Levels: (9.99,15] (15,20] (30,35] (35,40]
In my real data frame I have a very big list so I need to iterate over all the list elements (not just two). What is the best way to do this?
Comments (2)
If I understood your question properly, I think you are making this a bit more complicated than needed. Here's one solution using plyr. We will group by the c2 variable:
which returns:
and has a structure of:
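The code for this answer is not shown here. As a minimal sketch of what grouping by c2 with plyr could look like, assuming plyr is installed: the ddply() call and the new column name c3 are assumptions for illustration, not the answerer's confirmed code, and the construction of test is inferred from the printed data frame in the question.

```r
# Data frame as printed in the question (construction is an assumption).
test <- data.frame(c1 = c(10, 20, 30, 40), c2 = c("a", "a", "b", "b"))

# Only run the sketch when plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  # Split `test` by c2, cut c1 into 2 intervals within each group,
  # and bind the pieces back into one data frame.
  # c3 is a hypothetical name for the new column.
  res <- plyr::ddply(test, "c2", transform, c3 = cut(c1, 2))

  print(res)  # the recombined data frame
  str(res)    # shows the type of c3 and its levels
}
```

Note that ddply() returns rows ordered by group, so if test is not already sorted by c2 the row order of the result will differ from the input.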
Can you not just unlist() z instead?
or without the names on the resulting factor:
You can merge everything together into a simple one-liner that needs no add-on packages:
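The snippets for this answer are not shown here either. A minimal base R sketch of the approach it describes might look like the following. The exact calls are a reconstruction rather than the answerer's verified code, and the construction of test is inferred from the question. It relies on the documented behaviour of unlist(): when every element of the list is a factor, the result is a factor whose levels are the union of the elements' levels.

```r
# Data frame as printed in the question (construction is an assumption).
test <- data.frame(c1 = c(10, 20, 30, 40), c2 = c("a", "a", "b", "b"))
z <- lapply(split(test$c1, test$c2), function(x) cut(x, 2))

# unlist() on a list of factors returns one factor with the union of levels.
merged <- unlist(z)                             # elements are named a1, a2, b1, b2
merged_unnamed <- unlist(z, use.names = FALSE)  # same factor without names

# Everything in one line, no add-on packages.
one_liner <- unlist(lapply(split(test$c1, test$c2), cut, 2), use.names = FALSE)

# Variant closer to the question's own idea: give every list element the
# union of all levels, then unsplit() to restore the original row order.
all_levels <- unique(unlist(lapply(z, levels)))
z_same <- lapply(z, factor, levels = all_levels)
restored <- unsplit(z_same, test$c2)

print(merged_unnamed)
print(restored)
```

unlist() concatenates the groups in list order, so it matches the original row order only when the data is already sorted by c2, as it is here. The unsplit() variant puts each value back in its original position regardless of sorting, and it scales to any number of list elements.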