Unsplit a list, merging factors

Posted 2024-11-04 07:34:08

I have the following data frame, test, in R:

  c1 c2  
1 10  a  
2 20  a  
3 30  b  
4 40  b

I then split it and cut each group into two intervals:

z = lapply(split(test$c1, test$c2), function(x) {cut(x, 2)})

z is then:

$a  
[1] (9.99,15] (15,20]  
Levels: (9.99,15] (15,20]

$b  
[1] (30,35] (35,40]
Levels: (30,35] (35,40]  

I would then like to merge the factors back together by unsplitting the list:

unsplit(z, test$c2)

This generates a warning, and the values from group b become NA:

[1] (9.99,15] (15,20]   <NA>      <NA>     
Levels: (9.99,15] (15,20]
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 1:2) :
  invalid factor level, NAs generated
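The warning comes from how unsplit() rebuilds the vector: in base R it starts from a copy of the first list element (here z$a, so the result only carries the levels of group a) and then assigns each group into it. A minimal reproduction of that last assignment step, using a hypothetical variable named template:

```r
# Stand-in for the vector unsplit() starts from: 4 slots,
# but only the two levels that came from group "a".
template <- factor(rep(NA, 4), levels = c("(9.99,15]", "(15,20]"))

# Assign the group "b" values, whose labels are not among those levels.
# `[<-.factor` matches values to levels by label, finds no match,
# and stores NA with an "invalid factor level" warning.
template[3:4] <- factor(c("(30,35]", "(35,40]"))
print(template)
```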

I would like to take the union of all the factor levels and then unsplit, so that this warning does not occur. With only two list elements, this works:

z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
unsplit(z, test$c2)
[1] (9.99,15] (15,20]   (30,35]   (35,40]  
Levels: (9.99,15] (15,20] (30,35] (35,40]    

In my real data frame the list is very large, so I need to do this for all of the list elements, not just two. What is the best way to do this?
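A minimal sketch of that generalisation in base R, assuming every element of z is a factor: collect the levels of all elements, re-level each element against that shared set, then unsplit. The name all_levels is only for illustration.

```r
# Example data from the question.
test <- data.frame(c1 = c(10, 20, 30, 40),
                   c2 = factor(c("a", "a", "b", "b")))

# Cut each group separately; each factor gets its own level set.
z <- lapply(split(test$c1, test$c2), function(x) cut(x, 2))

# Union of the level sets of every list element, in list order.
all_levels <- unique(unlist(lapply(z, levels), use.names = FALSE))

# Re-level every element (not just z$a) against the shared level set.
z <- lapply(z, factor, levels = all_levels)

# unsplit() now finds every label among the levels, so no NAs appear.
newvar <- unsplit(z, test$c2)
print(newvar)
```

Because unique() keeps the first occurrence, a label that happens to appear in several groups ends up as a single shared level.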

Answers (2)

仙女山的月亮 2024-11-11 07:34:08

If I understand your question correctly, you are making this a bit more complicated than it needs to be. Here is one solution using plyr, grouping by the c2 variable:

library(plyr)
ddply(test, "c2", transform, newvar = cut(c1, 2))

which returns:

  c1 c2    newvar
1 10  a (9.99,15]
2 20  a   (15,20]
3 30  b   (30,35]
4 40  b   (35,40]

and has the following structure (as printed by str()):

'data.frame':   4 obs. of  3 variables:
 $ c1    : num  10 20 30 40
 $ c2    : Factor w/ 2 levels "a","b": 1 1 2 2
 $ newvar: Factor w/ 4 levels "(9.99,15]","(15,20]",..: 1 2 3 4
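The same split-apply-combine pattern can be sketched in base R without plyr. This illustrates the idea and is not a claim about what ddply() does internally: split the data frame by c2, add the cut column to each piece, and stack the pieces with rbind(), which expands factor levels so every piece's labels are kept. As in the ddply() output, the rows come back grouped by c2.

```r
test <- data.frame(c1 = c(10, 20, 30, 40),
                   c2 = factor(c("a", "a", "b", "b")))

# Split the data frame by group and add the per-group cut column.
pieces <- lapply(split(test, test$c2), function(d) {
  d$newvar <- cut(d$c1, 2)
  d
})

# Stack the pieces; rbind() takes the union of the newvar levels.
res <- do.call(rbind, pieces)
rownames(res) <- NULL  # drop the "a.1", "a.2", ... row names rbind() creates
str(res)
```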

べ繥欢鉨o。 2024-11-11 07:34:08

Can you not just unlist() z instead?

> unlist(z)
       a1        a2        b1        b2 
(9.99,15]   (15,20]   (30,35]   (35,40] 
Levels: (9.99,15] (15,20] (30,35] (35,40]

or without the names on the resulting factor:

> unlist(z, use.names=FALSE)
[1] (9.99,15] (15,20]   (30,35]   (35,40]  
Levels: (9.99,15] (15,20] (30,35] (35,40]

You can merge everything together into a simple one-liner that needs no add-on packages:

> (test2 <- within(test, newvar <- unlist(lapply(split(c1, c2), cut, 2))))
  c1 c2    newvar
1 10  a (9.99,15]
2 20  a   (15,20]
3 30  b   (30,35]
4 40  b   (35,40]
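One caveat: unlist() concatenates the per-group results group by group, so the one-liner lines newvar up with the rows only when test is already sorted by c2, as it is in the question. A sketch that keeps the union-of-levels behaviour of unlist() but restores the original row order, using a hypothetical interleaved copy of the data (vals and pos are illustrative names):

```r
# Hypothetical unsorted version of the example data (groups interleaved).
test <- data.frame(c1 = c(10, 30, 20, 40),
                   c2 = factor(c("a", "b", "a", "b")))

z <- lapply(split(test$c1, test$c2), cut, 2)

# unlist() unions the level sets, but returns values group by group.
vals <- unlist(z, use.names = FALSE)

# Original row index of each value, in the same group-by-group order.
pos <- unlist(split(seq_len(nrow(test)), test$c2), use.names = FALSE)

# order(pos) inverts that permutation, putting the values back in row order.
test$newvar <- vals[order(pos)]
test
```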