Unsplitting a list while merging factor levels

Posted on 2024-11-04 07:34:08

I have the following data frame in R:
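
The data frame itself did not survive in this copy of the post. Judging from the output shown in the answers, it was presumably equivalent to the following (a reconstruction, not the original code):

```r
# Reconstructed from the str() output in the first answer:
# c1 is numeric, c2 is a factor with levels "a" and "b".
test <- data.frame(c1 = c(10, 20, 30, 40),
                   c2 = factor(c("a", "a", "b", "b")))
test
```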

I then split it by c2 and cut each piece into two intervals: z = lapply(split(test$c1, test$c2), function(x) {cut(x, 2)}). The resulting z is:

$a  
[1] (9.99,15] (15,20]  
Levels: (9.99,15] (15,20]

$b  
[1] (30,35] (35,40]
Levels: (30,35] (35,40]

I would then like to merge the factors back by unsplitting the list with unsplit(z, test$c2). This generates a warning, and the values from the second group become NA:

[1] (9.99,15] (15,20]   <NA>      <NA>     
Levels: (9.99,15] (15,20]
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 1:2) :
  invalid factor level, NAs generated

I would like to take the union of all the factor levels before unsplitting, so that the warning does not occur and no NAs are generated:

z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
unsplit(z, test$c2)
[1] (9.99,15] (15,20]   (30,35]   (35,40]  
Levels: (9.99,15] (15,20] (30,35] (35,40]

In my real data frame I have a very big list so I need to iterate over all the list elements (not just two). What is the best way to do this?
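
One way to generalize the two-element workaround above to a list of any length is sketched below. It assumes the reconstructed `test` data frame (not shown in the post) and uses only base R; the names `all_levels`, `z_common` and `merged` are illustrative.

```r
# Example data, reconstructed from the output shown in the answers (an assumption)
test <- data.frame(c1 = c(10, 20, 30, 40),
                   c2 = factor(c("a", "a", "b", "b")))
z <- lapply(split(test$c1, test$c2), cut, 2)

# Union of the levels of every list element, in order of first appearance
all_levels <- unique(unlist(lapply(z, levels), use.names = FALSE))

# Re-create each factor on the common level set, then unsplit as before
z_common <- lapply(z, factor, levels = all_levels)
merged <- unsplit(z_common, test$c2)
merged
```

Because every element now carries the same level set, the assignment inside unsplit() finds a matching level for each value, so no NAs should be produced regardless of how many groups the list contains.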

仙女山的月亮 2024-11-11 07:34:08

If I understood your question properly, I think you are making this a bit more complicated than needed. Here's one solution using plyr. We will group by the c2 variable:

library(plyr)
ddply(test, "c2", transform, newvar = cut(c1, 2))

which returns:

  c1 c2    newvar
1 10  a (9.99,15]
2 20  a   (15,20]
3 30  b   (30,35]
4 40  b   (35,40]

and has a structure of:

'data.frame':   4 obs. of  3 variables:
 $ c1    : num  10 20 30 40
 $ c2    : Factor w/ 2 levels "a","b": 1 1 2 2
 $ newvar: Factor w/ 4 levels "(9.99,15]","(15,20]",..: 1 2 3 4
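
For readers who would rather not depend on plyr, roughly the same split-apply-combine step can be sketched in base R. This is not the answerer's code; it assumes the reconstructed `test` data frame, and `pieces` and `result` are illustrative names.

```r
# Example data, reconstructed from the output shown above (an assumption)
test <- data.frame(c1 = c(10, 20, 30, 40),
                   c2 = factor(c("a", "a", "b", "b")))

# Split the data frame by group and add the cut() column to each piece
pieces <- lapply(split(test, test$c2), function(d) {
  d$newvar <- cut(d$c1, 2)
  d
})

# rbind() on data frames expands factor levels as needed,
# so newvar ends up with the union of all per-group levels
result <- do.call(rbind, pieces)
str(result)
```

Note that the rows of `result` come back in group order with row names derived from the group labels, which may differ from the original row order.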

べ繥欢鉨o。 2024-11-11 07:34:08

Can you not just unlist() z instead?

> unlist(z)
       a1        a2        b1        b2 
(9.99,15]   (15,20]   (30,35]   (35,40] 
Levels: (9.99,15] (15,20] (30,35] (35,40]

or without the names on the resulting factor:

> unlist(z, use.names=FALSE)
[1] (9.99,15] (15,20]   (30,35]   (35,40]  
Levels: (9.99,15] (15,20] (30,35] (35,40]

You can merge everything together into a simple one-liner that needs no add-on packages:

> (test2 <- within(test, newvar <- unlist(lapply(split(c1, c2), cut, 2))))
  c1 c2    newvar
1 10  a (9.99,15]
2 20  a   (15,20]
3 30  b   (30,35]
4 40  b   (35,40]
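
One caveat worth checking: unlist() returns the values in group order, which matches the row order here only because the example rows are already sorted by c2. A minimal sketch of restoring row order for unsorted data follows; the interleaved data frame and the names `by_group`, `idx` and `newvar` are made up for illustration.

```r
# Hypothetical data with the groups interleaved rather than sorted
test <- data.frame(c1 = c(10, 30, 20, 40),
                   c2 = factor(c("a", "b", "a", "b")))

z <- lapply(split(test$c1, test$c2), cut, 2)
by_group <- unlist(z, use.names = FALSE)   # values in group order: a, a, b, b

# Row indices in the same group order, used to map the values back to rows
idx <- unlist(split(seq_len(nrow(test)), test$c2), use.names = FALSE)
newvar <- by_group[order(idx)]

test$newvar <- newvar
test
```

The unsplit() approach on factors with a common level set handles this reordering automatically, so it may be the safer choice when the grouping variable is not sorted.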
