What is the R way to do the following grouping?
I have a dataset like this:
date        value     class
1984-04-01  95.32384  A
1984-04-01  39.86818  B
1984-07-01  43.57983  A
1984-07-01  10.83754  B
Now I would like to group the data by date and subtract the value of class B from the value of class A.
I looked into ddply, summarize, melt and aggregate but couldn't quite get what I want. Is there an easy way to do this? Note that I have exactly two values per date: one of class A and one of class B. I could rearrange the data into two data frames, order them by date and class, and merge them again, but I feel there is a more R-like way to do it.
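For comparison, the "two data frames and merge" fallback mentioned in the question can be sketched in base R. This is a minimal sketch using only the four sample rows above; the names dat, a, b and diff are illustrative and not from the question.

```r
# Sample rows from the question, with dates stored as Date objects
dat <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# One data frame per class, keeping only date and value
a <- subset(dat, class == "A", select = c(date, value))
b <- subset(dat, class == "B", select = c(date, value))

# Merge on date; the suffixes name the two value columns value.A and value.B
res <- merge(a, b, by = "date", suffixes = c(".A", ".B"))
res$diff <- res$value.A - res$value.B
print(res)
```

Because there is exactly one A row and one B row per date, the merge yields one row per date.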
Answers (4)
Assuming a data frame generated as in Prasad's post, but with a set.seed call for reproducibility, we consider seven solutions:
1) zoo can give us a one-line solution (not counting the library statement), which returns a zoo series. Also note that as.data.frame(z) or data.frame(time = time(z), value = coredata(z)) gives a data frame; however, you may wish to leave it as a zoo object, since it is a time series and other operations, e.g. plot(z), are more conveniently done on it in this form.
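The zoo code itself is not reproduced here. One way to do it with zoo is sketched below, assuming the question's sample rows and assuming read.zoo's split argument is used to turn the class column into separate A and B series; this may differ from the answer's original one-liner.

```r
library(zoo)

dat <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# The first column (date) is the time index by default;
# split = "class" turns the long data into one column per class (A and B)
wide <- read.zoo(dat, split = "class")

# Subtract the B series from the A series; zoo aligns them by date
z <- wide[, "A"] - wide[, "B"]
print(z)

# Conversion to a plain data frame, as described above
df <- data.frame(time = time(z), value = coredata(z))
```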
2) sqldf can also give a one-statement solution (aside from the library invocation).

3) tapply can be used as the basis of a solution inspired by the sqldf solution.
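A tapply approach can be sketched by weighting each value by +1 for class A and -1 for class B, then summing within each date. This is a minimal sketch on the question's sample rows; the sign-weighting is an assumption about how such a solution can be built, not the answer's original code.

```r
dat <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# +1 for class A, -1 for class B
w <- ifelse(dat$class == "A", 1, -1)

# Summing the signed values per date gives A - B for each date;
# the result is a vector named by date
res <- tapply(dat$value * w, dat$date, sum)
print(res)
```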
4) aggregate can be used in the same way as sqldf and tapply above (although a slightly different solution, also based on aggregate, has already appeared).

5) summaryBy from the doBy package can provide yet another solution, although it does need a transform to help it along.

6) remix from the remix package can do it too, again with the help of a transform, and features particularly pretty output.

7) summary.formula in the Hmisc package also has pretty output.
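The aggregate variant in item 4 can be sketched with the same +1/-1 weighting as a tapply approach, summed per date through aggregate's data-frame interface. This is a minimal sketch under that assumption, not the answer's original code; the column name diff is illustrative.

```r
dat <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# +1 for class A, -1 for class B
w <- ifelse(dat$class == "A", 1, -1)

# Group the signed values by date and sum them; returns columns date and diff
res <- aggregate(list(diff = dat$value * w), by = list(date = dat$date), FUN = sum)
print(res)
```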
The easiest way I can think of is to use dcast from the reshape2 package to create a data frame with one date per row and columns A and B, then use transform to compute A - B.
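A minimal sketch of the dcast-then-transform approach on the question's sample rows; the result column name AminusB is hypothetical.

```r
library(reshape2)

dat <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# Reshape to one row per date, with one column per class (A and B)
wide <- dcast(dat, date ~ class, value.var = "value")

# Add the difference as a new column
res <- transform(wide, AminusB = A - B)
print(res)
```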
In base R, I would approach the problem by using aggregate and sum. This works by converting each value of class B to its negative (using the data provided by @PrasadChalasani).
For the record, I like the reshape option best. Here's a plyr option using summarise.
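A minimal sketch of the plyr option on the question's sample rows; it relies on there being exactly one A and one B value per date, as the question states, and the column name diff is illustrative.

```r
library(plyr)

dat <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# Split by date; within each piece, subtract the B value from the A value
res <- ddply(dat, .(date), summarise,
             diff = value[class == "A"] - value[class == "B"])
print(res)
```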