What is the R way to do the following grouping?

Posted on 2024-11-16 02:43:46

I have some dataset like this:

# date     # value    class
1984-04-01 95.32384   A
1984-04-01 39.86818   B
1984-07-01 43.57983   A
1984-07-01 10.83754   B

Now I would like to group the data by date and subtract the value of class B from the value of class A.
I looked into ddply, summarize, melt and aggregate but cannot quite get what I want. Is there a way to do it easily? Note that I have exactly two values per date, one of class A and one of class B. I could rearrange the data into two data frames, order them by date and class, and merge them again, but I feel there is a more R way to do it.

Comments (4)

篱下浅笙歌 2024-11-23 02:43:46

Assuming this data frame (generated as in Prasad's post but with a set.seed for reproducibility):

set.seed(123)
DF <- data.frame( date = rep(seq(as.Date('1984-04-01'), 
                                 as.Date('1984-04-01') + 3, by=1), 
                            1, each=2),
                  class = rep(c('A','B'), 4),
                  value = sample(1:8))

then we consider seven solutions:

1) zoo can give us a one line solution (not counting the library statement):

library(zoo)
z <- with(read.zoo(DF, split = 2), A - B)

giving this zoo series:

> z
1984-04-01 1984-04-02 1984-04-03 1984-04-04 
        -3          3          3         -5 

Also note that as.data.frame(z) or data.frame(time = time(z), value = coredata(z)) gives a data frame; however, you may wish to leave it as a zoo object, since it is a time series and other operations, e.g. plot(z), are more conveniently done on it in this form.

2) sqldf can also give a one statement solution (aside from the library invocation):

> library(sqldf)
> sqldf("select date, sum(((class = 'A') - (class = 'B')) * value) as value
+ from DF group by date")
        date value
1 1984-04-01    -3
2 1984-04-02     3
3 1984-04-03     3
4 1984-04-04    -5

3) tapply can be used as the basis of a solution inspired by the sqldf solution:

> with(DF, tapply(((class =="A") - (class == "B")) * value, date, sum))
1984-04-01 1984-04-02 1984-04-03 1984-04-04 
        -3          3          3         -5 

4) aggregate can be used in the same way as sqldf and tapply above (although a slightly different solution also based on aggregate has already appeared):

> aggregate(((DF$class=="A") - (DF$class=="B")) * DF["value"], DF["date"], sum)
        date value
1 1984-04-01    -3
2 1984-04-02     3
3 1984-04-03     3
4 1984-04-04    -5

5) summaryBy from the doBy package can provide yet another solution although it does need a transform to help it along:

> library(doBy)
> summaryBy(value ~ date, transform(DF, value = ((class == "A") - (class == "B")) * value), FUN = sum, keep.names = TRUE)
        date value
1 1984-04-01    -3
2 1984-04-02     3
3 1984-04-03     3
4 1984-04-04    -5

6) remix from the remix package can do it too, again with the help of transform, and it features particularly pretty output:

> library(remix)
> remix(value ~ date, transform(DF, value = ((class == "A") - (class == "B")) * value), sum)
value ~ date
============

+------+------------+-------+-----+
|                           | sum |
+======+============+=======+=====+
| date | 1984-04-01 | value | -3  |
+      +------------+-------+-----+
|      | 1984-04-02 | value | 3   |
+      +------------+-------+-----+
|      | 1984-04-03 | value | 3   |
+      +------------+-------+-----+
|      | 1984-04-04 | value | -5  |
+------+------------+-------+-----+

7) summary.formula in the Hmisc package also has pretty output:

> library(Hmisc)
> summary(value ~ date, data = transform(DF, value = ((class == "A") - (class == "B")) * value), fun = sum, overall = FALSE)
value    N=8

+----+----------+-+-----+
|    |          |N|value|
+----+----------+-+-----+
|date|1984-04-01|2|-3   |
|    |1984-04-02|2| 3   |
|    |1984-04-03|2| 3   |
|    |1984-04-04|2|-5   |
+----+----------+-+-----+
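
Solutions 2 to 7 above all rest on the same trick: (class == "A") - (class == "B") is +1 for rows of class A and -1 for rows of class B, so summing the signed values within a date yields A - B. The following is only a minimal base R sketch of that idea, using the four rows from the question instead of the random DF; the names DF_q, sign_AB and diff_by_date are made up for illustration.

```r
# The four rows shown in the question, typed in by hand so the result
# does not depend on the random number generator.
DF_q <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  class = c("A", "B", "A", "B"),
  value = c(95.32384, 39.86818, 43.57983, 10.83754)
)

# Logical values are coerced to 1/0 by arithmetic,
# so this vector is +1 for class A and -1 for class B.
sign_AB <- (DF_q$class == "A") - (DF_q$class == "B")

# Summing the signed values within each date gives A - B for that date.
diff_by_date <- tapply(sign_AB * DF_q$value, DF_q$date, sum)
print(diff_by_date)
```

The trick relies on every date having exactly one A row and one B row, as stated in the question; with missing or repeated rows the sum would still be computed but would no longer mean A - B.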
坠似风落 2024-11-23 02:43:46

The easiest way I can think of is to use dcast from the reshape2 package, to create a data-frame with one date per row and columns A and B, then use transform to do A-B:

df <- data.frame( date = rep(seq(as.Date('1984-04-01'), 
                                 as.Date('1984-04-01') + 3, by=1), 
                            1, each=2),
                  class = rep(c('A','B'), 4),
                  value = sample(1:8))

library(reshape2)
# value.var names the column whose values fill the wide table
df_wide <- dcast(df, date ~ class, value.var = 'value')

> df_wide
        date A B
1 1984-04-01 8 7
2 1984-04-02 6 1
3 1984-04-03 3 4
4 1984-04-04 5 2

> transform( df_wide, A_B = A - B )

        date A B A_B
1 1984-04-01 8 7   1
2 1984-04-02 6 1   5
3 1984-04-03 3 4  -1
4 1984-04-04 5 2   3
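
If adding a package is not an option, the same long-to-wide step can probably be done with base R's reshape(). This is a sketch under that assumption rather than part of the answer above; it uses the four rows from the question, and the names df_q and wide are made up for illustration.

```r
# The four rows shown in the question, typed in by hand.
df_q <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  class = c("A", "B", "A", "B"),
  value = c(95.32384, 39.86818, 43.57983, 10.83754)
)

# direction = "wide" spreads `value` into one column per level of `class`;
# the new columns are named value.A and value.B.
wide <- reshape(df_q, idvar = "date", timevar = "class", direction = "wide")

# With one row per date, the difference is a plain column subtraction.
wide$A_B <- wide$value.A - wide$value.B
print(wide)
```

Compared with dcast, the column names carry a value. prefix and the row names are inherited from the first row of each date, so some renaming may be wanted afterwards.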
浪漫人生路 2024-11-23 02:43:46

In base R, I would approach the problem by using aggregate and sum. This works by converting each value of class B to its negative:

(Using the data provided by @PrasadChalasani)

df <- within(df, value[class=="B"] <- -value[class=="B"])
aggregate(df$value, by=list(date=df$date), sum)

        date x
1 1984-04-01 3
2 1984-04-02 2
3 1984-04-03 2
4 1984-04-04 1
月下凄凉 2024-11-23 02:43:46

For the record, I like the reshape option the best. Here's a plyr option using summarise:

library(plyr)

ddply(df, "date", summarise
    , A = value[class == "A"]
    , B = value[class == "B"]
    , A_B = value[class == "A"] - value[class == "B"]
)
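
Indexing with value[class == "A"] silently depends on the question's assumption that each date has exactly one A row and one B row. A small base R sketch of how that assumption could be checked before subtracting is given below; it uses the four rows from the question, and the names df_q, counts, pieces and a_minus_b are made up for illustration.

```r
# The four rows shown in the question, typed in by hand.
df_q <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  class = c("A", "B", "A", "B"),
  value = c(95.32384, 39.86818, 43.57983, 10.83754)
)

# Cross-tabulate date by class: every cell should be exactly 1.
counts <- table(df_q$date, df_q$class)
stopifnot(all(counts == 1))

# With the assumption verified, subtract B from A within each date.
pieces <- split(df_q, df_q$date)
a_minus_b <- vapply(
  pieces,
  function(d) d$value[d$class == "A"] - d$value[d$class == "B"],
  numeric(1)
)
print(a_minus_b)
```

If a date were missing one class or had duplicates, the stopifnot call would fail early instead of producing a result of the wrong length.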