Cumulative sum by group in sqldf?

Posted on 2024-12-21 19:01:22

I have a data frame with 3 variables: place, time, and value (P, T, X). I want to create a fourth variable which will be the cumulative sum of X. Normally I like to do grouping calculations with sqldf, but can't seem to find an equivalent for cumsum. That is:

sqldf("select P,T,X, cumsum(X) as X_CUM from df group by P,T")

doesn't work. Is this even possible with sqldf? I tried doBy, but that doesn't allow cumsum either.

对你的占有欲 2024-12-28 19:01:22

Set up some test data:

DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

Now we have three solutions. First we use sqldf, as requested, with the default SQLite database. Next we use sqldf again, this time with PostgreSQL via the RPostgreSQL or RpgSQL driver. PostgreSQL supports analytic window functions, which simplify the SQL. You will need to set up a PostgreSQL database first for that one. Finally, we show a pure R solution that uses only base R.

1) sqldf/RSQLite

library(sqldf)

sqldf("select a.*, sum(b.value) as cumsum 
    from DF a join DF b 
    using (p)
    where a.t >= b.t
    group by p, a.t"
)
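To see what the self-join in 1) computes, the same steps can be sketched in base R. This is only an illustration of the logic, not how SQLite executes the query; the names `pairs` and `res` are made up for the sketch.

```r
DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# "join DF b using (p)": pair every row with all rows of the same group p
pairs <- merge(DF, DF, by = "p", suffixes = c(".a", ".b"))

# "where a.t >= b.t": keep only right-hand rows that are not later than the left-hand row
pairs <- pairs[pairs$t.a >= pairs$t.b, ]

# "group by p, a.t" with sum(b.value): add up the right-hand values per left-hand row
res <- aggregate(value.b ~ p + t.a, data = pairs, FUN = sum)
names(res) <- c("p", "t", "cumsum")
res <- res[order(res$p, res$t), ]
res
```

Each row ends up summing its own value plus the values of all earlier rows in the same group, which is the cumulative sum.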

2) sqldf/RPostgreSQL

library(RPostgreSQL)
library(sqldf)

sqldf('select *,
    sum(value) over (partition by p order by t) as cumsum 
    from "DF"'
)

(This also works with the RpgSQL PostgreSQL driver. To use it you must have Java installed and a PostgreSQL database set up; then, in place of the above, use library(RpgSQL); sqldf(...) with the same SQL string, except that there should be no quotes around DF.)
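SQLite itself gained window functions in version 3.25.0, so the window query may also run on the default SQLite backend. The following is a sketch under the assumption that the SQLite bundled with your installed RSQLite is at least that version; `drv = "SQLite"` is passed explicitly in case a PostgreSQL driver is loaded in the same session.

```r
library(sqldf)

DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Assumes the bundled SQLite is >= 3.25.0 (window function support)
res <- sqldf("select *,
    sum(value) over (partition by p order by t) as cumsum
    from DF
    order by p, t",
  drv = "SQLite"
)
res
```

If the bundled SQLite is older, the query fails with a syntax error and the self-join in 1) remains the fallback.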

3) Plain R

transform(DF, cumsum = ave(value, p, FUN = cumsum))
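Note that ave() with cumsum accumulates in the order the rows appear within each group, so the result only matches the SQL versions when the rows are already ordered by t within p. A minimal sketch, using a hypothetical shuffled copy of DF, sorts first:

```r
DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Hypothetical unsorted input: same rows as DF, in a different order
shuffled <- DF[c(3, 1, 4, 2, 7, 5, 8, 6, 11, 9, 12, 10), ]

# Order by group and time before accumulating
sorted <- shuffled[order(shuffled$p, shuffled$t), ]
res <- transform(sorted, cumsum = ave(value, p, FUN = cumsum))
res
```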

不羁少年 2024-12-28 19:01:22

I hope I understood what you want:

library(plyr)
ddply(df, .(P,T), summarize, cumsum(X))

Does this help you?
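One caveat with the call above: when each (P, T) pair occurs only once, grouping by both columns leaves a single row per group, so cumsum(X) just returns X. A possible variant, assuming the running total is wanted within each place, groups by P only and uses transform to keep the other columns. The data frame here is hypothetical and only borrows the question's column names; X_CUM is the name used in the question.

```r
library(plyr)

# Hypothetical data with the question's columns: place P, time T, value X
df <- data.frame(P = rep(1:3, each = 4), T = 1:4, X = 1:12)

# Group by place only; rows are assumed to be ordered by T within each P
res <- ddply(df, .(P), transform, X_CUM = cumsum(X))
res
```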

水染的天色ゝ 2024-12-28 19:01:22

Or, another option is data.table.

> library(data.table)
> DT = data.table(place = 1:4, time = rep(1:3, each = 4), value = 1:3)
> setkey(DT,place,time)   # order by place and time
> DT
      place time value
 [1,]     1    1     1
 [2,]     1    2     2
 [3,]     1    3     3
 [4,]     2    1     2
 [5,]     2    2     3
 [6,]     2    3     1
 [7,]     3    1     3
 [8,]     3    2     1
 [9,]     3    3     2
[10,]     4    1     1
[11,]     4    2     2
[12,]     4    3     3
> DT[,list(time,value,cumsum(value)),by=place]
      place time value V3
 [1,]     1    1     1  1
 [2,]     1    2     2  3
 [3,]     1    3     3  6
 [4,]     2    1     2  2
 [5,]     2    2     3  5
 [6,]     2    3     1  6
 [7,]     3    1     3  3
 [8,]     3    2     1  4
 [9,]     3    3     2  6
[10,]     4    1     1  1
[11,]     4    2     2  3
[12,]     4    3     3  6
>
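The result column above gets the default name V3. As a sketch of an alternative, data.table's `:=` operator can add a named column to DT by reference; the column name `value_cum` is an arbitrary choice for this example.

```r
library(data.table)

DT <- data.table(place = 1:4, time = rep(1:3, each = 4), value = 1:3)
setkey(DT, place, time)   # order by place and time

# Add the running total per place as a new, named column (modifies DT in place)
DT[, value_cum := cumsum(value), by = place]
DT
```

Because the table is keyed by place and time first, the running total follows time order within each place.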
