Cumulative sum by group in sqldf?

Posted on 2024-12-21 19:01:22

I have a data frame with 3 variables: place, time, and value (P, T, X). I want to create a fourth variable which will be the cumulative sum of X. Normally I like to do grouping calculations with sqldf, but can't seem to find an equivalent for cumsum. That is:

sqldf("select P,T,X, cumsum(X) as X_CUM from df group by P,T") 

doesn't work. Is this even possible with sqldf? I tried doBy, but that doesn't allow cumsum either.

Comments (3)

对你的占有欲 2024-12-28 19:01:22

Set up some test data:

DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

Now we have three solutions. First we use sqldf, as requested, with the default SQLite database. Next we use sqldf again, but this time with PostgreSQL through the RPostgreSQL or RpgSQL driver. PostgreSQL supports analytical windowing functions, which simplify the SQL. You will need to set up a PostgreSQL database first to do that one. Finally we show a pure R solution which only uses the core of R.

1) sqldf/RSQLite

library(sqldf)

sqldf("select a.*, sum(b.value) as cumsum 
    from DF a join DF b 
    using (p)
    where a.t >= b.t
    group by p, a.t"
)
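
To make the self-join easier to follow, the same three steps (pair rows within each p, keep pairs where the second row is not later, sum per row) can be sketched in base R. This is only an illustration of what the query computes, not how SQLite executes it; the names pairs and res are made up for the example.

```r
# Test data from the answer above
DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Step 1: pair every row with every row of the same group p (the self-join)
pairs <- merge(DF, DF, by = "p", suffixes = c(".a", ".b"))

# Step 2: keep only pairs where the "b" row is not later than the "a" row
pairs <- pairs[pairs$t.a >= pairs$t.b, ]

# Step 3: sum the "b" values for each (p, t.a) combination
res <- aggregate(value.b ~ p + t.a, data = pairs, FUN = sum)
res <- res[order(res$p, res$t.a), ]
names(res) <- c("p", "t", "cumsum")
```

Each output row sums all values in its group up to and including its own t, which is exactly a running total.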

2) sqldf/RPostgreSQL

library(RPostgreSQL)
library(sqldf)

sqldf('select *,
    sum(value) over (partition by p order by t) as cumsum 
    from "DF"'
)

(This also works with the RpgSQL PostgreSQL driver. To use that you must have Java installed and a PostgreSQL database set up, and then in place of the above use: library(RpgSQL); sqldf(...) where the same SQL string is used except that there should be no quotes around DF.)
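
SQLite itself gained window functions in version 3.25.0, so the window-function query above may also run on sqldf's default SQLite backend, without a PostgreSQL server. The sketch below assumes that the SQLite bundled with the installed RSQLite is at least that version, and that RPostgreSQL is not loaded (otherwise sqldf would pick PostgreSQL). The explicit order by is an addition to make the row order predictable.

```r
library(sqldf)  # with only sqldf loaded, the default SQLite backend is used

DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Running total of value within each p, ordered by t
# (assumes the bundled SQLite is 3.25.0 or newer)
res <- sqldf("select *,
    sum(value) over (partition by p order by t) as cumsum
    from DF
    order by p, t")
```

If the bundled SQLite is older, this query fails with a syntax error and the self-join form remains the fallback.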

3) Plain R

transform(DF, cumsum = ave(value, p, FUN = cumsum))
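
One thing to keep in mind with the plain R version: ave applies cumsum within each group in the order the rows appear, so it relies on the data already being sorted by t within p. A minimal sketch of sorting first, using a shuffled copy of the test data (the names shuffled, sorted and res are hypothetical):

```r
DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Hypothetical shuffled copy, to show that row order matters
set.seed(1)
shuffled <- DF[sample(nrow(DF)), ]

# Sort by group and time first, then accumulate within each group
sorted <- shuffled[order(shuffled$p, shuffled$t), ]
res <- transform(sorted, cumsum = ave(value, p, FUN = cumsum))
```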

不羁少年 2024-12-28 19:01:22

I hope I understood what you want:

library(plyr)
ddply(df, .(P,T), summarize, cumsum(X))

Does this help you?
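
A caveat on the grouping: if each (P, T) pair occurs only once, grouping by both columns leaves one row per group, so cumsum(X) just returns X. A possible variant, assuming the running total is wanted within each place P over time, groups by P only and uses transform to keep the original columns. The data frame below is hypothetical and only borrows the question's column names.

```r
library(plyr)

# Hypothetical data using the question's column names P, T, X
df <- data.frame(P = rep(c("a", "b"), each = 3), T = rep(1:3, 2), X = 1:6)

# Sort by place and time, then accumulate X within each place
df <- df[order(df$P, df$T), ]
res <- ddply(df, .(P), transform, X_CUM = cumsum(X))
```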

水染的天色ゝ 2024-12-28 19:01:22

Or, another option is data.table.

> library(data.table)
> DT = data.table(place = 1:4, time = rep(1:3, each = 4), value = 1:3)
> setkey(DT,place,time)   # order by place and time
> DT
      place time value
 [1,]     1    1     1
 [2,]     1    2     2
 [3,]     1    3     3
 [4,]     2    1     2
 [5,]     2    2     3
 [6,]     2    3     1
 [7,]     3    1     3
 [8,]     3    2     1
 [9,]     3    3     2
[10,]     4    1     1
[11,]     4    2     2
[12,]     4    3     3
> DT[,list(time,value,cumsum(value)),by=place]
      place time value V3
 [1,]     1    1     1  1
 [2,]     1    2     2  3
 [3,]     1    3     3  6
 [4,]     2    1     2  2
 [5,]     2    2     3  5
 [6,]     2    3     1  6
 [7,]     3    1     3  3
 [8,]     3    2     1  4
 [9,]     3    3     2  6
[10,]     4    1     1  1
[11,]     4    2     2  3
[12,]     4    3     3  6
> 
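
In the output above the running total ends up in an unnamed column, V3. As a small sketch, data.table's := operator can add it as a named column in place instead; the column name value_cum is made up for the example.

```r
library(data.table)

DT <- data.table(place = 1:4, time = rep(1:3, each = 4), value = 1:3)
setkey(DT, place, time)  # sort by place, then time

# Add the running total as a named column, by reference
DT[, value_cum := cumsum(value), by = place]
```

Because := modifies DT by reference, no reassignment is needed, and time and value stay in the table alongside the new column.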