Aggregating daily content

Posted on 2024-09-17 06:12:31

I've been attempting to aggregate (somewhat erratic) daily data. I'm actually working with CSV data, but if I recreate it, it looks something like this:

library(zoo)

dates <- c("20100505", "20100505", "20100506", "20100507")
val1 <- c("10", "11", "1", "6")
val2 <- c("5", "31", "2", "7")

x <- data.frame(dates = dates, val1=val1, val2=val2)
z <- read.zoo(x, format = "%Y%m%d")

Now I'd like to aggregate this on a daily basis (notice that sometimes there is more than one data point for a day, and sometimes there isn't).

I've tried lots and lots of variations, but I can't seem to aggregate it. For instance, this fails:

aggregate(z, as.Date(time(z)), sum)
# Error in Summary.factor(2:3, na.rm = FALSE) : sum not meaningful for factors

There seems to be a lot of material on aggregate(), and I've tried a number of versions, but I can't seem to sum this at a daily level. In addition to the daily sums, I'd also like to compute cummax and cumulative averages.
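Once daily sums exist, the running statistics are one more step. A minimal base-R sketch (no zoo), assuming the example values are converted to numeric first and assuming "cumulative average" means the mean of the daily sums up to each day, since the question does not say which average is meant. It uses aggregate() with a formula to collapse duplicate days:

```r
# Base-R sketch: daily sums, then running max and running mean per column.
# Assumption: values are converted to numeric before anything else.
dates <- c("20100505", "20100505", "20100506", "20100507")
val1  <- as.numeric(c("10", "11", "1", "6"))
val2  <- as.numeric(c("5", "31", "2", "7"))
x <- data.frame(date = as.Date(dates, format = "%Y%m%d"), val1 = val1, val2 = val2)

# Collapse duplicate days by summing each value column
daily <- aggregate(cbind(val1, val2) ~ date, data = x, FUN = sum)

# Running maximum and running (cumulative) mean of the daily sums
run_max  <- sapply(daily[-1], cummax)
run_mean <- sapply(daily[-1], function(v) cumsum(v) / seq_along(v))

print(daily)
print(run_max)
print(run_mean)
```

The cumulative mean is written as cumsum() divided by the running count, which keeps it vectorized and works for any number of columns.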

Any help would be greatly appreciated.

Update

The code I am actually using is as follows:

z <- read.zoo(file = "data.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE, blank.lines.skip = TRUE, na.strings = "NA", format = "%Y%m%d")

It seems my (unintentional) quoting of the numbers above mirrors what is happening in practice, because when I do:

aggregate(z, index(z), sum)
#Error in Summary.factor(25L, na.rm = FALSE) : sum not meaningful for factors

There are a number of columns (100 or so). How can I make them all numeric automatically? (stringsAsFactors = FALSE doesn't appear to work.)
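For the roughly 100-column case, one option is to declare the column classes when reading; another is to convert after reading. A sketch of both, where the inline CSV text stands in for the question's data.csv, and assuming the first column holds the yyyymmdd dates and every other column should be numeric:

```r
# Sketch: force every value column to numeric. The inline text stands in
# for the question's data.csv (assumed layout: dates first, then values).
csv_text <- "dates,val1,val2
20100505,10,5
20100505,11,31
20100506,1,2
20100507,6,7"

# Option 1: declare classes at read time. Count the columns from a one-row
# read, keep the dates as character, and mark all other columns numeric.
n_cols <- ncol(read.csv(text = csv_text, nrows = 1))
df1 <- read.csv(text = csv_text,
                colClasses = c("character", rep("numeric", n_cols - 1)))

# Option 2: convert after reading. Going through as.character() matters if a
# column came in as a factor, because as.numeric() on a factor gives codes.
df2 <- read.csv(text = csv_text, colClasses = "character")
df2[-1] <- lapply(df2[-1], function(col) as.numeric(as.character(col)))

str(df1)
str(df2)
```

If colClasses = "numeric" stops with an error for some column, that column most likely contains a non-numeric token (a placeholder string, for example); listing that token in na.strings turns it into NA. Since read.zoo() passes extra arguments on to read.table(), the same colClasses vector should also work in the read.zoo() call above.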

我们只是彼此的过ke 2024-09-24 06:12:31

Alternatively, aggregate before using zoo (val1 and val2 still need to be numeric, though).

x <- data.frame(dates = dates, val1=as.numeric(val1), val2=as.numeric(val2))
y <- aggregate(x[,2:3],by=list(x[,1]),FUN=sum)

and then feed y into zoo.

This way you also avoid the warning about non-unique index entries. :)
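A base-R sketch of this approach with two small additions that are not part of the answer itself: the grouping column is named date (instead of the default Group.1) and converted to the Date class, so that y is ready to become a zoo index, for example via zoo(as.matrix(y[-1]), y$date) (not run here, because it needs the zoo package):

```r
# Base-R sketch: sum duplicate days first, then build a proper Date column.
x <- data.frame(dates = c("20100505", "20100505", "20100506", "20100507"),
                val1  = c(10, 11, 1, 6),
                val2  = c(5, 31, 2, 7))

# Naming the element of the 'by' list gives the output column a readable
# name instead of the default "Group.1"
y <- aggregate(x[, 2:3], by = list(date = x$dates), FUN = sum)

# Convert the yyyymmdd strings to Date; with library(zoo) loaded, y could
# then be passed as zoo(as.matrix(y[-1]), y$date)
y$date <- as.Date(y$date, format = "%Y%m%d")
print(y)
```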

始于初秋 2024-09-24 06:12:31

You started on the right path but made a couple of mistakes.

First, zoo only consumes matrices, not data.frames. Second, those need numeric inputs:

> z <- zoo(as.matrix(data.frame(val1=c(10,11,1,6), val2=c(5,31,2,7))), 
+          order.by=as.Date(c("20100505","20100505","20100506","20100507"),
+                           "%Y%m%d"))
Warning message:
In zoo(as.matrix(data.frame(val1 = c(10, 11, 1, 6), val2 = c(5,  :
  some methods for "zoo" objects do not work if the index entries in 
  'order.by' are not unique

This gives us a warning that is standard in zoo: it does not like duplicate time indices.

It is always a good idea to look at the data structure, perhaps also via str(), and perhaps run summary() on it:

> z
           val1 val2
2010-05-05   10    5
2010-05-05   11   31
2010-05-06    1    2
2010-05-07    6    7

And then, once we have it, aggregation is easy:

> aggregate(z, index(z), sum)
           val1 val2
2010-05-05   21   36
2010-05-06    1    2
2010-05-07    6    7
>
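For comparison, the same daily sum can be sketched in base R with rowsum(), which adds up the rows of a numeric matrix that share a group label. This is a swapped-in technique, not what this answer uses; group labels are passed as formatted date strings and become the row names of the result:

```r
# Base-R sketch: sum matrix rows that share a day label (no zoo involved)
m <- cbind(val1 = c(10, 11, 1, 6), val2 = c(5, 31, 2, 7))
day <- as.Date(c("20100505", "20100505", "20100506", "20100507"), "%Y%m%d")

# format() turns the dates into "YYYY-MM-DD" labels; rowsum() sorts the
# groups by default (reorder = TRUE) and uses them as row names
daily <- rowsum(m, group = format(day))
print(daily)
```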

愿得七秒忆 2024-09-24 06:12:31

val1 and val2 are character strings. data.frame() converts them to factors. Summing factors doesn't make sense. You probably intended:

x <- data.frame(dates = dates, val1=as.numeric(val1), val2=as.numeric(val2))
z <- read.zoo(x, format = "%Y%m%d")
aggregate(z, as.Date(time(z)), sum)

which yields:

           val1 val2
2010-05-05   21   36
2010-05-06    1    2
2010-05-07    6    7
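A small sketch of why the order of conversion matters: once a column is a factor, as.numeric() returns the internal level codes, and levels are sorted as strings. That is probably where the 2:3 in the question's error comes from ("10" and "11" are levels 2 and 3). Note that since R 4.0.0, data.frame() keeps strings as character by default, so the exact error differs on current R, while the fix (convert to numeric first) stays the same:

```r
# Levels of a factor are sorted as strings: "1" < "10" < "11" < "6"
f <- factor(c("10", "11", "1", "6"))
print(levels(f))

codes <- as.numeric(f)               # level positions, not the values
vals  <- as.numeric(as.character(f)) # the intended numbers

# sum() on the factor itself fails with a "not meaningful for factors" error
msg <- tryCatch(sum(f), error = conditionMessage)
print(codes)
print(vals)
print(msg)
```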

无边思念无边月 2024-09-24 06:12:31

Convert the character columns to numeric and then use read.zoo making use of its aggregate argument:

> x[-1] <- lapply(x[-1], function(x) as.numeric(as.character(x)))
> read.zoo(x, format = "%Y%m%d", aggregate = sum)
           val1 val2
2010-05-05   21   36
2010-05-06    1    2
2010-05-07    6    7
