Aggregating daily content
I've been attempting to aggregate (somewhat erratic) daily data. I'm actually working with CSV data, but if I recreate it, it would look something like this:
library(zoo)
dates <- c("20100505", "20100505", "20100506", "20100507")
val1 <- c("10", "11", "1", "6")
val2 <- c("5", "31", "2", "7")
x <- data.frame(dates = dates, val1=val1, val2=val2)
z <- read.zoo(x, format = "%Y%m%d")
Now I'd like to aggregate this on a daily basis (notice that sometimes there is more than one data point for a day, and sometimes there isn't).
I've tried lots and lots of variations, but I can't seem to aggregate. For instance, this fails:
aggregate(z, as.Date(time(z)), sum)
# Error in Summary.factor(2:3, na.rm = FALSE) : sum not meaningful for factors
There seems to be a lot of content regarding aggregate, and I've tried a number of versions, but I can't seem to sum this at a daily level. I'd also like to run cummax and cumulative averages in addition to the daily summing.
Any help would be greatly appreciated.
Update
The code I am actually using is as follows:
z <- read.zoo(file = "data.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE, blank.lines.skip = T, na.strings="NA", format = "%Y%m%d");
It seems my (unintentional) quotation of the numbers above is similar to what is happening in practice, because when I do:
aggregate(z, index(z), sum)
#Error in Summary.factor(25L, na.rm = FALSE) : sum not meaningful for factors
There are a number of columns (100 or so). How can I specify them to be as.numeric automatically? (stringsAsFactors = FALSE doesn't appear to work?)
Comments (4)
Or you aggregate before using zoo (val1 and val2 need to be numeric, though) and then feed the result, y, into zoo. You avoid the warning :)
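A minimal sketch of the aggregate-first approach, assuming the toy data from the question with the values entered as numbers. The use of base R's aggregate() with a by list is an assumption for illustration; the exact call in the original answer may have differed.

```r
library(zoo)

# Toy data from the question, with numeric (unquoted) value columns
x <- data.frame(
  dates = c("20100505", "20100505", "20100506", "20100507"),
  val1  = c(10, 11, 1, 6),
  val2  = c(5, 31, 2, 7),
  stringsAsFactors = FALSE
)

# Sum each value column within each date using base R's aggregate()
y <- aggregate(x[c("val1", "val2")], by = list(dates = x$dates), FUN = sum)

# The dates in y are unique, so zoo has no duplicate index to warn about
z <- read.zoo(y, format = "%Y%m%d")
print(z)
```

Because the duplicates are collapsed before the zoo object is built, the non-unique index warning should not appear.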
You started on the right path but made a couple of mistakes.
First, zoo() expects a vector or matrix, not a data.frame. Second, it needs numeric inputs.
Building the object that way gets us a warning that is standard in zoo: it does not like identical time indices.
It is always a good idea to show the data structure, maybe via str(), and maybe run summary() on it as well.
And then, once we have it, aggregation is easy.
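The steps described in this answer can be sketched as follows, assuming the toy values from the question. The variable names (vals, idx, daily, running_max, running_mean) are hypothetical, and the cumulative statistics at the end are an addition that addresses the cummax and cumulative average part of the question rather than something this answer is known to have shown.

```r
library(zoo)

# Numeric matrix of values plus a Date index (note the repeated 2010-05-05)
vals <- cbind(val1 = c(10, 11, 1, 6), val2 = c(5, 31, 2, 7))
idx  <- as.Date(c("20100505", "20100505", "20100506", "20100507"),
                format = "%Y%m%d")

# zoo() warns here because the index entries are not unique
z <- zoo(vals, order.by = idx)

# Inspect the structure before going further
str(z)
summary(z)

# Collapse to one row per day by summing within each date
daily <- aggregate(z, index(z), sum)

# Running maximum and running mean of the daily sums, column by column
running_max  <- zoo(apply(coredata(daily), 2, cummax),
                    order.by = index(daily))
running_mean <- zoo(apply(coredata(daily), 2,
                          function(v) cumsum(v) / seq_along(v)),
                    order.by = index(daily))
```

Working on coredata() with base R's cummax() and cumsum() keeps the sketch independent of any zoo-specific cumulative methods.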
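val1 and val2 are character strings. data.frame() converts them to factors (the default behavior in R versions before 4.0.0). Summing factors doesn't make sense. You probably intended val1 and val2 to be numeric, in which case the daily sum works.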
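A minimal sketch of that fix, assuming the quoted toy data from the question. Converting every non-date column in one lapply() call is an illustrative choice that also covers the "100 or so columns" case; the original answer may simply have wrapped val1 and val2 in as.numeric().

```r
library(zoo)

dates <- c("20100505", "20100505", "20100506", "20100507")
val1  <- c("10", "11", "1", "6")   # quoted, as in the question
val2  <- c("5", "31", "2", "7")
x <- data.frame(dates = dates, val1 = val1, val2 = val2)

# Convert every column except the date column to numeric.
# as.character() comes first so that factor columns yield their labels
# rather than their internal level codes.
x[-1] <- lapply(x[-1], function(col) as.numeric(as.character(col)))

# read.zoo() warns about the duplicated date; the aggregation removes it
z <- read.zoo(x, format = "%Y%m%d")
daily <- aggregate(z, as.Date(time(z)), sum)
print(daily)
```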
Convert the character columns to numeric and then use read.zoo, making use of its aggregate argument.
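A minimal sketch of this approach, assuming the data frame x from the question. The lapply() conversion is one way to turn the columns numeric and may not match the original answer's code exactly.

```r
library(zoo)

x <- data.frame(
  dates = c("20100505", "20100505", "20100506", "20100507"),
  val1  = c("10", "11", "1", "6"),
  val2  = c("5", "31", "2", "7")
)

# Make all value columns numeric (works for character and factor columns)
x[-1] <- lapply(x[-1], function(col) as.numeric(as.character(col)))

# aggregate = sum tells read.zoo() to sum rows that share an index value,
# so the duplicated date is collapsed while the object is being built
z <- read.zoo(x, format = "%Y%m%d", aggregate = sum)
print(z)
```

The same aggregate argument can be passed when read.zoo() reads directly from a file, although columns that arrive as factors would still need converting first.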