Problem splitting a data frame into a nested list
I am new to R, and I have a problem splitting a very large data frame into a nested list. I tried looking for help on the internet, but was unsuccessful.
Here is a simplified example of how my data are organized:
The headers are:
1. "station" (number)
2. "date.str" (date string)
3. "member"
4. "forecast.time" (forecast time)
5. "data1" (data)
I am not sure my data example will display correctly, but it looks like this:
station  date.str  member  forecast.time  data1
6019     20110805  mbr000  06             77
6031     20110805  mbr000  06             28
6071     20110805  mbr000  06             45
6019     20110805  mbr001  12             22
6019     20110806  mbr024  18             66
I want to split the large data frame into a nested list by "station", "member", "date.str" and "forecast.time". The goal is that mylist[[c(s,m,d,t)]] contains a data frame with the data for station s, member m, date.str d and forecast time t, with the values of s, m, d and t preserved.
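Below is a minimal sketch of the nested structure described above. It uses a hypothetical recursive helper, nest_split(), which calls split() once per column. The helper is not from the original post. The sketch rebuilds mydata from the sample rows and keeps date.str and forecast.time as character so that leading zeros such as "06" survive.

```r
# Sample rows from the question; date.str and forecast.time kept as character
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66)
)

# nest_split() is a hypothetical helper: split on the first column in `vars`,
# then recurse into each piece with the remaining columns.
nest_split <- function(df, vars) {
  if (length(vars) == 0) return(df)               # leaf: one data frame per combination
  parts <- split(df, df[[vars[1]]], drop = TRUE)  # list named by the values of vars[1]
  lapply(parts, nest_split, vars = vars[-1])      # recurse; lapply() keeps the names
}

mylist <- nest_split(mydata, c("station", "member", "date.str", "forecast.time"))

# Recursive [[ ]] indexing with a character vector walks down the levels
mylist[[c("6019", "mbr000", "20110805", "06")]]
```

The levels are looked up by name, so the station must end up as a string. Inside c() alongside the other strings, a number such as 6019 is coerced to "6019" automatically. A bare numeric index, by contrast, would be read as a position.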
My code is:
data.st.member.dato <- list()
data.st <- split(mydata, mydata$station)
# fsplit.member: my own function to split after "member"
# (assumed here to be equivalent to split(d, d$member))
fsplit.member <- function(d) split(d, d$member)
data.st.member <- lapply(data.st, FUN = fsplit.member)
# Loop over stations (by name, so the station numbers are kept as keys):
for (s in names(data.st.member)) {
  data.st.member.dato[[s]] <- list()
  # Loop over members:
  for (m in names(data.st.member[[s]])) {
    # Split this station/member chunk by date string; the result is already
    # a list named by the date.str values, so it is stored in one step
    data.st.member.dato[[s]][[m]] <- split(data.st.member[[s]][[m]],
                                           data.st.member[[s]][[m]]$date.str)
  } # end m loop
} # end s loop
I would also like to split by forecast time (forecast.time), but I didn't get that far.
I have tried a couple of different configurations within the loops, so at the moment I don't have a consistent error message. I can't figure out what I am doing or thinking wrong.
Any help is much appreciated!
Regards
Sisse
Answers (2)
It's easier than you think. You can pass a list into split in order to split on several factors.
Reproducible example:
With your data:
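Below is a minimal sketch of this approach. It assumes mydata holds the sample rows from the question. It passes a list of the four grouping columns to split(), and each chunk is named by the values joined with ".".

```r
# Sample rows from the question; date.str and forecast.time kept as character
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66)
)

# A list of grouping vectors makes split() use every combination of their values;
# drop = TRUE keeps only the combinations that actually occur in the data.
chunks <- split(mydata,
                list(mydata$station, mydata$member, mydata$date.str, mydata$forecast.time),
                drop = TRUE)

names(chunks)
# Look up one chunk by its combined name
chunks[["6019.mbr000.20110805.06"]]
```

In this flat list, chunks[[paste(s, m, d, t, sep = ".")]] plays the role of mylist[[c(s,m,d,t)]] from the question.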
Note: This doesn't give you a nested list like you asked for, but as Joran commented, you very probably don't want that. A flat list will be nicer to work with.
Speculating wildly: did you just want to calculate statistics on different chunks of data? If so, then see the many questions here on split-apply-combine problems.
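For instance, here is a base-R sketch of split-apply-combine on the sample rows from the question. It splits data1 by station, applies mean() to each chunk, and combines the results into a named vector. aggregate() does the same in one call and returns a data frame.

```r
# Sample rows from the question
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66)
)

# Split: one numeric vector of data1 values per station
chunks <- split(mydata$data1, mydata$station)
# Apply + combine: the mean of each chunk, returned as a named vector
station_means <- sapply(chunks, mean)
station_means

# The same idea in one call, grouped by member, returning a data frame
member_means <- aggregate(data1 ~ member, data = mydata, FUN = mean)
member_means
```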
I also want to echo the others: this recursive data structure is going to be difficult to work with, and there are probably better ways. Do look at the split-apply-combine approach, as Richie suggested. However, the constraints may be external, so here is an answer using the plyr library. Using the snippet of data you gave for mydata:
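Below is a sketch of what a plyr-based version could look like. It is an illustration, not necessarily the answerer's exact code, and it assumes the plyr package is installed. Nested dlply() calls split on one variable per level and return lists named by that variable's values, giving station, then member, then date.str, then forecast.time.

```r
library(plyr)

# Sample rows from the question; date.str and forecast.time kept as character
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66)
)

# Each dlply() call splits on one variable; identity() keeps the leaf data frames.
mylist <- dlply(mydata, .(station), function(d1)
  dlply(d1, .(member), function(d2)
    dlply(d2, .(date.str), function(d3)
      dlply(d3, .(forecast.time), identity))))

mylist[[c("6019", "mbr000", "20110805", "06")]]
```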