Subset a whole day of data if the data between two hours of that day meets a criterion?

Posted on 2024-11-18 15:51:49

I'm fairly new to R, and it would be great if you could help with this problem, as I haven't been able to find any answers to it online.
This is part of my data frame (DF) (it continues in this format until 2008):

Counter Date    Hour    counts
1245    26/05/2006  0   1
1245    26/05/2006  100 0
1245    26/05/2006  200 2
1245    26/05/2006  300 0
1245    26/05/2006  400 5
1245    26/05/2006  500 3
1245    26/05/2006  600 9
1245    26/05/2006  700 10
1245    26/05/2006  800 15

This is my question: I need to subset my data so that, if there are counts over 0 between the hours of 600 and 2200, the whole day (000 to 2300) is kept in the data set, but if there are no counts in that period (600 to 2200), the whole day is deleted. How can I do this?

I tried to do this with the following piece of code, but it keeps ONLY the count data between 600 and 2200 hours, and I can't figure out how to make it keep the whole day.

DF2 <- DF[(DF$Hour >= 600) & (DF$Hour <= 2200) & (DF$counts > 0), ]  # only the rows from 600 to 2200 with counts > 0
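
The attempt above filters individual rows. The question needs a per-day test whose result is then applied to every row of that day. A minimal base-R sketch of that idea, assuming the columns are named `Counter`, `Date`, `Hour` and `counts` as in the sample (the toy rows below are invented for illustration), uses `ave()` to spread one day-level flag over all rows of the day:

```r
# Invented toy data in the question's layout:
# 26/05 has a nonzero count at 700; 27/05 has counts only outside 600-2200.
DF <- data.frame(
  Counter = 1245,
  Date    = rep(c("26/05/2006", "27/05/2006"), each = 3),
  Hour    = c(0, 700, 2300, 0, 700, 2300),
  counts  = c(1, 5, 0, 4, 0, 2)
)

# Row-level condition: inside 600-2200 with a nonzero count?
in_window <- DF$Hour >= 600 & DF$Hour <= 2200 & DF$counts > 0

# Day-level flag: TRUE on every row of a Counter/Date group in which
# at least one row meets the row-level condition
keep_day <- ave(in_window, DF$Counter, DF$Date, FUN = any)

DF2 <- DF[keep_day, ]
DF2
```

Unlike the row filter, this keeps the 0 and 2300 rows of 26/05/2006 as well and drops every row of 27/05/2006.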

I then aggregate the hourly counts into daily counts using the following code:

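daily <- DF2  # subset() with no condition returned DF2 unchanged
daily$Date <- as.Date(daily$Date, "%d/%m/%Y")  # sample dates are day/month/year, e.g. 26/05/2006
agg <- aggregate(counts ~ Date, daily, sum)
town <- merge(agg, unique(DF2["Counter"]), all = TRUE)  # attach the counter ID (a cross join, so only sensible for one counter)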
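
The sample dates (26/05/2006) are day/month/year, so they need the format `%d/%m/%Y`; with `%m/%d/%Y`, `as.Date()` returns NA for any day after the 12th and silently swaps day and month otherwise. Grouping by `Counter` and `Date` in the same `aggregate()` call keeps the counter ID attached without a separate merge. A sketch under those assumptions, with invented toy rows:

```r
# Invented toy hourly data in the question's layout
DF2 <- data.frame(
  Counter = 1245,
  Date    = c("26/05/2006", "26/05/2006", "27/05/2006"),
  Hour    = c(600, 700, 600),
  counts  = c(9, 10, 4)
)

daily <- DF2
# The sample dates are day/month/year, so parse them with %d/%m/%Y
daily$Date <- as.Date(daily$Date, format = "%d/%m/%Y")

# One row per counter per day, with the hourly counts summed
agg <- aggregate(counts ~ Counter + Date, data = daily, FUN = sum)
agg
```

This form also works unchanged if the full data set contains several counters.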

Thank you so much for your help in advance,
Katie

满地尘埃落定 2024-11-25 15:51:49

Try this:

TDF <- subset(DF, Hour >= 600 & Hour <= 2200)
# get dates where at least one hour in the range has a nonzero count
dates <- subset(aggregate(counts ~ Date, TDF, sum), counts > 0)$Date
# get dates where no hour in the range has a zero count
dates2 <- subset(aggregate(counts ~ Date, TDF, prod), counts > 0)$Date

DF2 <- subset(DF, Date %in% dates)   # whole days with at least one nonzero count
DF3 <- subset(DF, Date %in% dates2)  # whole days with no zero counts in the range

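
To see how `dates` (sum > 0) and `dates2` (product > 0) differ, here is the same approach run on a small invented data frame; the column names follow the question's sample (`Hour` with a capital H):

```r
# Invented toy data: three days, two hours inside the 600-2200 window each
DF <- data.frame(
  Date   = rep(c("26/05/2006", "27/05/2006", "28/05/2006"), each = 3),
  Hour   = rep(c(0, 600, 700), times = 3),
  counts = c(1, 9, 10,   # 26/05: both in-window hours nonzero
             3, 0,  2,   # 27/05: one in-window hour is zero
             5, 0,  0)   # 28/05: no in-window counts at all
)

TDF <- subset(DF, Hour >= 600 & Hour <= 2200)
# Days whose in-window counts sum to more than 0 (at least one nonzero hour)
dates  <- subset(aggregate(counts ~ Date, TDF, sum),  counts > 0)$Date
# Days whose in-window counts multiply to more than 0 (no zero hour)
dates2 <- subset(aggregate(counts ~ Date, TDF, prod), counts > 0)$Date

DF2 <- subset(DF, Date %in% dates)   # all rows of 26/05 and 27/05
DF3 <- subset(DF, Date %in% dates2)  # all rows of 26/05 only
```

Both checks assume counts are never negative. If the data holds more than one counter, grouping by `Date` alone pools counters that share a date, so `Counter` would also need to be part of the grouping.
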
套路撩心 2024-11-25 15:51:49

plyr is your friend :)

install.packages("plyr")
library(plyr)

ddply(DF, .(Date), function(day) {
   # keep the day if any hour from 600 to 2200 has a nonzero count
   if (any(day$Hour >= 600 & day$Hour <= 2200 & day$counts > 0)) day
   else day[0, ]  # otherwise return an empty data frame with the same columns
})

ddply will group the entries in DF by Date; then, for every group, if there is an entry with an hour between 600 and 2200 and a count above 0, it returns that whole day, otherwise it returns an empty data frame. ddply then combines all groups into one resulting data frame.
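
The same split-apply-combine idea can be written in base R without plyr. A sketch, assuming the question's column names and grouping by `Counter` as well as `Date` in case the data holds several counters (the toy rows are invented):

```r
# Invented toy data: two counters on the same date
DF <- data.frame(
  Counter = rep(c(1245, 1300), each = 3),
  Date    = "26/05/2006",
  Hour    = rep(c(0, 700, 2300), times = 2),
  counts  = c(2, 6, 1,   # counter 1245: nonzero count at 700
              4, 0, 3)   # counter 1300: no counts between 600 and 2200
)

# Split into one data frame per counter and day
days <- split(DF, list(DF$Counter, DF$Date), drop = TRUE)

# Keep a day only if some hour in 600-2200 has a nonzero count;
# otherwise return a zero-row data frame with the same columns
kept <- lapply(days, function(day) {
  if (any(day$Hour >= 600 & day$Hour <= 2200 & day$counts > 0)) day else day[0, ]
})

# Recombine the pieces into one data frame and reset the row names
result <- do.call(rbind, kept)
rownames(result) <- NULL
result
```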
