How can I find the time a customer spent in the supermarket using the following data in R?
I have data of this type:
Date Status ID
23-1-2010 11:40 in 321
23-1-2010 11:53 out 321
9-1-2010 12:11 in 356
9-1-2010 12:18 out 356
23-1-2010 11:37 in 356
23-1-2010 11:5 out 356
5-2-2010 13:14 in 398
5-2-2010 13:30 out 398
10-3-2010 9:30 in 398
13-3-2010 11:50 out 377
16-3-2010 10:30 in 377
16-3-2010 11:00 out 377
20-3-2010 12:09 in 377
20-3-2010 12:30 out 377
The data describes customers who visited a supermarket on a certain date and at a certain time. The customers are identified by their ID, and their status (in or out) is also recorded.
I want to calculate the time each customer spent in the supermarket on different days. The problem with the data is that for some customers only the entrance time or only the exit time is recorded. I have already removed the customers who visited once and are missing either the in or the out status, but some customers who visited more than once are still missing an in/out record.
I have tried this:
#create an empty data frame
TimeSpent<-rep(NA,length(df$ID))
ID<-rep(NA,length(df$ID))
Tspent<-data.frame(TimeSpent,ID)
#compute the time spent time
for(i in 1:length(df$Date - 1))
{
  if(isTRUE(df$Status[i] == "in" && df$Status[i+1] == "out"))
  {
    Tspent$ID[i] <- df$ID[i]
    Tspent$TimeSpent[i] <- difftime(df$Date[i+1] - df$Date[i])
  } else if(isTRUE(df$Status[i+1] == "in" && df$Status[i+2] == "out"))
  {
    Tspent$ID[i] <- df$ID[i+1]
    Tspent$TimeSpent[i] <- difftime(df$Date[i+2] - df$Date[i+1])
  } else
  {
    Tspent$ID[i] <- df$ID[i+2]
    Tspent$TimeSpent[i] <- difftime(df$Date[i+3] - df$Date[i+2])
  }
  i<-i+1
}
and I get this error:
Error in as.POSIXct.default(time1) :
do not know how to convert 'time1' to class "POSIXct"
Does anyone know how to correct my code, or have an alternative solution? Thanks in advance!
Comments (1)
I don't know the structure of your data.frame (try str(df)), but I guess you did not convert the date to a POSIXct object. Doing that conversion will probably solve your problem. If not, please post some more data that I can read in (the blanks between the date and time gave me an error when I tried to read it in quickly).
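A minimal sketch of the POSIXct conversion mentioned above, assuming the Date column was read in as character strings of the form day-month-year hour:minute. The format string, the UTC time zone, and the small df built here are assumptions based on the sample rows in the question, not the asker's actual data:

```r
# Hypothetical reconstruction of a few rows from the question
df <- data.frame(
  Date   = c("23-1-2010 11:40", "23-1-2010 11:53", "9-1-2010 12:11"),
  Status = c("in", "out", "in"),
  ID     = c(321, 321, 356),
  stringsAsFactors = FALSE
)

# Parse the strings into POSIXct; the format and time zone are assumptions
df$Date <- as.POSIXct(df$Date, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Inspect the result: Date should now be reported as POSIXct
str(df)
```

Any row whose string does not match the format comes back as NA, so checking anyNA(df$Date) after the conversion is a quick way to spot malformed timestamps.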
Edit:
I thought I'd let you know: the problem lies in the difftime() function. You can easily circumvent it and do the calculation without it; with my sample data and a slightly altered version of your code, it works fine.
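As a rough sketch of what such a fix might look like (this is not the answerer's original code): the error message is consistent with difftime() being handed a single argument that is already a time difference, whereas difftime() expects two time points, time1 and time2. The example below uses hypothetical sample rows modeled on the question, and it treats a visit as an "in" row immediately followed by an "out" row with the same ID. That pairing rule, and the choice to skip unmatched records, are assumptions about how the missing entries should be handled.

```r
# Hypothetical sample rows modeled on the question's data
df <- data.frame(
  Date = c("23-1-2010 11:40", "23-1-2010 11:53",
           "5-2-2010 13:14",  "5-2-2010 13:30",
           "10-3-2010 9:30",    # "in" with no matching "out"
           "13-3-2010 11:50",   # "out" with no matching "in"
           "16-3-2010 10:30", "16-3-2010 11:00"),
  Status = c("in", "out", "in", "out", "in", "out", "in", "out"),
  ID     = c(321, 321, 398, 398, 398, 377, 377, 377),
  stringsAsFactors = FALSE
)
df$Date <- as.POSIXct(df$Date, format = "%d-%m-%Y %H:%M", tz = "UTC")

n    <- nrow(df)
curr <- seq_len(n - 1)  # every row except the last
nxt  <- curr + 1        # the row that follows each of them

# A visit is an "in" row immediately followed by an "out" row for the same ID
is_visit <- df$Status[curr] == "in" &
            df$Status[nxt]  == "out" &
            df$ID[curr]     == df$ID[nxt]

Tspent <- data.frame(
  ID        = df$ID[curr][is_visit],
  Entered   = df$Date[curr][is_visit],
  # difftime() takes the two time points as separate arguments
  TimeSpent = as.numeric(difftime(df$Date[nxt][is_visit],
                                  df$Date[curr][is_visit],
                                  units = "mins"))
)

print(Tspent)
```

Because the comparison is vectorized over adjacent rows, nothing is indexed past the last row, which the loop in the question does through i+1, i+2 and i+3. Subtracting the two POSIXct values directly would also work, but difftime() with an explicit units argument keeps the result in minutes regardless of how large the gap is.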