How can I find the time customers spent in a supermarket using the following data in R?

Posted on 2024-12-28 04:41:39

I have such type of data:

Date             Status  ID
23-1-2010 11:40  in      321
23-1-2010 11:53  out     321
9-1-2010 12:11   in      356
9-1-2010 12:18   out     356
23-1-2010 11:37  in      356
23-1-2010 11:5   out     356
5-2-2010 13:14   in      398
5-2-2010 13:30   out     398
10-3-2010 9:30   in      398
13-3-2010 11:50  out     377
16-3-2010 10:30  in      377
16-3-2010 11:00  out     377
20-3-2010 12:09  in      377
20-3-2010 12:30  out     377

The data describes customers who visited a supermarket on a certain date and time. The customers are identified by their ID, and their status (in or out) is also recorded.

I want to calculate the time each customer spent in the supermarket on different days. The problem with the data is that for some customers only the entrance time or only the exit time is recorded. I have already removed the customers who visited once and have a missing in or out status, but there are still some who visited more than once and have a missing in/out record.

I have tried this:

#create an empty data frame
TimeSpent<-rep(NA,length(df$ID))
ID<-rep(NA,length(df$ID))
Tspent<-data.frame(TimeSpent,ID)



#compute the time spent time
for(i in 1:length(df$Date - 1))
  {
      if(isTRUE(df$Status[i] == "in" && df$Status[i+1] == "out"))
      {
        Tspent$ID[i] <- df$ID[i]
        Tspent$TimeSpent[i] <- difftime(df$Date[i+1] - df$Date[i])
      } else if(isTRUE(df$Status[i+1] == "in" && df$Status[i+2] == "out"))
      {
        Tspent$ID[i] <- df$ID[i+1]
        Tspent$TimeSpent[i] <- difftime(df$Date[i+2] - df$Date[i+1])
      }  else 
        {
        Tspent$ID[i] <- df$ID[i+2]
        Tspent$TimeSpent[i] <- difftime(df$Date[i+3] - df$Date[i+2])
      }

      i<-i+1
}

and I get this error:
Error in as.POSIXct.default(time1) :
do not know how to convert 'time1' to class "POSIXct"

Does anyone know how to correct my code, or have an alternative solution? Thanks in advance!


凹づ凸ル 2025-01-04 04:41:39

I don't know the structure of your data.frame (try str(df)), but I guess you did not convert the date to a POSIXct object. That is done like this:

 as.POSIXct(strptime(df$Date, format='%d-%m-%Y %H:%M'))

This probably solves your problem. If not, please post some more data that I can read in (the blanks between the date and the time gave me an error when I tried to read it in quickly).
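As a minimal sketch of that conversion, assuming the Date column was read in as character strings in day-month-year hour:minute form. The two values are copied from the question; the variable names and `tz = "UTC"` are assumptions made here so the result does not depend on the local time zone.

```r
# Two timestamps copied from the question's data, stored as character strings
dates_chr <- c("23-1-2010 11:40", "23-1-2010 11:53")

# Parse as day-month-year hour:minute (tz = "UTC" is an assumption)
dates <- as.POSIXct(dates_chr, format = "%d-%m-%Y %H:%M", tz = "UTC")

class(dates)         # "POSIXct" "POSIXt"
dates[2] - dates[1]  # Time difference of 13 mins
```

Checking `str(df)` after the conversion should then show the column as POSIXct rather than chr or Factor.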

Edit:

I thought I'd let you know: the problem lies in the difftime() function. You can easily circumvent it and do the calculation without it; that works fine for my sample data.
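For what it's worth, difftime() takes the two time points as separate arguments, difftime(time1, time2), whereas the question's code passes a single, already-subtracted value. That is a plausible cause of the 'time1' error. A small sketch with two values from the question (the variable names and `tz = "UTC"` are assumptions):

```r
t_in  <- as.POSIXct("23-1-2010 11:40", format = "%d-%m-%Y %H:%M", tz = "UTC")
t_out <- as.POSIXct("23-1-2010 11:53", format = "%d-%m-%Y %H:%M", tz = "UTC")

# Two separate arguments, later time first; fixing the units keeps
# results comparable across rows
spent <- difftime(t_out, t_in, units = "mins")
as.numeric(spent)  # 13

# Passing one difference instead of two times fails; capture the message
res <- tryCatch(difftime(t_out - t_in),
                error = function(e) conditionMessage(e))
res
```

Setting `units` explicitly matters in a loop, because plain subtraction picks seconds, minutes, hours or days automatically for each pair.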

My sample data:

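    # Sys.time() plus a number of seconds already delivers a date-time (POSIXct) object
    df <- data.frame(Date = Sys.time() + runif(20) * 3600)
    df <- data.frame(Date = df[order(df$Date), 1])   # sort by time
    # note: each = 10 gives ten 'in' rows followed by ten 'out' rows
    df$Status <- rep(c('in', 'out'), each = 10)      # capital S, because the loop reads df$Status
    df$ID     <- rep(1:10, each = 2)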

Your slightly altered code:

 # create an empty data frame to hold the results
 TimeSpent <- rep(NA, length(df$ID))
 ID <- rep(NA, length(df$ID))
 Tspent <- data.frame(TimeSpent, ID)

 # compute the time spent
 # (1:length(df$Date - 1) in the original ran up to the last row;
 #  the "- 1" belongs outside length())
 for (i in seq_len(nrow(df) - 1)) {
   if (isTRUE(df$Status[i] == "in" && df$Status[i+1] == "out")) {
     Tspent$ID[i] <- df$ID[i]
     Tspent$TimeSpent[i] <- df$Date[i+1] - df$Date[i]
   } else if (isTRUE(df$Status[i+1] == "in" && df$Status[i+2] == "out")) {
     Tspent$ID[i] <- df$ID[i+1]
     # just skipped the difftime function: plain subtraction of POSIXct values
     Tspent$TimeSpent[i] <- df$Date[i+2] - df$Date[i+1]
   } else {
     Tspent$ID[i] <- df$ID[i+2]
     Tspent$TimeSpent[i] <- df$Date[i+3] - df$Date[i+2]
   }
   # no i <- i + 1 needed: it has no effect on the iteration of a for loop in R
 }

Output

    TimeSpent ID
 1   8.266451  2
 2   4.044099  2
 3  12.895463  3
 4   2.699761  3
 5   1.484544  4
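
One possible alternative to the loop, sketched here under assumptions rather than taken from the answer above: sort by ID and time, then keep only an "in" row that is directly followed by an "out" row for the same ID on the same day. Unmatched records are simply dropped. The rows are a subset copied from the question (IDs 398 and 377); the names `first`, `nxt`, `is_visit` and `visits`, and `tz = "UTC"`, are illustrative choices.

```r
# Subset of the question's data, including an "in" without "out" (398)
# and an "out" without "in" (377)
df <- data.frame(
  Date   = c("5-2-2010 13:14", "5-2-2010 13:30", "10-3-2010 9:30",
             "13-3-2010 11:50", "16-3-2010 10:30", "16-3-2010 11:00",
             "20-3-2010 12:09", "20-3-2010 12:30"),
  Status = c("in", "out", "in", "out", "in", "out", "in", "out"),
  ID     = c(398, 398, 398, 377, 377, 377, 377, 377),
  stringsAsFactors = FALSE
)
df$Date <- as.POSIXct(df$Date, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Sort by customer and time so a matching "out" directly follows its "in"
df <- df[order(df$ID, df$Date), ]

first <- seq_len(nrow(df) - 1)  # candidate "in" rows
nxt   <- first + 1              # the row right after each candidate
day   <- format(df$Date, "%Y-%m-%d")

# A visit: "in" followed by "out", same customer, same calendar day
is_visit <- df$Status[first] == "in" &
            df$Status[nxt]   == "out" &
            df$ID[first]     == df$ID[nxt] &
            day[first]       == day[nxt]

visits <- data.frame(
  ID        = df$ID[first][is_visit],
  In        = df$Date[first][is_visit],
  TimeSpent = as.numeric(difftime(df$Date[nxt][is_visit],
                                  df$Date[first][is_visit],
                                  units = "mins"))
)
visits  # three visits: 30 and 21 minutes for ID 377, 16 minutes for ID 398
```

The same-day condition is a design choice: it keeps an unmatched "in" on one day from being paired with an "out" on a later day for the same customer.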