如何获得累积和图上常规时间点插值的斜率?

发布于 2024-12-20 01:01:47 字数 4727 浏览 4 评论 0原文

在交叉验证中,我提出了一个问题,关于按日期分析数据,但不想通过分箱生成虚假的峰值和波谷按月数据。例如,如果一个人在每个月的最后一天支付账单,但有一次支付晚了几天,那么一个月的费用将为零,而下个月的费用将是平时的两倍。都是假垃圾。

我的问题的答案解释了对累积和使用线性样条平滑来克服打嗝的插值概念分箱。我对它很感兴趣,想在 R 中实现它,但在网上找不到任何例子。我不只是想打印绘图。我想获得每个时间点(也许每天)的瞬时斜率,但该斜率应该从输入几天(或者可能几周或几个月)之前到几天的点的样条线得出时间点之后。换句话说,在一天结束时我想要得到一些东西,例如一个数据框,其中一列是每天的钱或每周的患者,但这不受变幻莫测的影响,例如我是否延迟支付几天或该月是否碰巧有 5 个手术日(而不是通常的 4 天)。

这是一些简化的模拟和绘图,以显示我所面临的情况。

library(lubridate)
library(ggplot2)
library(reshape2)
dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3 #we are making one payment date that is 3 days late
dates#look how the payment date is the last day of every month except for
#2010-05 where it takes place on 2010-06-03 - naughty boy!
amounts <- rep(50,each=24)# pay $50 every month
register <- data.frame(dates,amounts)#this is the starting register or ledger
ggplot(data=register,aes(dates,amounts))+geom_point()#look carefully and you will see that 2010-05 has no dots in it and 2010-06 has two dots
register.by.month <- ddply(register,.(y=year(dates),month=month(dates)),summarise,month.tot=sum(amounts))#create a summary of totals by month but it lands up omiting a month in which nothing happened. Further badness is that it creates a new dataframe where one is not needed. Instead I created a new variable that allocates each date into a particular "zone" such as month or 
register$cutmonth <- as.Date(cut(register$dates, breaks = "month"))#until recently I did not know that the cut function can handle dates
table(register$cutmonth)#see how there are two payments in the month of 2010-06
#now lets look at what we paid each month. What is the total for each month
ggplot(register, aes(cutmonth, amounts))+ stat_summary(fun.y = sum, geom = "bar")#that is the truth but it is a useless truth

当一个人延迟付款几天时,看起来好像一个月的费用为零,而下个月的费用则翻倍. 这是虚假的

#so lets use cummulated expense over time
register$cumamount <- cumsum(register$amounts)
cum <- ggplot(data=register,aes(dates,cumamount))+geom_point()
cum+stat_smooth()

随着时间的推移累积数量可以消除改变项目箱的变化

#That was for everything the same every month, now lets introduce a situation where there is a trend that in the second year the amounts start to go up, 
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)#this is the monthly amount with a growth of amount in each month of the second year
register <- cbind(register,amounts.up)#add the variable to the data frarme
register$cumamount.up <- cumsum(register$amounts.up) #work out th cumulative sum for the new scenario
ggplot(data=register,aes(x=dates))+
   geom_point(aes(y=amounts, colour="amounts",shape="amounts"))+
   geom_point(aes(y=amounts.up, colour="amounts.up",shape="amounts.up"))# the plot of amount by date
#I am now going to plot the cumulative amount over time but now that I have two scenarios it is easier to deal with the data frame in long format (melted) rather than wide format (casted)
#before I can melt, the reshape2 package unforutnately can't handle date class so will have to turn them int o characters and then back again.
register[,c("dates","cutmonth")] <- lapply(register[,c("dates","cutmonth")],as.character)
register.long <- melt.data.frame(register,measure.vars=c("amounts","amounts.up"))
register.long[,c("dates","cutmonth")] <- lapply(register.long[,c("dates","cutmonth")],as.Date)
ggplot(register.long, aes(cutmonth,value))+ stat_summary(fun.y = sum, geom = "bar")+facet_grid(. ~ variable) #that is the truth but it is a useless truth, 
cum <- ggplot(data=register,aes(dates,cumamount))+geom_point()
#that is the truth but it is a useless truth. Furthermore it appears as if 2010-06 is similar to what is going on in 2011-12
#that is patently absurd. All that happened was that the 2010-05 payment was delayed by 3 days.

两种情况,但显示每个月支付的金额

#so lets use cummulated expense over time    
ggplot(data=register.long,aes(dates,c(cumamount,cumamount.up)))+geom_point() + scale_y_continuous(name='cumulative sum of amounts ($)')

这里我们看到两种情况的累积和数据

因此,对于简单绘图,变量 interpolate.daily 约为一年中每天每天 50/30.4 = 1.64 美元。对于第二个图,每月支付的金额在第二年每个月开始增加,将显示第一年每天的每日费率 1.64 美元,而对于第二年的日期,人们将看到每日费率逐渐从每天 1.64 美元增加到每天约 3.12 美元。

非常感谢您从头到尾阅读本文。你一定和我一样好奇!

Over at cross validated I asked a question about analyzing data by date but not wanting to generate spurious spikes and troughs by binning data by the month. For example if one pays a bill on the last day of each month but on one occasion one pays a few days late then the one month will reflect zero expense and the following month will reflect double the usual expense. All spurious junk.

One of the answers to my question explained the concept of interpolation using linear spline smoothing on the cumulative sum to overcome hiccoughs in binning. I am intrigued by it and want to implement it in R but cannot find any examples online. I do not just want to print plots. I want to get the instantaneous slope at each and every time point (maybe each day) but that slope should be derived from a spline that inputs points from a few days (or perhaps a few weeks or a few months) before to a few days after the time point. In other words, at the end of the day I want to get something such as a data frame in which one column is money per day or patients per week but that is not subject to vagaries such as whether I paid a few days late or whether there happened to be 5 operative days in the month (as opposed to the usual 4).

Here is some simplified simulation and plotting to show what I am up against.

library(lubridate)
library(ggplot2)
library(reshape2)
dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3 #we are making one payment date that is 3 days late
dates#look how the payment date is the last day of every month except for
#2010-05 where it takes place on 2010-06-03 - naughty boy!
amounts <- rep(50,each=24)# pay $50 every month
register <- data.frame(dates,amounts)#this is the starting register or ledger
ggplot(data=register,aes(dates,amounts))+geom_point()#look carefully and you will see that 2010-05 has no dots in it and 2010-06 has two dots
register.by.month <- ddply(register,.(y=year(dates),month=month(dates)),summarise,month.tot=sum(amounts))#create a summary of totals by month but it lands up omiting a month in which nothing happened. Further badness is that it creates a new dataframe where one is not needed. Instead I created a new variable that allocates each date into a particular "zone" such as month or 
register$cutmonth <- as.Date(cut(register$dates, breaks = "month"))#until recently I did not know that the cut function can handle dates
table(register$cutmonth)#see how there are two payments in the month of 2010-06
#now lets look at what we paid each month. What is the total for each month
ggplot(register, aes(cutmonth, amounts))+ stat_summary(fun.y = sum, geom = "bar")#that is the truth but it is a useless truth

When one is late with a payment by a couple of days it appears as if the expense was zero in one month and double in the next. That is spurious

#so lets use cummulated expense over time
register$cumamount <- cumsum(register$amounts)
cum <- ggplot(data=register,aes(dates,cumamount))+geom_point()
cum+stat_smooth()

cumulative amount over time smooths out variability that changes an item's bin

#That was for everything the same every month, now lets introduce a situation where there is a trend that in the second year the amounts start to go up, 
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)#this is the monthly amount with a growth of amount in each month of the second year
register <- cbind(register,amounts.up)#add the variable to the data frarme
register$cumamount.up <- cumsum(register$amounts.up) #work out th cumulative sum for the new scenario
ggplot(data=register,aes(x=dates))+
   geom_point(aes(y=amounts, colour="amounts",shape="amounts"))+
   geom_point(aes(y=amounts.up, colour="amounts.up",shape="amounts.up"))# the plot of amount by date
#I am now going to plot the cumulative amount over time but now that I have two scenarios it is easier to deal with the data frame in long format (melted) rather than wide format (casted)
#before I can melt, the reshape2 package unforutnately can't handle date class so will have to turn them int o characters and then back again.
register[,c("dates","cutmonth")] <- lapply(register[,c("dates","cutmonth")],as.character)
register.long <- melt.data.frame(register,measure.vars=c("amounts","amounts.up"))
register.long[,c("dates","cutmonth")] <- lapply(register.long[,c("dates","cutmonth")],as.Date)
ggplot(register.long, aes(cutmonth,value))+ stat_summary(fun.y = sum, geom = "bar")+facet_grid(. ~ variable) #that is the truth but it is a useless truth, 
cum <- ggplot(data=register,aes(dates,cumamount))+geom_point()
#that is the truth but it is a useless truth. Furthermore it appears as if 2010-06 is similar to what is going on in 2011-12
#that is patently absurd. All that happened was that the 2010-05 payment was delayed by 3 days.

two scenarios but showing the amount of money paid in each month

#so lets use cummulated expense over time    
ggplot(data=register.long,aes(dates,c(cumamount,cumamount.up)))+geom_point() + scale_y_continuous(name='cumulative sum of amounts ($)')

Here we see the cumulative sum data for the two scenarios

So for the simple plot the variable interpolate.daily would be about $50/30.4 = $1.64 per day for every day of the year. For the second plot where the amount being paid every month starts to go up every month in the second year would be showing a daily rate of $1.64 per day for every day in the first year and for dates in the second year one would see daily rates gradually increasing from $1.64 per day to about $3.12 per day.

Thank you so much for reading this all the way to the end. You must have been as intrigued as I was!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

怪异←思 2024-12-27 01:01:47

这是一种基本方法。当然,还有更复杂的选项和需要调整的参数,但这应该是一个很好的起点。

dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3
amounts <- rep(50,each=24)
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)

df = data.frame(dates=dates, cumamount.up=cumsum(amounts.up))

df.spline = splinefun(df$dates, df$cumamount.up)

newdates = seq(min(df$dates), max(df$dates), by=1)
money.per.day = df.spline(newdates, deriv=1)

如果您绘制它,您可以看到样条线的有趣行为:

plot(newdates, money.per.day, type='l')

在此处输入图像描述

Here is one basic way to do it. Of course there are more complex options, and parameters to tweak, but this should be a good starting point.

dates <- seq(as.Date("2010-02-01"), length=24, by="1 month") - 1
dates[5] <- dates[5]+3
amounts <- rep(50,each=24)
increase <- c(rep(1,each=12),seq(from=1.01,to=1.9,length.out=12))
amounts.up <- round(amounts*increase,digits=2)

df = data.frame(dates=dates, cumamount.up=cumsum(amounts.up))

df.spline = splinefun(df$dates, df$cumamount.up)

newdates = seq(min(df$dates), max(df$dates), by=1)
money.per.day = df.spline(newdates, deriv=1)

If you plot it, you can see the interesting behavior of the splines:

plot(newdates, money.per.day, type='l')

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文