Interpolating with na.approx: how does it do that?
I am doing some light un-suppression of employment data, and I stumbled on the na.approx function in the zoo package. The data represent each level's share of total government employment, and I figured a rough estimate would be to look at the trend in the split between state and local government. The two shares should add to one.
Year  State %      Local %
2001  na           na
2002  na           na
2003  na           na
2004  0.118147539  0.881852461
2005  0.114500321  0.885499679
2006  0.117247083  0.882752917
2007  0.116841331  0.883158669
I use the spline variant, na.spline, which allows the leading na's to be estimated:
library(zoo)
# DF2 holds the two share columns; the seven years are indexed 1:7
z <- zoo(DF2, 1:7)
d <- na.spline(z, na.rm = FALSE, maxgap = Inf)
This gives the output:
State %      Local %
0.262918013  0.737081987
0.182809891  0.817190109
0.137735231  0.862264769
0.118147539  0.881852461
0.114500321  0.885499679
0.117247083  0.882752917
0.116841331  0.883158669
Great, right? What amazes me is that the estimated values in each row sum to 1 (which is what I want, but unexpected!), even though the documentation for na.approx says that it handles each column separately. Am I missing something? My money's on my misreading the documentation.
Comments (1)
I believe it's just a chance property of linear least squares. Because the two series are constrained to sum to one, the slopes from the two regressions sum to zero and the intercepts sum to one. Hence the fitted values from the two regressions sum to one at any point in time.
EDIT: A bit more explanation.
y1 = a + beta * t + epsilon
y2 = 1-y1 = (1-a) + (- beta) * t - epsilon
Therefore, running OLS on each series separately will give intercepts that sum to one and slopes that sum to zero.
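The argument above can be checked numerically. The following is a minimal sketch in base R, using the 2004-2007 values from the question. It fits each column on its own, once with lm (the OLS case described above) and once with stats::spline. It assumes that na.spline hands each column to spline() with its default method "fmm", which is how I read the zoo documentation; the variable names are only illustrative.

```r
# Known shares for 2004-2007, copied from the question.
year  <- 2004:2007
state <- c(0.118147539, 0.114500321, 0.117247083, 0.116841331)
local <- c(0.881852461, 0.885499679, 0.882752917, 0.883158669)
all_years <- 2001:2007

# 1) OLS: one regression per column, with no link between the two fits.
fit_state <- lm(state ~ year)
fit_local <- lm(local ~ year)
new_data  <- data.frame(year = all_years)
ols_sum   <- predict(fit_state, new_data) + predict(fit_local, new_data)

# 2) Cubic spline, again one column at a time.
#    Assumption: this mirrors what na.spline() does per column (method "fmm").
sp_state   <- spline(year, state, xout = all_years, method = "fmm")$y
sp_local   <- spline(year, local, xout = all_years, method = "fmm")$y
spline_sum <- sp_state + sp_local

# Row sums for both methods, including the extrapolated years 2001-2003.
print(cbind(year = all_years, sp_state, sp_local, ols_sum, spline_sum))
```

Both sum columns should come out as 1 up to floating-point error. The likely reason is the same in each case: the fitted or interpolated value is a linear function of the observed y values, and a constant series is reproduced exactly, so processing y1 and 1 - y1 separately gives results that still add to one. That would also explain why the spline output in the question sums to one even though no regression is involved.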