Predict values from sinusoidal noise
Background
Using R to predict the next values in a series.
Problem
The following code generates and plots a model for a curve with some uniform noise:
library( mgcv )  # assumption: gam() and s() come from the mgcv package
slope = 0.55
offset = -0.5
amplitude = 0.22
frequency = 3
noise = 0.75
x <- seq( 0, 200 )
y <- offset + (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
yn <- y + (noise * runif( length( x ) ))
gam.object <- gam( yn ~ s( x ) + 0 )
plot( gam.object, col = rgb( 1.0, 0.392, 0.0 ) )
points( x, yn, col = rgb( 0.121, 0.247, 0.506 ) )
The model reveals the trend, as expected. The trouble is predicting subsequent values:
p <- predict( gam.object, data.frame( x=201:210 ) )
The predictions do not look correct when plotted:
df <- data.frame( fit=c( fitted( gam.object ), p ) )
plot( seq_along( df$fit ), df$fit, col="blue" )  # 201 fitted values followed by 10 predictions
points( yn, col="orange" )
The predicted values (from 201 onwards) appear to be too low.
Questions
- Are the predicted values, as shown, actually the most accurate predictions?
- If not, how can the accuracy be improved?
- What is a better way to concatenate the two data sets (fitted.values( gam.object ) and p)?
Comments (1)
All the errors added to y are greater than 0. (runif creates numbers on [0,1], not [-1,1].) For example:
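A minimal sketch of this point, assuming gam comes from the mgcv package and using a hypothetical seed for reproducibility. It rebuilds the data from the question, shows that the runif noise only ever shifts the values upward, and fits the model both without and with an intercept so the two sets of predictions can be compared. The variable names (yn_pos, yn_sym, fit_no_int, fit_int) are illustrative and not from the original post.

```r
library(mgcv)  # assumption: gam() and s() come from mgcv

set.seed(42)   # hypothetical seed, only for reproducibility

slope     <- 0.55
offset    <- -0.5
amplitude <- 0.22
frequency <- 3
noise     <- 0.75

x <- seq(0, 200)
y <- offset + (slope * x / 100) + (amplitude * sin(frequency * x / 100))

# runif() draws from [0, 1] by default, so this noise never lowers a value
yn_pos <- y + noise * runif(length(x))

# Symmetric alternative: draw from [-1, 1] so the noise has mean zero
yn_sym <- y + noise * runif(length(x), min = -1, max = 1)

# Model from the question (no intercept) and the same model with an intercept
fit_no_int <- gam(yn_pos ~ s(x) + 0)
fit_int    <- gam(yn_pos ~ s(x))

# Predictions for the ten values after the observed range
new_x    <- data.frame(x = 201:210)
p_no_int <- predict(fit_no_int, new_x)
p_int    <- predict(fit_int, new_x)

# Average residual of each model on the observed data
mean(yn_pos - fitted(fit_no_int))
mean(yn_pos - fitted(fit_int))
```

If the average residual of the no-intercept model comes out clearly positive while that of the intercept model is near zero, that would be consistent with the underestimation described in the question.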
The reason for the systematic underestimation in the model without intercept could be that gam uses a sum-to-zero constraint on the estimated smooth functions. I think point 2 answers your first and second questions.
Your third question needs clarification because a gam object is not a data.frame. The two data types do not mix. A more complete example:
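One possible way to keep fitted and predicted values together is sketched below, again assuming mgcv and a hypothetical seed. Rather than concatenating fitted( gam.object ) and p by hand, it calls predict once over the observed and the future x values and stores the result in a single data frame keyed on x. The names new_data, observed, and se are illustrative and not from the original post.

```r
library(mgcv)  # assumption: gam() and s() come from mgcv

set.seed(1)    # hypothetical seed, only for reproducibility

x  <- seq(0, 200)
y  <- -0.5 + (0.55 * x / 100) + (0.22 * sin(3 * x / 100))
yn <- y + 0.75 * runif(length(x))

# Intercept kept, since the underestimation is attributed to the
# model without intercept
fit <- gam(yn ~ s(x))

# Predict over observed and future x in one call, so fitted and
# forecast values live in one data frame with a shared x column
new_data <- data.frame(x = 0:210)
pred     <- predict(fit, new_data, se.fit = TRUE)

new_data$fit      <- as.numeric(pred$fit)
new_data$se       <- as.numeric(pred$se.fit)
new_data$observed <- new_data$x <= max(x)  # FALSE marks extrapolated rows

# Noisy data as points, model as a line, dashed rule where forecasting starts
plot(x, yn, col = "orange",
     xlim = range(new_data$x), ylim = range(c(yn, new_data$fit)))
lines(new_data$x, new_data$fit, col = "blue")
abline(v = max(x), lty = 2)
```

Plotting against the actual x column also avoids the index shift that comes from plotting against 1:211 when x starts at 0. The se column gives a rough sense of how quickly uncertainty grows once the model extrapolates past x = 200.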