Predicting values from sinusoidal noise

Posted on 2024-10-09 06:04:59

Background

Using R to predict the next values in a series.

Problem

The following code generates and plots a model for a curve with some uniform noise:

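library( mgcv )  # assumed: the post does not say which package provides gam()

# Parameters of the underlying curve and the noise level
slope <- 0.55
offset <- -0.5
amplitude <- 0.22
frequency <- 3
noise <- 0.75
x <- seq( 0, 200 )
# Linear trend plus a sine wave, then uniform noise drawn from [0, noise]
y <- offset + (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
yn <- y + (noise * runif( length( x ) ))

# Smooth of x with the intercept suppressed by "+ 0"
gam.object <- gam( yn ~ s( x ) + 0 )
plot( gam.object, col = rgb( 1.0, 0.392, 0.0 ) )
points( x, yn, col = rgb( 0.121, 0.247, 0.506 ) )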

The model reveals the trend, as expected. The trouble is predicting subsequent values:

p <- predict( gam.object, data.frame( x=201:210 ) )

The predictions do not look correct when plotted:

df <- data.frame( fit = c( fitted( gam.object ), p ) )
plot( 1:211, df$fit, col = "blue" )
points( yn, col = "orange" )

The predicted values (from 201 onwards) appear to be too low.

Questions

  1. Are the predicted values, as shown, actually the most accurate predictions?
  2. If not, how can the accuracy be improved?
  3. What is a better way to concatenate the two data sets (fitted.values( gam.object ) and p)?


Comments (1)

世态炎凉 2024-10-16 06:04:59
  1. The simulated data are unusual: every error added to the "true" y is greater than 0, because runif draws numbers from [0,1], not [-1,1].
  2. The problem disappears when the model is allowed an intercept term.
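Point 1 can be checked with a few lines of base R. This is only an illustrative sketch: the seed, the sample size `n`, and the names `shifted` and `centered` are assumptions made here, while `noise` is the value from the question.

```r
set.seed(42)      # arbitrary seed, only for reproducibility
n <- 10000        # illustrative sample size
noise <- 0.75     # noise scale from the question

# As in the question: runif(n) draws from [0, 1], so every error is >= 0
shifted <- noise * runif(n)

# A symmetric alternative: draws from [-1, 1], so errors average out to 0
centered <- noise * runif(n, min = -1, max = 1)

mean(shifted)     # close to noise / 2, so the noisy data sit above the true curve
mean(centered)    # close to 0
```

Note that the symmetric version also doubles the spread of the noise; `noise * (runif(n) - 0.5)` would keep the original spread while centering it.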

For example:

gam.object2 <- gam( yn ~ s( x ))
p2 <- predict( gam.object2, data.frame( x=201:210 ))
points( 1:211, c( fitted( gam.object2 ), p2), col="green")

The systematic underestimation in the model without an intercept is probably due to gam applying a sum-to-zero constraint to the estimated smooth functions: with the intercept removed, nothing in the model can carry the overall level of the data. I think point 2 answers your first and second questions.
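One way to probe this explanation is to compare the average fitted value of each model with the average of the data. The sketch below assumes `gam()` comes from the mgcv package (the post does not say) and uses an arbitrary seed; `fit_no_int` and `fit_int` are names introduced here for illustration.

```r
library(mgcv)   # assumption: gam() is mgcv's gam()

set.seed(1)     # arbitrary seed, only for reproducibility
x  <- 0:200
yn <- -0.5 + 0.55 * x / 100 + 0.22 * sin(3 * x / 100) + 0.75 * runif(length(x))

fit_no_int <- gam(yn ~ s(x) + 0)   # intercept suppressed, as in the question
fit_int    <- gam(yn ~ s(x))       # intercept allowed, as in the answer

# Average level of the data versus the average level of each fit
c(data         = mean(yn),
  no_intercept = mean(fitted(fit_no_int)),
  intercept    = mean(fitted(fit_int)))
```

With an intercept, the average fitted value should match the data mean. If the sum-to-zero explanation holds, the no-intercept fit will instead average near zero, which would show up as a vertical offset between that model and the data.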

Your third question needs clarification, because a gam object is not a data.frame, and the two data types do not mix.
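If the aim of the third question is simply to keep fitted and predicted values together, one possible reading is to collect them in a single data.frame with a column recording where each value came from. This is a sketch under that assumption, again assuming mgcv; the names `new_x`, `combined`, and `source` are hypothetical.

```r
library(mgcv)   # assumption: gam() is mgcv's gam()

set.seed(1)     # arbitrary seed, only for reproducibility
x  <- 0:200
yn <- -0.5 + 0.55 * x / 100 + 0.22 * sin(3 * x / 100) + 0.75 * runif(length(x))
fit <- gam(yn ~ s(x))

new_x <- 201:210

# One row per x value; as.numeric() drops names and array attributes
combined <- data.frame(
  x      = c(x, new_x),
  fit    = c(as.numeric(fitted(fit)),
             as.numeric(predict(fit, data.frame(x = new_x)))),
  source = rep(c("fitted", "predicted"), c(length(x), length(new_x)))
)

# Plot against the real x values so fitted and predicted points line up
plot(combined$x, combined$fit,
     col = ifelse(combined$source == "fitted", "blue", "red"))
points(x, yn, col = "orange")
```

Keeping `x` in the data.frame avoids relying on row positions, which is where an off-by-one between `0:200` and `1:211` can creep in.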

A more complete example:

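library( mgcv )  # assumed: the post does not say which package provides gam()

slope <- 0.55
amplitude <- 0.22
frequency <- 3
noise <- 0.75
x <- 1:200
y <- (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
ynoise <- y + (noise * runif( length( x ) ))

# Fit with an intercept, then predict over the observed range plus 10 new points
gam.object <- gam( ynoise ~ s( x ) )
p <- predict( gam.object, data.frame( x = 1:210 ) )

# Predictions (green), noisy data (blue), fitted values (orange)
plot( p, col = rgb( 0, 0.75, 0.2 ) )
points( x, ynoise, col = rgb( 0.121, 0.247, 0.506 ) )
points( fitted( gam.object ), col = rgb( 1.0, 0.392, 0.0 ) )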
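To get a rough sense of how far the extrapolated values can be trusted, the same prediction can be requested with standard errors. This is a sketch that assumes mgcv's `predict()` method and its `se.fit` argument; the seed and the names `fit`, `new_x`, `pr`, `upper`, and `lower` are introduced here for illustration, and the band of two standard errors is only an approximate guide to the uncertainty of the fitted curve, not of individual observations.

```r
library(mgcv)   # assumption: gam() is mgcv's gam()

set.seed(1)     # arbitrary seed, only for reproducibility
x <- 1:200
ynoise <- 0.55 * x / 100 + 0.22 * sin(3 * x / 100) + 0.75 * runif(length(x))
fit <- gam(ynoise ~ s(x))

new_x <- 1:210
pr <- predict(fit, data.frame(x = new_x), se.fit = TRUE)  # list with fit and se.fit

upper <- pr$fit + 2 * pr$se.fit
lower <- pr$fit - 2 * pr$se.fit

# Fitted curve with an approximate band, extended 10 steps past the data
plot(new_x, pr$fit, type = "l", ylim = range(ynoise, upper, lower))
lines(new_x, upper, lty = 2)
lines(new_x, lower, lty = 2)
points(x, ynoise, col = rgb(0.121, 0.247, 0.506))
```

Comparing the width of the band at `x = 201:210` with its width inside the data range gives a quick indication of how much the smooth is being stretched beyond what was observed.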