How to access and reuse the smooth from the `mgcv` package in R?
I am checking out the `mgcv` package in R and I would like to know how to update a model based on new data. For example, suppose I have the following data and I am interested in fitting a cubic regression spline.
# Load library.
library(mgcv)
# Set seed.
set.seed(2022)
# Data
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
I can fit a model using the `mgcv::s()` function to transform my predictor variable, where `bs = "cr"` stands for cubic regression spline, as indicated in the documentation (i.e., `?mgcv::s`).
# Fit.
model <- mgcv::gam(y ~ s(x, bs = "cr"))
# Print model.
model
# Family: gaussian
# Link function: identity
#
# Formula:
# y ~ s(x, bs = "cr")
#
# Estimated degrees of freedom:
# 7.51 total = 8.51
#
# GCV score: 0.001123237
I assume `mgcv::s()` is used to determine the inner knots for the spline basis functions? If I want to interpolate over the entire range of `x`, it seems that I can use the `predict` function.
# Prepare range of `x` for interpolation.
x_new <- 10:100
# Interpolate.
mgcv_interpolation <- predict(model, type = "link", newdata = data.frame(x = x_new))
# Plot.
plot(x, y, pch = 19)
lines(x_new, mgcv_interpolation, lwd = 2, col = "red")
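Strictly speaking, `s()` only records the term specification; the basis, knots, and penalty are built when `gam()` constructs the smooth (internally via `smoothCon()`), and the result is stored in `model$smooth`. A minimal sketch, assuming the documented contents of a `"cr.smooth"` object (see `?smooth.construct.cr.smooth.spec`, where `xp` holds the knot locations), that inspects the constructed smooth and checks that the `"lpmatrix"` evaluated at `x_new`, multiplied by the coefficients, reproduces `predict()`:

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
model <- gam(y ~ s(x, bs = "cr"))
x_new <- 10:100

# The smooth constructed for s(x, bs = "cr") is stored in model$smooth.
sm <- model$smooth[[1]]
print(class(sm))
print(sm$bs.dim)  # basis dimension k (default 10 here)
print(sm$xp)      # knot locations placed from the observed x values

# Evaluate the same basis (same knots, same constraint) at the new x values
# and form the prediction by hand.
Xp <- predict(model, newdata = data.frame(x = x_new), type = "lpmatrix")
by_hand <- as.numeric(Xp %*% coef(model))
via_predict <- as.numeric(predict(model, newdata = data.frame(x = x_new)))
print(all.equal(by_hand, via_predict))
```

In other words, `predict()` is just the basis matrix at the new covariate values times the fitted coefficients.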
What is not clear to me is how to update the model when new data (i.e., new values of `y`) come in. For instance, suppose my new data look something like this.
# Set seed.
set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))
My understanding is that I could simply reuse the previously created basis matrix, but I am not sure how to do that with `mgcv`. For example, this is how I can do it manually using a B-spline basis.
# ...
# Suppose that based on some cross-validation procedure `df = 6` is selected.
df <- 6
# Create B-Spline basis functions.
basis <- splines::bs(x, df = df, degree = 3, intercept = TRUE)
# Estimate spline coefficients.
coefficients <- lm.fit(basis, y)$coefficients
# Compute fitted values.
fitted <- basis %*% coefficients
# Evaluate the same basis at `x_new`, reusing the knots stored in `basis`
# (calling splines::bs() on `x_new` would recompute knots from `x_new`).
basis_x_new <- predict(basis, x_new)
# Interpolate.
bs_interpolation <- basis_x_new %*% coefficients
# Add to previous plot.
lines(x_new, bs_interpolation, lwd = 2, col = "blue")
# Update model based on `y_new`.
coefficients_y_new <- lm.fit(basis, y_new)$coefficients
# Add points and lines to the previous plot.
points(x, y_new, pch = 19, col = "orange")
lines(x_new, basis_x_new %*% coefficients_y_new, lwd = 2, col = "orange")
I guess my question is: how do I find whatever `mgcv::s()` creates and reuse it in subsequent calls of `mgcv::gam()`? Or is there a more `mgcv`-idiomatic way of doing this?
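A sketch of an `mgcv` analogue of the manual B-spline code, assuming `k = 6` as a stand-in for `df = 6` (the cubic regression spline places its knots differently from `splines::bs()`, so the curves will not coincide). `smoothCon()` builds the basis outside `gam()`, and `PredictMat()` evaluates it at new covariate values using the stored knots and identifiability constraint. Like the `lm.fit()` code, this fit is unpenalized, so it is not what `gam()` itself estimates.

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
x_new <- 10:100
set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

# Construct a cubic regression spline basis with k = 6 outside gam().
# absorb.cons = TRUE applies the sum-to-zero identifiability constraint, as gam() does.
sm <- smoothCon(s(x, bs = "cr", k = 6), data = data.frame(x = x),
                absorb.cons = TRUE)[[1]]

# Model matrix at the observed x: an intercept column plus the constrained basis.
X <- cbind(1, sm$X)
# The same basis evaluated at x_new; PredictMat() reuses the stored knots and constraint.
X_new <- cbind(1, PredictMat(sm, data.frame(x = x_new)))

# Unpenalized least-squares coefficients for the old and the new responses,
# both computed with the same basis matrix.
coef_y <- lm.fit(X, y)$coefficients
coef_y_new <- lm.fit(X, y_new)$coefficients

# Plot the data and both interpolated curves.
plot(x, y, pch = 19)
lines(x_new, X_new %*% coef_y, lwd = 2, col = "red")
points(x, y_new, pch = 19, col = "orange")
lines(x_new, X_new %*% coef_y_new, lwd = 2, col = "orange")
```

The smooth object `sm` plays the role of `basis` in the B-spline code: build it once from `x`, then reuse it for any response.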
Edit 1.
Poking around more, I discovered that I can extract the basis matrix using `mgcv::predict.gam()` with the argument `type = "lpmatrix"`. However, I am still not able to replicate the exact coefficients provided by `mgcv::gam()`. The differences are not large, but I wonder where they are coming from. For example:
# Extract the basis matrix from the `gam` object.
basis_gam <- mgcv::predict.gam(model, type = "lpmatrix")
# Fit the model using the basis matrix.
model_basis_gam <- mgcv::gam(y ~ basis_gam - 1)
# Compare the coefficients.
round(data.frame(
difference = coef(model) - coef(model_basis_gam)
), 4)
# difference
# (Intercept) 0.0000
# s(x).1 0.0004
# s(x).2 -0.0033
# s(x).3 0.0071
# s(x).4 -0.0020
# s(x).5 -0.0016
# s(x).6 -0.0054
# s(x).7 0.0103
# s(x).8 -0.0064
# s(x).9 0.0017
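One likely source of the discrepancy: in `gam(y ~ basis_gam - 1)` the basis columns enter as ordinary parametric terms, so nothing is penalized. With 10 columns for 10 observations, that fit interpolates the data. In `gam(y ~ s(x, bs = "cr"))` the same columns carry a wiggliness penalty whose weight is chosen by GCV (hence the total EDF of 8.51 rather than 10). A sketch, assuming the penalty matrix stored in `model$smooth[[1]]$S[[1]]` can be reattached through `gam()`'s `paraPen` argument (documented in `?gam.models`):

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
model <- gam(y ~ s(x, bs = "cr"))

# Smooth columns of the model matrix (drop the intercept column).
Xs <- predict(model, type = "lpmatrix")[, -1]
# Penalty matrix of the smooth, in the same constrained parameterization as Xs.
S <- model$smooth[[1]]$S[[1]]

# Unpenalized refit: the columns of Xs are ordinary parametric covariates.
model_unpen <- gam(y ~ Xs)
# Penalized refit: attach S to the Xs term and let GCV choose its weight.
model_pen <- gam(y ~ Xs, paraPen = list(Xs = list(S)))

# Largest absolute coefficient difference from the original fit.
max(abs(coef(model) - coef(model_unpen)))
max(abs(coef(model) - coef(model_pen)))
```

Because the penalized refit re-estimates the smoothing parameter, its coefficients are expected to agree closely with, but not necessarily be bit-identical to, `coef(model)`.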
Edit 2.
It seems that there is a function `mgcv::bam.update()` to update a GAM for new data, but only for models fit via `mgcv::bam()`, not `mgcv::gam()`. Despite this, the S3 generic `update()` appears to work with the `mgcv::gam()` object, perhaps because `class(model)` includes `"gam" "glm" "lm"`; however, there is no mention of this in the documentation. For example:
# Update the model for `y_new`.
model_y_new_via_update <- update(model, data = data.frame(y = y_new))
# Extract the basis matrices for `model` and `model_y_new_via_update`.
basis_model <- mgcv::predict.gam(model, type = "lpmatrix")
basis_model_y_new_via_update <- mgcv::predict.gam(model_y_new_via_update, type = "lpmatrix")
# Check that both models used the same basis matrix.
all(basis_model == basis_model_y_new_via_update)
# TRUE
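A note on why `update()` works here: as far as I can tell, `mgcv` does not define an `update()` method for `gam` objects, so the call most likely falls through to `update.default()` (see `?update`). That method extracts the stored call, replaces the `data` argument, and re-evaluates the call, so it works for any fit that stores its call. Because `x` is unchanged, the knots and basis are rebuilt identically, but the smoothing parameter is re-selected for `y_new`. A sketch comparing `update()` with an explicit refit, plus an alternative that keeps the original smoothing parameter via `gam()`'s `sp` argument:

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
model <- gam(y ~ s(x, bs = "cr"))
set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

# update() re-evaluates the stored call with the modified data argument.
model_y_new_via_update <- update(model, data = data.frame(y = y_new))
# An explicit refit with the same model structure.
model_y_new_refit <- gam(y_new ~ s(x, bs = "cr"))
all.equal(coef(model_y_new_via_update), coef(model_y_new_refit))

# Both refits choose a new smoothing parameter by GCV.
# To keep the one selected for the original data instead, supply it via sp.
model_y_new_fixed_sp <- gam(y_new ~ s(x, bs = "cr"), sp = model$sp)
c(original = model$sp, refit = model_y_new_refit$sp)
```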
Also, there are still some differences in the coefficients I cannot explain.
# Fit the model using the extracted basis matrix.
model_y_new_via_basis <- mgcv::gam(y_new ~ basis_model - 1)
# Eyeballing the coefficients.
round(data.frame(
via_update = coef(model_y_new_via_update),
via_basis = coef(model_y_new_via_basis),
difference = coef(model_y_new_via_update) - coef(model_y_new_via_basis),
row.names = names(coef(model))
), 4)
# via_update via_basis difference
# (Intercept) 0.4420 0.4420 0.0000
# s(x).1 -0.2385 -0.2333 -0.0052
# s(x).2 -0.1901 -0.1689 -0.0212
# s(x).3 -0.0854 -0.1689 0.0835
# s(x).4 0.1315 0.1902 -0.0586
# s(x).5 0.2666 0.2821 -0.0155
# s(x).6 0.2907 0.2821 0.0085
# s(x).7 0.2855 0.2821 0.0033
# s(x).8 0.3119 0.2936 0.0183
# s(x).9 0.3917 0.4036 -0.0120
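The remaining differences plausibly have the same origin as in Edit 1: `gam(y_new ~ basis_model - 1)` is an unpenalized least-squares fit. Since `basis_model` is a square, full-rank 10-by-10 matrix in this example, that fit simply solves the linear system and reproduces `y_new` exactly, whereas `update()` keeps the penalty and re-selects its weight by GCV. A minimal check, assuming the same data as above:

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
model <- gam(y ~ s(x, bs = "cr"))
set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

basis_model <- predict(model, type = "lpmatrix")
model_y_new_via_basis <- gam(y_new ~ basis_model - 1)

# With 10 observations and 10 unpenalized columns, least squares reduces to
# solving the square linear system basis_model %*% b = y_new.
coef_ols <- solve(basis_model, y_new)
all.equal(unname(coef(model_y_new_via_basis)), unname(coef_ols))
# The unpenalized fit passes through every data point.
max(abs(fitted(model_y_new_via_basis) - y_new))
```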
Answer.
Here is a brief example, built around a `smoothCon` object. It covers: creating your `smoothCon` object from `x`; computing the beta coefficients from `y` and your `smoothCon` object; a step that takes `x`, `y`, and the `smoothCon` object; and a step that takes the new x values (`x_new`), the old (`y`) and new (`y_new`) y values, and the `smoothCon` object.
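The steps described in the answer can be sketched as follows. The helper names (`make_smooth()`, `fit_beta()`, `fitted_smooth()`, `interpolate_smooth()`) and the fixed penalty weight `lambda` are hypothetical, not taken from the answer. The sketch assumes a penalized least-squares fit that minimizes ||y - Xb||^2 + lambda * b'Sb with the intercept unpenalized. Reusing the `sp` selected by `gam()` as `lambda` is also an assumption: it should only reproduce `gam()`'s fit if `smoothCon()` is called with the same settings `gam()` uses internally.

```r
library(mgcv)

# Hypothetical helper: build a cubic regression spline smooth from x.
make_smooth <- function(x, k = 10) {
  smoothCon(s(x, bs = "cr", k = k), data = data.frame(x = x),
            absorb.cons = TRUE)[[1]]
}

# Hypothetical helper: penalized least-squares coefficients for a fixed lambda.
# Minimizes ||y - X b||^2 + lambda * b' S b, leaving the intercept unpenalized.
fit_beta <- function(y, sm, lambda) {
  X <- cbind(1, sm$X)
  S <- matrix(0, ncol(X), ncol(X))
  S[-1, -1] <- sm$S[[1]]
  drop(solve(crossprod(X) + lambda * S, crossprod(X, y)))
}

# Hypothetical helper: fitted values at x, reusing the stored basis definition.
fitted_smooth <- function(x, y, sm, lambda) {
  X <- cbind(1, PredictMat(sm, data.frame(x = x)))
  drop(X %*% fit_beta(y, sm, lambda))
}

# Hypothetical helper: interpolate old and new responses at x_new with one basis.
interpolate_smooth <- function(x_new, y, y_new, sm, lambda) {
  X_new <- cbind(1, PredictMat(sm, data.frame(x = x_new)))
  cbind(y = drop(X_new %*% fit_beta(y, sm, lambda)),
        y_new = drop(X_new %*% fit_beta(y_new, sm, lambda)))
}

# Usage with the data from the question.
set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
x_new <- 10:100
set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

sm <- make_smooth(x)
# Assumption: reuse the smoothing parameter GCV selected for the original gam() fit.
lambda <- gam(y ~ s(x, bs = "cr"))$sp
curves <- interpolate_smooth(x_new, y, y_new, sm, lambda)
matplot(x_new, curves, type = "l", lty = 1, lwd = 2, col = c("red", "orange"))
points(x, y, pch = 19)
points(x, y_new, pch = 19, col = "orange")
```

Keeping `lambda` fixed means the new responses are smoothed exactly as strongly as the old ones; re-selecting it (as `update()` does) adapts the smoothness to `y_new` instead.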