How to access and reuse the smooths in the `mgcv` package in `R`?

Posted on 2025-01-12 13:46:46

I am checking out the mgcv package in R and I would like to know
how to update a model based on new data. For example, suppose I have the
following data and I am interested in fitting a cubic regression spline.

# Load library.
library(mgcv)

# Set seed.
set.seed(2022)

# Data
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

I can fit a model using the mgcv::s() function for transforming my predictor variable, where bs = "cr" stands for cubic regression spline as indicated in the
documentation (i.e., ?mgcv::s).

# Fit.
model <- mgcv::gam(y ~ s(x, bs = "cr"))

# Print model.
model

# Family: gaussian 
# Link function: identity 
#
# Formula:
# y ~ s(x, bs = "cr")
#
# Estimated degrees of freedom:
# 7.51  total = 8.51 
# 
# GCV score: 0.001123237   

I assume mgcv::s() is used to
determine the inner knots for the spline basis functions? If I want to
interpolate over the entire range of x, it seems that I can use the predict
function.

# Prepare range of `x` for interpolation.
x_new <- 10:100

# Interpolate.
mgcv_interpolation <- predict(model, type = "link", newdata = data.frame(x = x_new))

# Plot.
plot(x, y, pch = 19)
lines(x_new, mgcv_interpolation, lwd = 2, col = "red")
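
As a side note on the knot question above, the fitted object keeps the constructed smooth in `model$smooth`, so the knots can be inspected directly. The following is a minimal sketch, assuming the `"cr"` smooth stores its knot locations in the `xp` element and its basis dimension in `bs.dim` (both are elements of the smooth object, not something defined in the question):

```r
library(mgcv)

# Same data as in the question.
set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

model <- gam(y ~ s(x, bs = "cr"))

# The first (and only) smooth term of the fitted model.
smooth_x <- model$smooth[[1]]

# Basis dimension (the `k` argument of `s()`).
smooth_x$bs.dim

# Knot locations chosen for the cubic regression spline.
smooth_x$xp
```

With only ten distinct `x` values and the default basis dimension, the knots should coincide with the data points, which is worth keeping in mind when comparing against a lower-dimensional B-spline fit.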


What is not clear to me is how to update the model when new data (i.e., y)
comes in. For instance, suppose my new data looks something like this.

# Set seed.
set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

My understanding is that I could simply use the previously created basis matrix,
but I am not sure how to do that with mgcv. For example, this is how I can do
it manually using a B-Spline basis.

# ...
# Suppose that based on some cross-validation procedure `df = 6` is selected.
df <- 6

# Create B-Spline basis functions.
basis <- splines::bs(x, df = df, degree = 3, intercept = TRUE)

# Estimate spline coefficients.
coefficients <- lm.fit(basis, y)$coef

# Compute fitted values.
fitted <- basis %*% coefficients

# Create extended basis for `x_new`.
basis_x_new <- splines::bs(x_new, df = df, degree = 3, intercept = TRUE)

# Interpolate.
bs_interpolation <- basis_x_new %*% coefficients

# Add to previous plot.
lines(x_new, bs_interpolation, lwd = 2, col = "blue")


# Update model based on `y_new`.
coefficients_y_new <- lm.fit(basis, y_new)$coef

# Add points and lines to the previous plot.
points(x, y_new, pch = 19, col = "orange")
lines(x_new, basis_x_new %*% coefficients_y_new, lwd = 2, col = "orange")
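
One caveat about the manual approach: calling `splines::bs()` again on `x_new` recomputes the interior knots from the quantiles of `x_new`, which is not guaranteed to match the knots chosen for `x`. A hedged alternative, sketched below, is to evaluate the already constructed basis at the new points with the `predict()` method for `bs` objects, which reuses the stored knots and boundary knots:

```r
library(splines)

x <- seq(10, 100, by = 10)
x_new <- 10:100

# Basis built once on the original `x`.
basis <- bs(x, df = 6, degree = 3, intercept = TRUE)

# Knots stored on the basis object.
attr(basis, "knots")
attr(basis, "Boundary.knots")

# Evaluate the *same* basis functions at the new points.
basis_x_new <- predict(basis, x_new)
dim(basis_x_new)
```

The coefficients estimated from `basis` can then be multiplied with `basis_x_new` exactly as in the code above.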


I guess my question is how to find whatever mgcv::s() creates and
reuse it in subsequent calls of mgcv::gam? Or, is there a more
mgcv-idiomatic way of doing this?


Edit 1.

Poking around more, I discovered that I can extract the basis matrix using mgcv::predict.gam() with the argument type = "lpmatrix". However, I am still not able to replicate the exact coefficients provided by mgcv::gam(). The differences are not large, but I wonder where they are coming from. For example:

# Extract the basis matrix from the `gam` object.
basis_gam <- mgcv::predict.gam(model, type = "lpmatrix")

# Fit the model using the basis matrix.
model_basis_gam <- mgcv::gam(y ~ basis_gam - 1)

# Compare the coefficients.
round(data.frame(
    difference = coef(model) - coef(model_basis_gam)
), 4)

#              difference
# (Intercept)      0.0000
# s(x).1           0.0004
# s(x).2          -0.0033
# s(x).3           0.0071
# s(x).4          -0.0020
# s(x).5          -0.0016
# s(x).6          -0.0054
# s(x).7           0.0103
# s(x).8          -0.0064
# s(x).9           0.0017
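
A likely explanation for these differences is that `mgcv::gam()` fits a penalized regression, whereas passing the extracted matrix as `y ~ basis_gam - 1` treats every column as an unpenalized parametric term. The sketch below tries to reproduce the `gam()` coefficients by hand, assuming that the penalty matrix stored in `model$smooth[[1]]$S[[1]]` and the smoothing parameter in `model$sp` are on the same scale (this is an assumption about mgcv internals, not something stated in the question):

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

model <- gam(y ~ s(x, bs = "cr"))

# Model matrix: intercept plus the constrained smooth columns.
X <- predict(model, type = "lpmatrix")

# Embed the smooth's penalty in a matrix matching all coefficients
# (the intercept is left unpenalized).
sm <- model$smooth[[1]]
S <- matrix(0, ncol(X), ncol(X))
idx <- sm$first.para:sm$last.para
S[idx, idx] <- sm$S[[1]]

# Estimated smoothing parameter.
lambda <- unname(model$sp[1])

# Penalized least squares: (X'X + lambda * S) beta = X'y.
beta_penalized <- drop(solve(crossprod(X) + lambda * S, crossprod(X, y)))

# Ordinary least squares on the same matrix, for comparison.
beta_unpenalized <- drop(solve(crossprod(X), crossprod(X, y)))

round(cbind(
    gam = coef(model),
    penalized = beta_penalized,
    unpenalized = beta_unpenalized
), 4)
```

If the assumption holds, the `penalized` column should agree with `coef(model)`, while the `unpenalized` column corresponds to the refit on the extracted basis and simply interpolates the ten data points.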

Edit 2.

It seems that there is a function mgcv::bam.update() to update a GAM model for new data, but it applies to models fit via mgcv::bam(), not mgcv::gam(). Despite this, the S3 generic update appears to work with the mgcv::gam() object, perhaps because class(model) includes "gam" "glm" "lm"; however, there is no mention of this in the documentation. For example:

# Update the model for `y_new`.
model_y_new_via_update <- update(model, data = data.frame(y = y_new))

# Extract the basis matrices for `model` and `model_y_new_via_update`.
basis_model <- mgcv::predict.gam(model, type = "lpmatrix")
basis_model_y_new_via_update <- mgcv::predict.gam(model_y_new_via_update, type = "lpmatrix")

# Check that both models used the same basis matrix.
all(basis_model == basis_model_y_new_via_update)

# TRUE

Also, there are still some differences in the coefficients I cannot explain.

# Fit the model using the extracted basis matrix.
model_y_new_via_basis <- mgcv::gam(y_new ~ basis_model - 1)

# Eyeballing the coefficients.
round(data.frame(
    via_update = coef(model_y_new_via_update),
    via_basis = coef(model_y_new_via_basis),
    difference = coef(model_y_new_via_update) - coef(model_y_new_via_basis),
    row.names = names(coef(model))
), 4)

#             via_update via_basis difference
# (Intercept)     0.4420    0.4420     0.0000
# s(x).1         -0.2385   -0.2333    -0.0052
# s(x).2         -0.1901   -0.1689    -0.0212
# s(x).3         -0.0854   -0.1689     0.0835
# s(x).4          0.1315    0.1902    -0.0586
# s(x).5          0.2666    0.2821    -0.0155
# s(x).6          0.2907    0.2821     0.0085
# s(x).7          0.2855    0.2821     0.0033
# s(x).8          0.3119    0.2936     0.0183
# s(x).9          0.3917    0.4036    -0.0120
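
The remaining gap is plausibly the same penalization effect: `update()` refits a penalized model and re-estimates the smoothing parameter for `y_new`, while `gam(y_new ~ basis_model - 1)` is an unpenalized fit. One possible way to keep the basis fixed explicitly, sketched here as an assumption rather than an officially recommended workflow, is to pass the stored knots through the `knots` argument of `gam()` and, optionally, to reuse the old smoothing parameter through `sp`:

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

model <- gam(y ~ s(x, bs = "cr"))

# Knots used by the original fit (assumed to live in `xp` for "cr" smooths).
knots <- list(x = model$smooth[[1]]$xp)
data_new <- data.frame(x = x, y = y_new)

# Refit on the new response with the same knots; the smoothing
# parameter is re-estimated from `y_new`.
model_new <- gam(
    y ~ s(x, bs = "cr", k = length(knots$x)),
    data = data_new, knots = knots
)

# Same knots and the smoothing parameter carried over from `model`.
model_new_fixed_sp <- gam(
    y ~ s(x, bs = "cr", k = length(knots$x)),
    data = data_new, knots = knots, sp = model$sp
)

# Both fits share the basis matrix of the original model.
all.equal(
    predict(model, type = "lpmatrix"),
    predict(model_new, type = "lpmatrix"),
    check.attributes = FALSE
)
```

Whether to carry over `sp` is a modelling choice: fixing it keeps the amount of smoothing identical across responses, while leaving it free lets each response pick its own.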


Comments (1)

2025-01-19 13:46:46

Here is a brief example

  1. Create your smoothCon object, using x
sm <- smoothCon(s(x, bs = "cr"), data = data.frame(x))[[1]]
  2. Create a simple function to get the beta coefficients, given y and your smoothCon object
get_beta <- function(y, sm) {
  as.numeric(coef(lm(y ~ sm$X - 1)))
}
  3. Create a simple function to get the predictions, given x, y, and the smoothCon object
get_pred <- function(x, y, sm) {
  PredictMat(sm, data.frame(x = x)) %*% get_beta(y, sm)
}
  4. Plot the original x,y points in red and the new x,y points in blue
plot(x, y, col = "red")
points(x, y_new, col = "blue")
  5. Add the lines, using only the new x range (x_new), the old (y) and new (y_new) y values, and the smoothCon object
lines(x_new, get_pred(x_new, y, sm), col = "red")
lines(x_new, get_pred(x_new, y_new, sm), col = "blue")
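
Note that the steps above estimate the coefficients with `lm()`, so no smoothness penalty is applied. A small sanity check, sketched below with the question's data, suggests what that implies: `PredictMat()` evaluated at the original `x` should reproduce `sm$X`, and with ten basis functions for ten data points the unpenalized fit passes through every observation (the name `fitted_unpenalized` is only illustrative):

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

# Unconstrained cubic regression spline basis built from `x`.
sm <- smoothCon(s(x, bs = "cr"), data = data.frame(x = x))[[1]]

# The prediction matrix at the original points should match the stored basis.
all.equal(PredictMat(sm, data.frame(x = x)), sm$X, check.attributes = FALSE)

# Unpenalized least-squares coefficients and the resulting fitted values.
beta <- as.numeric(coef(lm(y ~ sm$X - 1)))
fitted_unpenalized <- as.numeric(sm$X %*% beta)

# With as many basis functions as observations, the fit interpolates `y`.
all.equal(fitted_unpenalized, y)
```

This differs from `mgcv::gam()`, which shrinks the same basis towards a smoother curve, so the lines drawn this way will generally be wigglier than the `gam()` fit.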

