How to access and reuse the smooths in the "mgcv" package in "R"?

Posted on 2025-01-12 13:46:46

I am checking out the mgcv package in R and I would like to know
how to update a model based on new data. For example, suppose I have the
following data and I am interested in fitting a cubic regression spline.

# Load library.
library(mgcv)

# Set seed.
set.seed(2022)

# Data
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

I can fit a model using the mgcv::s() function for transforming my predictor variable, where bs = "cr" stands for cubic regression spline as indicated in the
documentation (i.e., ?mgcv::s).

# Fit.
model <- mgcv::gam(y ~ s(x, bs = "cr"))

# Print model.
model

# Family: gaussian 
# Link function: identity 
#
# Formula:
# y ~ s(x, bs = "cr")
#
# Estimated degrees of freedom:
# 7.51  total = 8.51 
# 
# GCV score: 0.001123237

I assume mgcv::s() is used to determine the inner knots for the spline
basis functions? If I want to interpolate over the entire range of x,
it seems that I can use the predict function.

# Prepare range of `x` for interpolation.
x_new <- 10:100

# Interpolate.
mgcv_interpolation <- predict(model, type = "link", newdata = data.frame(x = x_new))

# Plot.
plot(x, y, pch = 19)
lines(x_new, mgcv_interpolation, lwd = 2, col = "red")
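
On the question of the knots: one way to check is to look at the smooth object stored inside the fitted model. The sketch below assumes that, for a "cr" smooth, the knot locations are kept in the `xp` element and the basis dimension in `bs.dim`; the names `smooth_x` and `knots_x` are only illustrative.

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

model <- gam(y ~ s(x, bs = "cr"))

# The constructed smooth terms are stored in `model$smooth`.
smooth_x <- model$smooth[[1]]

# Basis dimension `k` of the smooth.
smooth_x$bs.dim

# Knot locations used by the "cr" basis.
knots_x <- smooth_x$xp
knots_x
```

With ten unique x values and the default basis dimension, the knots should span the observed range of x.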

What is not clear to me is how to update the model when new data (i.e., y)
comes in. For instance, suppose my new data looks something like this.

# Set seed.
set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

My understanding is that I could simply use the previously created basis matrix,
but I am not sure how to do that with mgcv. For example, this is how I can do
it manually using a B-Spline basis.

# ...
# Suppose that based on some cross-validation procedure `df = 6` is selected.
df <- 6

# Create B-Spline basis functions.
basis <- splines::bs(x, df = df, degree = 3, intercept = TRUE)

# Estimate spline coefficients.
coefficients <- lm.fit(basis, y)$coef

# Compute fitted values.
fitted <- basis %*% coefficients

# Create extended basis for `x_new`.
basis_x_new <- splines::bs(x_new, df = df, degree = 3, intercept = TRUE)

# Interpolate.
bs_interpolation <- basis_x_new %*% coefficients

# Add to previous plot.
lines(x_new, bs_interpolation, lwd = 2, col = "blue")

# Update model based on `y_new`.
coefficients_y_new <- lm.fit(basis, y_new)$coef

# Add points and lines to the previous plot.
points(x, y_new, pch = 19, col = "orange")
lines(x_new, basis_x_new %*% coefficients_y_new, lwd = 2, col = "orange")
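
A side note on the manual approach: calling splines::bs() again on x_new chooses the interior knots from the quantiles of x_new, not from x. In this particular example the two should agree because both grids are evenly spaced over the same range, but in general it is safer to evaluate the already-built basis at the new points. A minimal sketch using the predict method for bs objects:

```r
library(splines)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))
x_new <- 10:100

# Build the basis once, on the original `x`.
basis <- bs(x, df = 6, degree = 3, intercept = TRUE)
coefficients <- lm.fit(basis, y)$coefficients

# Evaluate the same basis (same interior and boundary knots) at `x_new`.
basis_x_new <- predict(basis, newx = x_new)

# Interpolate with the coefficients estimated on `x`.
bs_interpolation <- basis_x_new %*% coefficients
```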

I guess my question is how to find whatever mgcv::s() creates and
reuse it in subsequent calls of mgcv::gam? Or, is there a more
mgcv-idiomatic way of doing this?

Edit 1.

Poking around more, I discovered that I can extract the basis matrix using mgcv::predict.gam() with the argument type = "lpmatrix". However, I am still not able to replicate the exact coefficients provided by mgcv::gam(). The differences are not large, but I wonder where they are coming from. For example:

# Extract the basis matrix from the `gam` object.
basis_gam <- mgcv::predict.gam(model, type = "lpmatrix")

# Fit the model using the basis matrix.
model_basis_gam <- mgcv::gam(y ~ basis_gam - 1)

# Compare the coefficients.
round(data.frame(
    difference = coef(model) - coef(model_basis_gam)
), 4)

#              difference
# (Intercept)      0.0000
# s(x).1           0.0004
# s(x).2          -0.0033
# s(x).3           0.0071
# s(x).4          -0.0020
# s(x).5          -0.0016
# s(x).6          -0.0054
# s(x).7           0.0103
# s(x).8          -0.0064
# s(x).9           0.0017
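
A likely source of these differences is the smoothing penalty: gam() estimates the coefficients by penalized least squares, whereas refitting on the extracted basis matrix is an unpenalized fit. The sketch below tries to rebuild the gam() coefficients by hand. It assumes a Gaussian model with identity link, and that `model$sp` applies directly to the penalty matrix stored in `model$smooth[[1]]$S[[1]]`; `S_total` and `beta_penalized` are illustrative names.

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

model <- gam(y ~ s(x, bs = "cr"))

# Model matrix: intercept plus the constrained smooth columns.
X <- predict(model, type = "lpmatrix")

# Embed the smooth's penalty, scaled by the estimated smoothing parameter,
# in a matrix that matches the full coefficient vector.
smooth_x <- model$smooth[[1]]
index <- smooth_x$first.para:smooth_x$last.para
S_total <- matrix(0, ncol(X), ncol(X))
S_total[index, index] <- model$sp[1] * smooth_x$S[[1]]

# Penalized least squares: minimize ||y - X b||^2 + b' S_total b.
beta_penalized <- solve(crossprod(X) + S_total, crossprod(X, y))

# Compare with the coefficients reported by `gam()`.
round(cbind(gam = coef(model), penalized = drop(beta_penalized)), 4)

# The effective degrees of freedom are below the number of coefficients,
# which is another sign that the fit is penalized.
sum(model$edf)
ncol(X)
```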

Edit 2.

It seems that there is a function mgcv::bam.update() for updating a GAM with new data, but it only applies to models fit via mgcv::bam(), not mgcv::gam(). Despite this, the S3 generic update appears to work with mgcv::gam() objects, perhaps because class(model) includes "gam" "glm" "lm"; however, there is no mention of this in the documentation. For example:

# Update the model for `y_new`.
model_y_new_via_update <- update(model, data = data.frame(y = y_new))

# Extract the basis matrices for `model` and `model_y_new_via_update`.
basis_model <- mgcv::predict.gam(model, type = "lpmatrix")
basis_model_y_new_via_update <- mgcv::predict.gam(model_y_new_via_update, type = "lpmatrix")

# Check that both models used the same basis matrix.
all(basis_model == basis_model_y_new_via_update)

# TRUE

Also, there are still some differences in the coefficients I cannot explain.

# Fit the model using the extracted basis matrix.
model_y_new_via_basis <- mgcv::gam(y_new ~ basis_model - 1)

# Eyeballing the coefficients.
round(data.frame(
    via_update = coef(model_y_new_via_update),
    via_basis = coef(model_y_new_via_basis),
    difference = coef(model_y_new_via_update) - coef(model_y_new_via_basis),
    row.names = names(coef(model))
), 4)

#             via_update via_basis difference
# (Intercept)     0.4420    0.4420     0.0000
# s(x).1         -0.2385   -0.2333    -0.0052
# s(x).2         -0.1901   -0.1689    -0.0212
# s(x).3         -0.0854   -0.1689     0.0835
# s(x).4          0.1315    0.1902    -0.0586
# s(x).5          0.2666    0.2821    -0.0155
# s(x).6          0.2907    0.2821     0.0085
# s(x).7          0.2855    0.2821     0.0033
# s(x).8          0.3119    0.2936     0.0183
# s(x).9          0.3917    0.4036    -0.0120
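
An alternative to update() that makes the reuse explicit is to refit with the data in a data frame and hand the original knots back to gam() through its `knots` argument. This is only a sketch under the assumption that the knots of a "cr" smooth are stored in `xp`; the names `knots_x` and `model_y_new` are illustrative.

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

set.seed(2022)
y_new <- sort(sample(y, size = length(y), replace = TRUE))

model <- gam(y ~ s(x, bs = "cr"), data = data.frame(x = x, y = y))

# Knots chosen for the original fit.
knots_x <- model$smooth[[1]]$xp

# Refit on the new response, reusing the knots. The smoothing parameter is
# re-estimated here; pass `sp = model$sp` as well to hold it fixed instead.
model_y_new <- gam(
    y ~ s(x, bs = "cr"),
    data = data.frame(x = x, y = y_new),
    knots = list(x = knots_x)
)

# The two fits share the same basis matrix but not the same coefficients.
all.equal(
    predict(model, type = "lpmatrix"),
    predict(model_y_new, type = "lpmatrix")
)
```

Because the smoothing parameter is re-estimated for y_new, the refit is still penalized, so its coefficients need not match an unpenalized fit on the extracted basis matrix.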

星 2025-01-19 13:46:46

Here is a brief example

Create your smoothCon object, using x

sm = smoothCon(s(x, bs="cr"), data=data.frame(x))[[1]]

Create simple function to get the beta coefficients given y and your smoothCon object

get_beta <- function(y,sm) {
  as.numeric(coef(lm(y~sm$X-1)))
}

Create a simple function to get the predictions, given x, y, and the smoothCon object

get_pred <- function(x,y,sm) {
  PredictMat(sm, data.frame(x=x)) %*% get_beta(y, sm)
}

Plot the original x,y points in red and the new x,y points in blue

plot(x,y, col="red")
points(x,y_new, col="blue")

Add the lines, using only the new x range (x_new), the old (y) and new (y_new) y values, and the smoothCon object

lines(x_new, get_pred(x_new,y, sm), col="red")
lines(x_new, get_pred(x_new,y_new, sm), col="blue")
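
One thing to keep in mind with this approach: lm() ignores the smoothing penalty, so with ten observations and a basis of dimension ten the fitted curve passes through every point. If some smoothing is wanted while still reusing the same smoothCon object, the penalty matrix in `sm$S[[1]]` can be added by hand. The sketch below is only illustrative; `lambda` is a hypothetical hand-picked value, not one estimated by gam().

```r
library(mgcv)

set.seed(2022)
x <- seq(10, 100, by = 10)
y <- sort(runif(10))

sm <- smoothCon(s(x, bs = "cr"), data = data.frame(x = x))[[1]]

# Unpenalized least squares, as in the answer above.
beta_unpenalized <- coef(lm(y ~ sm$X - 1))
fit_unpenalized <- drop(sm$X %*% beta_unpenalized)

# Penalized least squares with a hand-picked smoothing parameter.
# `lambda` is a hypothetical value chosen for illustration only.
lambda <- 1
S <- sm$S[[1]]
beta_penalized <- solve(crossprod(sm$X) + lambda * S, crossprod(sm$X, y))
fit_penalized <- drop(sm$X %*% beta_penalized)

# Largest absolute residual of each fit at the observed `x`.
max(abs(fit_unpenalized - y))
max(abs(fit_penalized - y))
```

The first residual should be zero up to rounding, while the second is not, since the penalty pulls the fit away from exact interpolation.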
