Avoid writing out many column names in a model formula with bs() terms

Posted on 2025-02-13 10:00:35

I want to use the bs() function for the numerical variables in my dataset when fitting a logistic regression model.

df <- data.frame(a = c(0, 1), b = c(0, 1), d = c(0, 1), e = c(0, 1),
                 f = c("m", "f"), output = c(0, 1))

library(splines)
model <- glm(output ~ bs(a, df = 2) + bs(b, df = 2) + bs(d, df = 2) + bs(e, df = 2) +
               factor(f),
             data = df,
             family = "binomial")

In my actual dataset, I need to apply bs() to many more columns than in this example. Is there a way to do this without writing out every term?

Comments (1)

怀中猫帐中妖 2025-02-20 10:00:35

We can use some string manipulation with sprintf, together with reformulate:

predictors <- c("a", "b", "d", "e")
bspl.terms <- sprintf("bs(%s, df = 2)", predictors)
other.terms <- "factor(f)"
form <- reformulate(c(bspl.terms, other.terms), response = "output")
#output ~ bs(a, df = 2) + bs(b, df = 2) + bs(d, df = 2) + bs(e, 
#    df = 2) + factor(f)
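
The formula built this way can be passed to glm() just like a hand-written one. Below is a minimal sketch of that step. The data frame `dat` is simulated and purely hypothetical (it only stands in for a real dataset), and I use df = 3 rather than df = 2 as my own choice, because with the default degree of 3 bs() warns that a smaller df is too small.

```r
library(splines)

# Simulated data, hypothetical: a stand-in for the real dataset
set.seed(1)
n <- 200
dat <- data.frame(
  a = rnorm(n), b = rnorm(n), d = rnorm(n), e = rnorm(n),
  f = sample(c("m", "f"), n, replace = TRUE)
)
dat$output <- rbinom(n, 1, plogis(0.5 * dat$a - 0.5 * dat$b))

# Build the spline terms from the column names, then assemble the formula
predictors <- c("a", "b", "d", "e")
bspl.terms <- sprintf("bs(%s, df = 3)", predictors)
form <- reformulate(c(bspl.terms, "factor(f)"), response = "output")

# Fit the logistic regression with the generated formula
model <- glm(form, data = dat, family = binomial)
summary(model)
```

Since reformulate() returns an ordinary formula object, nothing else in the modelling call needs to change.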

If you want to use a different df and degree for each spline, that is also straightforward (note that df cannot be smaller than degree).

predictors <- c("a", "b", "d", "e")
dof <- c(3, 4, 3, 6)
degree <- c(2, 2, 2, 3)
bspl.terms <- sprintf("bs(%s, df = %d, degree = %d)", predictors, dof, degree)
other.terms <- "factor(f)"
form <- reformulate(c(bspl.terms, other.terms), response = "output")
#output ~ bs(a, df = 3, degree = 2) + bs(b, df = 4, degree = 2) + 
#    bs(d, df = 3, degree = 2) + bs(e, df = 6, degree = 3) + factor(f)

Prof. Ben Bolker: I was going to suggest something a little bit fancier, something like predictors <- setdiff(names(df)[sapply(df, is.numeric)], "output").

Yes, this is safer. It is also an automatic way to build the list if the OP wants to include all numerical variables other than "output" as predictors.
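
Putting that suggestion together with reformulate() could look roughly like the sketch below. The data frame `dat` is hypothetical, df = 3 is my own choice, and wrapping every non-numeric column in factor() is an assumption on my part rather than something the question asks for.

```r
# Hypothetical data: two numeric predictors, one character column, and the response
dat <- data.frame(
  a = c(0.1, 0.4, 0.9, 0.3),
  b = c(2, 5, 7, 4),
  f = c("m", "f", "m", "f"),
  output = c(0, 1, 1, 0)
)

# All numeric columns except the response become spline terms
predictors <- setdiff(names(dat)[sapply(dat, is.numeric)], "output")
bspl.terms <- sprintf("bs(%s, df = 3)", predictors)

# Assumption: every remaining column is categorical and enters as factor()
other.cols <- setdiff(names(dat), c(predictors, "output"))
other.terms <- sprintf("factor(%s)", other.cols)

form <- reformulate(c(bspl.terms, other.terms), response = "output")
form
```

Column names that are not syntactically valid R names would need backticks inside the sprintf() template, which this sketch does not handle.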
