Apply lm to subsets of a data frame defined by a third column

Posted 2024-12-04 18:36:14

I've got a data frame containing a vector of x values, a vector of y values, and a vector of IDs:

x <- rep(0:3, 3)
y <- runif(12)
ID <- c(rep("a", 4), rep("b", 4), rep("c", 4))
df <- data.frame(ID=ID, x=x, y=y)

I'd like to create a separate lm for the subset of x's and y's sharing the same ID. The following code gets the job done:

a.lm <- lm(x~y, data=subset(df, ID=="a"))
b.lm <- lm(x~y, data=subset(df, ID=="b"))
c.lm <- lm(x~y, data=subset(df, ID=="c"))

Except that this is very brittle (future data sets might have different IDs) and un-vectorized. I'd also like to store all the lms in a single data structure. There must be an elegant way to do this, but I can't find it. Any help?

Comments (3)

晚风撩人 2024-12-11 18:36:14

Using base functions, you can split your original dataframe and use lapply on that:

lapply(split(df,df$ID),function(d) lm(x~y,d))
$a

Call:
lm(formula = x ~ y, data = d)

Coefficients:
(Intercept)            y  
    -0.2334       2.8813  


$b

Call:
lm(formula = x ~ y, data = d)

Coefficients:
(Intercept)            y  
     0.7558       1.8279  


$c

Call:
lm(formula = x ~ y, data = d)

Coefficients:
(Intercept)            y  
      3.451       -7.628  
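
Since the result is an ordinary named list, the per-ID coefficients can be gathered into one table. The following is a minimal sketch, not part of the original answer: the fixed seed and the names `fits` and `coef_table` are assumptions added for illustration, and the numbers will differ from the output above because `y` is random.

```r
set.seed(42)  # assumption: fixed seed so the random y values are reproducible

# Same layout as the data frame in the question
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  x  = rep(0:3, 3),
  y  = runif(12)
)

# One lm per ID, stored in a named list (names come from split())
fits <- lapply(split(df, df$ID), function(d) lm(x ~ y, data = d))

# sapply() returns one column per ID; transpose to get one row per ID
coef_table <- t(sapply(fits, coef))
print(coef_table)
```

Because `split()` derives the groups from whatever values `ID` contains, the same code should keep working when a future data set has different IDs.
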
空城之時有危險 2024-12-11 18:36:14

How about

library(nlme) ## OR library(lme4)
lmList(x~y|ID,data=df)

?
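
A slightly fuller sketch of the `lmList` approach, assuming the `nlme` version and the data frame `df` from the question. Converting `ID` to a factor and fixing the seed are precautions added here, not something the answer requires.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(42)  # assumption: fixed seed for reproducible random y values
df <- data.frame(
  ID = factor(rep(c("a", "b", "c"), each = 4)),  # factor used as a precaution for grouping
  x  = rep(0:3, 3),
  y  = runif(12)
)

# Fit x ~ y separately within each level of ID
fits <- lmList(x ~ y | ID, data = df)

# Coefficients for all groups at once: one row per ID
coefs <- coef(fits)
print(coefs)

# Individual models can be pulled out by group name
fit_a <- fits[["a"]]
```
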

最好是你 2024-12-11 18:36:14

Use some of the magic in the plyr package. The function dlply takes a data.frame, splits it, applies a function to each element, and combines it into a list. This is perfect for your application.

library(plyr)
#fitList <- dlply(df, .(ID), function(dat)lm(x~y, data=dat))
fitList <- dlply(df, .(ID), lm, formula=x~y) # Edit

This creates a list with a model for each subset of ID:

str(fitList, max.level=1)

List of 3
 $ a:List of 12
  ..- attr(*, "class")= chr "lm"
 $ b:List of 12
  ..- attr(*, "class")= chr "lm"
 $ c:List of 12
  ..- attr(*, "class")= chr "lm"
 - attr(*, "split_type")= chr "data.frame"
 - attr(*, "split_labels")='data.frame':    3 obs. of  1 variable:

This means you can subset the list and work with that. For example, to get the coefficients for your lm model where ID=="a":

> coef(fitList$a)
(Intercept)           y 
   3.071854   -3.440928 
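
The same subsetting pattern extends to other per-model summaries. As a rough sketch, the list below is built with base `split` and `lapply` rather than `dlply`, so that it runs without plyr installed; the seed, the name `r2`, and the value `y = 0.5` are hypothetical choices made only for illustration.

```r
set.seed(42)  # assumption: fixed seed for reproducible random y values
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  x  = rep(0:3, 3),
  y  = runif(12)
)

# Stand-in for the dlply() result: a named list with one lm per ID
fitList <- lapply(split(df, df$ID), function(d) lm(x ~ y, data = d))

# R-squared of each model, returned as a named numeric vector
r2 <- vapply(fitList, function(m) summary(m)$r.squared, numeric(1))
print(r2)

# Prediction from the ID == "a" model at a hypothetical y value
pred_a <- predict(fitList$a, newdata = data.frame(y = 0.5))
print(pred_a)
```
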