R 中的线性回归(正态数据和对数数据)

发布于 2024-11-14 07:51:02 字数 666 浏览 9 评论 0原文

我想在 R 中对正态图和双对数图中的数据进行线性回归。

对于正常数据,数据集可能如下:

lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
plot (lin$x, lin$y)

我想计算仅对数据点 2、3 和 4 的线性回归绘制一条线。

对于双对数数据数据集可能如下:

data = data.frame(
    x=c(1:15),
    y=c(
        1.000, 0.742, 0.623, 0.550, 0.500, 0.462, 0.433,
        0.051, 0.043, 0.037, 0.032, 0.028, 0.025, 0.022, 0.020
      )
    )
plot (data$x, data$y, log="xy")

这里我想绘制数据集 1:7 和 8:15 的回归线。

我可以计算斜率y偏移以及拟合参数(R^2p值)?

对于正态数据和对数数据如何完成?

谢谢你的帮助,

斯文

I want to carry out a linear regression in R for data in a normal and in a double logarithmic plot.

For normal data the dataset might be the follwing:

lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
plot (lin$x, lin$y)

There I want to calculate draw a line for the linear regression only of the datapoints 2, 3 and 4.

For double logarithmic data the dataset might be the following:

data = data.frame(
    x=c(1:15),
    y=c(
        1.000, 0.742, 0.623, 0.550, 0.500, 0.462, 0.433,
        0.051, 0.043, 0.037, 0.032, 0.028, 0.025, 0.022, 0.020
      )
    )
plot (data$x, data$y, log="xy")

Here I want to draw the regression line for the datasets 1:7 and for 8:15.

Ho can I calculate the slope and the y-offset als well as parameters for the fit (R^2, p-value)?

How is it done for normal and for logarithmic data?

Thanks for you help,

Sven

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

泅人 2024-11-21 07:51:02

在 R 中,线性最小二乘模型通过 lm() 函数进行拟合。使用公式接口,我们可以使用 subset 参数来选择用于拟合实际模型的数据点,例如:

lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
linm <- lm(y ~ x, data = lin, subset = 2:4)

给出:

R> linm

Call:
lm(formula = y ~ x, data = lin, subset = 2:4)

Coefficients:
(Intercept)            x  
     -1.633        1.500  

R> fitted(linm)
         2          3          4 
-0.1333333  1.3666667  2.8666667

至于双对数,我猜你有两种选择; i) 按照我们上面的方法估计两个单独的模型,或者 ii) 通过 ANCOVA 估计。对数转换是使用 log() 在公式中完成的。

通过两个单独的模型:

logm1 <- lm(log(y) ~ log(x), data = dat, subset = 1:7)
logm2 <- lm(log(y) ~ log(x), data = dat, subset = 8:15)

或者通过 ANCOVA,我们需要一个指示变量

dat <- transform(dat, ind = factor(1:15 <= 7))
logm3 <- lm(log(y) ~ log(x) * ind, data = dat)

您可能会问这两种方法是否等效?确实如此,我们可以通过模型系数来证明这一点。

R> coef(logm1)
  (Intercept)        log(x) 
-0.0001487042 -0.4305802355 
R> coef(logm2)
(Intercept)      log(x) 
  0.1428293  -1.4966954

因此,单独模型的两个斜率分别为 -0.4306 和 -1.4967。 ANCOVA 模型的系数为:

R> coef(logm3)
   (Intercept)         log(x)        indTRUE log(x):indTRUE 
     0.1428293     -1.4966954     -0.1429780      1.0661152

我们如何协调两者?嗯,我设置 ind 的方式,对 logm3 进行了参数化,以给出从 logm2 估计的更直接的值; logm2logm3 的截距相同,log(x) 的系数也相同。获得相当于系数的值
对于 logm1,我们需要进行操作,首先是截距:

R> coefs[1] + coefs[3]
  (Intercept) 
-0.0001487042

其中 indTRUE 的系数是第 1 组平均值与第 2 组平均值的差值。对于斜率:与

R> coefs[2] + coefs[4]
    log(x) 
-0.4305802

我们获得的 logm1 相同,并且基于组 2 (coefs[2]) 的斜率,并根据斜率差异进行修改第1组(coefs[4])。

至于绘图,一个简单的方法是通过 abline() 来绘制简单模型。例如,对于普通数据示例:

plot(y ~ x, data = lin)
abline(linm)

对于日志数据,我们可能需要更有创意,这里的一般解决方案是在数据范围内进行预测并绘制预测结果:

pdat <- with(dat, data.frame(x = seq(from = head(x, 1), to = tail(x,1), 
                                     by = 0.1))
pdat <- transform(pdat, yhat = c(predict(logm1, pdat[1:70,, drop = FALSE]), 
                                 predict(logm2, pdat[71:141,, drop = FALSE])))

可以通过对 < code>yhat

plot(y ~ x, data = dat)
lines(exp(yhat) ~ x, dat = pdat, subset = 1:70, col = "red")
lines(exp(yhat) ~ x, dat = pdat, subset = 71:141, col = "blue")

或对数刻度:

plot(log(y) ~ log(x), data = dat)
lines(yhat ~ log(x), dat = pdat, subset = 1:70, col = "red")
lines(yhat ~ log(x), dat = pdat, subset = 71:141, col = "blue")

例如...

这个通用解决方案也适用于更复杂的 ANCOVA 模型。在这里,我像以前一样创建一个新的 pdat 并添加一个指标。

pdat <- with(dat, data.frame(x = seq(from = head(x, 1), to = tail(x,1), 
                                     by = 0.1)[1:140],
                             ind = factor(rep(c(TRUE, FALSE), each = 70))))
pdat <- transform(pdat, yhat = predict(logm3, pdat))

请注意,由于使用了 ANCOVA 来拟合 logm3<,因此我们如何从对 predict() 的一次调用中获得我们想要的所有预测。 /代码>。我们现在可以像以前一样绘制:

plot(y ~ x, data = dat)
lines(exp(yhat) ~ x, dat = pdat, subset = 1:70, col = "red")
lines(exp(yhat) ~ x, dat = pdat, subset = 71:141, col = "blue")

In R, linear least squares models are fitted via the lm() function. Using the formula interface we can use the subset argument to select the data points used to fit the actual model, for example:

lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
linm <- lm(y ~ x, data = lin, subset = 2:4)

giving:

R> linm

Call:
lm(formula = y ~ x, data = lin, subset = 2:4)

Coefficients:
(Intercept)            x  
     -1.633        1.500  

R> fitted(linm)
         2          3          4 
-0.1333333  1.3666667  2.8666667

As for the double log, you have two choices I guess; i) estimate two separate models as we did above, or ii) estimate via ANCOVA. The log transformation is done in the formula using log().

Via two separate models:

logm1 <- lm(log(y) ~ log(x), data = dat, subset = 1:7)
logm2 <- lm(log(y) ~ log(x), data = dat, subset = 8:15)

Or via ANCOVA, where we need an indicator variable

dat <- transform(dat, ind = factor(1:15 <= 7))
logm3 <- lm(log(y) ~ log(x) * ind, data = dat)

You might ask if these two approaches are equivalent? Well they are and we can show this via the model coefficients.

R> coef(logm1)
  (Intercept)        log(x) 
-0.0001487042 -0.4305802355 
R> coef(logm2)
(Intercept)      log(x) 
  0.1428293  -1.4966954

So the two slopes are -0.4306 and -1.4967 for the separate models. The coefficients for the ANCOVA model are:

R> coef(logm3)
   (Intercept)         log(x)        indTRUE log(x):indTRUE 
     0.1428293     -1.4966954     -0.1429780      1.0661152

How do we reconcile the two? Well the way I set up ind, logm3 is parametrised to give more directly values estimated from logm2; the intercepts of logm2 and logm3 are the same, as are the coefficients for log(x). To get the values equivalent to the coefficients
of logm1, we need to do a manipulation, first for the intercept:

R> coefs[1] + coefs[3]
  (Intercept) 
-0.0001487042

where the coefficient for indTRUE is the difference in the mean of group 1 over the mean of group 2. And for the slope:

R> coefs[2] + coefs[4]
    log(x) 
-0.4305802

which is the same as we got for logm1 and is based on the slope for group 2 (coefs[2]) modified by the difference in slope for group 1 (coefs[4]).

As for plotting, an easy way is via abline() for simple models. E.g. for the normal data example:

plot(y ~ x, data = lin)
abline(linm)

For the log data we might need to be a bit more creative, and the general solution here is to predict over the range of data and plot the predictions:

pdat <- with(dat, data.frame(x = seq(from = head(x, 1), to = tail(x,1), 
                                     by = 0.1))
pdat <- transform(pdat, yhat = c(predict(logm1, pdat[1:70,, drop = FALSE]), 
                                 predict(logm2, pdat[71:141,, drop = FALSE])))

Which can plot on the original scale, by exponentiating yhat

plot(y ~ x, data = dat)
lines(exp(yhat) ~ x, dat = pdat, subset = 1:70, col = "red")
lines(exp(yhat) ~ x, dat = pdat, subset = 71:141, col = "blue")

or on the log scale:

plot(log(y) ~ log(x), data = dat)
lines(yhat ~ log(x), dat = pdat, subset = 1:70, col = "red")
lines(yhat ~ log(x), dat = pdat, subset = 71:141, col = "blue")

For example...

This general solution works well for the more complex ANCOVA model too. Here I create a new pdat as before and add in an indicator

pdat <- with(dat, data.frame(x = seq(from = head(x, 1), to = tail(x,1), 
                                     by = 0.1)[1:140],
                             ind = factor(rep(c(TRUE, FALSE), each = 70))))
pdat <- transform(pdat, yhat = predict(logm3, pdat))

Notice how we get all the predictions we want from the single call to predict() because of the use of ANCOVA to fit logm3. We can now plot as before:

plot(y ~ x, data = dat)
lines(exp(yhat) ~ x, dat = pdat, subset = 1:70, col = "red")
lines(exp(yhat) ~ x, dat = pdat, subset = 71:141, col = "blue")
素罗衫 2024-11-21 07:51:02
#Split the data into two groups
data1 <- data[1:7, ]
data2 <- data[8:15, ]

#Perform the regression
model1 <- lm(log(y) ~ log(x), data1)
model2 <- lm(log(y) ~ log(x), data2)
summary(model1)
summary(model2)

#Plot it
with(data, plot(x, y, log="xy"))
lines(1:7, exp(predict(model1, data.frame(x = 1:7))))
lines(8:15, exp(predict(model2, data.frame(x = 8:15))))

一般来说,将数据分成不同的组并在不同的子集上运行不同的模型是不寻常的,而且可能是不好的形式。您可能需要考虑添加分组变量

data$group <- factor(rep(letters[1:2], times = 7:8))

并在整个数据集上运行某种模型,例如,

model_all <- lm(log(y) ~ log(x) * group, data)
summary(model_all)
#Split the data into two groups
data1 <- data[1:7, ]
data2 <- data[8:15, ]

#Perform the regression
model1 <- lm(log(y) ~ log(x), data1)
model2 <- lm(log(y) ~ log(x), data2)
summary(model1)
summary(model2)

#Plot it
with(data, plot(x, y, log="xy"))
lines(1:7, exp(predict(model1, data.frame(x = 1:7))))
lines(8:15, exp(predict(model2, data.frame(x = 8:15))))

In general, splitting the data into different groups and running different models on different subsets is unusual, and probably bad form. You may want to consider adding a grouping variable

data$group <- factor(rep(letters[1:2], times = 7:8))

and running some sort of model on the whole dataset, e.g.,

model_all <- lm(log(y) ~ log(x) * group, data)
summary(model_all)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文