R: two regressions from one table

Posted 2025-01-15 21:24:25 · 2506 words · 2 views · 0 comments

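I am trying to plot two different regression lines (with the formula: salary = beta0 + beta1*D3 + beta2*spending + beta3*(spending*D3) + w) in one scatter plot by dividing my data into two subsets, as shown in the following code: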

salary = data$salary
spending = data$spending
D1 = data$North
D2 = data$South
D3 = data$West

subsetWest = subset(data, D3 == 1)
subsetRest = subset(data, D3 == 0)

abab = lm(salary ~ 1 + spending + 1*spending, data=subsetWest) #red line
caca = lm(salary ~ 0 + spending + 0*spending, data=subsetRest) #blue line


plot(spending,salary)

points(subsetWest$spending, subsetWest$salary, pch=25, col = "red")
points(subsetRest$spending, subsetRest$salary, pch=10, col = "blue")

abline(abab, col = "red")
abline(caca, col = "blue")
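
For reference, the model stated above can be written out separately for each group. This is only a restatement of that formula, assuming D3 is a 0/1 dummy for West and w is the error term:

```latex
\begin{aligned}
D3 = 0:&\quad \text{salary} = \beta_0 + \beta_2\,\text{spending} + w \\
D3 = 1:&\quad \text{salary} = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,\text{spending} + w
\end{aligned}
```

Both groups keep an intercept in this form, so neither fitted line would be expected to be forced through the origin.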

This is a sample of my data table:

[image: sample of the data table]

This is the plot I get when running the code:

Image: https://i.sstatic.net/It8ai.png

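My problem is that the intercept of my second regression is wrong. In fact, unlike with the first regression, I do not even get an intercept when looking at the summary.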

Does anybody see where my problem is, or know an alternative way of plotting the two regression lines?
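
One possible alternative, offered as a minimal sketch rather than a definitive fix: fit a single model with a dummy and an interaction term, then derive each group's line from its coefficients. Note that in an R formula a `0 +` term suppresses the intercept, which may explain why the summary of `caca` shows none. The sketch assumes a data frame named `data` with the columns `salary`, `spending` and `West` from the posted table; to stay short it rebuilds only rows 33 to 44 of that table. The names `fit`, `b`, `rest_line` and `west_line` are illustrative.

```r
# 12-row excerpt (rows 33 to 44) of the posted table, used only to keep the
# sketch short; with the full data frame `data` loaded, skip this step.
data <- data.frame(
  salary   = c(45868, 36886, 39076, 40920, 42838, 50320,
               44964, 41938, 54448, 51784, 45288, 49280),
  spending = c(5458, 4610, 5284, 6248, 5504, 6858,
               7894, 5018, 10880, 8084, 6804, 5658),
  West     = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
)

# One model with a dummy and an interaction:
# salary = b0 + b1*West + b2*spending + b3*(spending*West) + w
fit <- lm(salary ~ West * spending, data = data)
b <- coef(fit)  # names: (Intercept), West, spending, West:spending

# Intercept and slope of each group's line, derived from the coefficients
rest_line <- c(b[["(Intercept)"]], b[["spending"]])
west_line <- c(b[["(Intercept)"]] + b[["West"]],
               b[["spending"]] + b[["West:spending"]])

# Scatter plot with the same colours and symbols as above, then both lines
plot(data$spending, data$salary,
     col = ifelse(data$West == 1, "red", "blue"),
     pch = ifelse(data$West == 1, 25, 10),
     xlab = "spending", ylab = "salary")
abline(a = west_line[1], b = west_line[2], col = "red")
abline(a = rest_line[1], b = rest_line[2], col = "blue")
```

With this fully interacted form, the two lines coincide with separate `lm(salary ~ spending)` fits on each subset, so either route should give the same picture as long as both subset formulas keep their intercept.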

Any help would be much appreciated. Thank you very much!

This is the whole table:

structure(list(salary = c(39166L, 40526L, 40650L, 53600L, 58940L, 
53220L, 61356L, 54340L, 51706L, 49000L, 48548L, 54340L, 60336L, 
53050L, 54720L, 43380L, 43948L, 41632L, 36190L, 41878L, 45288L, 
49248L, 54372L, 67980L, 46764L, 41254L, 45590L, 43140L, 44160L, 
44500L, 41880L, 43600L, 45868L, 36886L, 39076L, 40920L, 42838L, 
50320L, 44964L, 41938L, 54448L, 51784L, 45288L, 49280L, 44682L, 
51220L, 52030L, 51576L, 58264L, 51690L), spending = c(6692L, 
6228L, 7108L, 9284L, 9338L, 9776L, 11420L, 11072L, 8336L, 7094L, 
6318L, 7242L, 7564L, 8494L, 7964L, 7136L, 6310L, 6118L, 5934L, 
6570L, 7828L, 9034L, 8698L, 10040L, 7188L, 5642L, 6732L, 5840L, 
5960L, 7462L, 5706L, 5066L, 5458L, 4610L, 5284L, 6248L, 5504L, 
6858L, 7894L, 5018L, 10880L, 8084L, 6804L, 5658L, 4594L, 5864L, 
7410L, 8246L, 7216L, 7532L), North = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), South = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), West = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-50L))

