How do I standardize unit changes for an interval variable with unequal intervals?

Posted 2025-01-25 07:31:23 · 1492 words · 3 views · 0 comments

I am constructing OLS models in R and I have run into a methodological issue. The main independent variable for the study is "town size," which is coded (in the codebook) as:

  • 1.- Under 2,000
  • 2.- 2,000 - 5,000
  • etc
data$G_TOWNSIZE[data$G_TOWNSIZE == 1] <- "Under 2,000"
data$G_TOWNSIZE[data$G_TOWNSIZE == 2] <- "2,000-5,000"
data$G_TOWNSIZE[data$G_TOWNSIZE == 3] <- "5,000-10,000"
data$G_TOWNSIZE[data$G_TOWNSIZE == 4] <- "10,000-20,000"
data$G_TOWNSIZE[data$G_TOWNSIZE == 5] <- "20,000-50,000"
data$G_TOWNSIZE[data$G_TOWNSIZE == 6] <- "50,000-100,000"
data$G_TOWNSIZE[data$G_TOWNSIZE == 7] <- "100,000-500,000"
data$G_TOWNSIZE[data$G_TOWNSIZE == 8] <- "500,000 and more"
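
As a side note, the same labelling can be sketched with a single call to `factor()`, which also preserves the order of the categories. This is only an illustration: `codes` is a made-up stand-in for the survey column, not the actual WVS data.

```r
# Hypothetical stand-in for the survey column: integer codes 1..8.
codes <- c(1, 3, 8, 2, 5, 8, 7)

townsize_labels <- c(
  "Under 2,000", "2,000-5,000", "5,000-10,000", "10,000-20,000",
  "20,000-50,000", "50,000-100,000", "100,000-500,000", "500,000 and more"
)

# factor() maps each code to its label and keeps the category order.
townsize <- factor(codes, levels = 1:8, labels = townsize_labels)

# Counts per category, including categories with no respondents.
table(townsize)
```

Unlike repeated assignment of strings, this keeps all eight levels defined even when a category has no respondents in a given subset.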

(This data is from the World Values Survey, Wave 7.)
Now, I understand that this is really a semi-categorical variable. In fact, the above code was not part of our regression until today. We had been relying only on the 1:8 scale to test for a linear relationship. Yes, sorry. We now know this is wrong (haha).

We want to examine the degree to which political participation is a function of population density. All our dependent variables are ordered categorical variables. In the following model Q221R is self-reported voting tendency that we have re-coded as:

data$Q221R[data$Q221 == 1] <- 3
data$Q221R[data$Q221 == 2] <- 2
data$Q221R[data$Q221 == 3] <- 1
    1. Never
    2. Usually
    3. Always
model6 <- lm(Q221R ~ G_TOWNSIZE + Q262 + Q260 + Q240FR + Q275 + Q288R, data=GER)

Based on our literature review we expect there to be a linear relationship. Indeed, even with how messed up our usage of G_TOWNSIZE is, we do observe a correlation when testing on this subset (Germany). But the unit change observed in town size is obviously arbitrary.

Is there a way to re-code or re-weight town size so that the change between living in a "1" town and living in a "2" town actually makes sense? The data model includes no other variable for population density, and it is beyond the scope of our project to match the geodetic data of each respondent to their respective towns in order to find out the actual population. Assume we are intellectually capable of doing the math; I promise we can. We are just new to statistics. Thank you very much.
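
To make the kind of recoding being asked about concrete, here is a minimal sketch that replaces each code with the geometric midpoint of its bracket on a log2 scale, so that a one-unit change means "a town twice as large". This is an illustration under assumptions, not a recommended method. The codebook gives no lower bound for category 1 and no upper bound for category 8, so the values 500 and 2,000,000 below are invented placeholders, and `codes` is made-up data.

```r
# Bracket bounds from the codebook. The lower bound of category 1 (500)
# and the upper bound of category 8 (2,000,000) are NOT in the codebook:
# both are assumptions made only for this sketch.
lower <- c(500, 2000, 5000, 10000, 20000, 50000, 100000, 500000)
upper <- c(2000, 5000, 10000, 20000, 50000, 100000, 500000, 2000000)

# Geometric midpoint of each bracket, then a log2 scale so that a
# one-unit change corresponds to a doubling of town size.
midpoint  <- sqrt(lower * upper)
log2_size <- log2(midpoint)

# Hypothetical respondents' category codes, mapped to the new scale.
codes <- c(1, 2, 8, 5)
townsize_log2 <- log2_size[codes]

data.frame(code = 1:8, midpoint = midpoint, log2_size = log2_size)
```

Any result based on such a scale depends on the two assumed bounds, so it would be worth checking how sensitive the estimates are to them.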

Comments (1)

巴黎夜雨 2025-02-01 07:31:23

I'm setting aside the appropriateness of the regression model here, but I think the most statistically robust approach would be to convert G_TOWNSIZE into a factor variable and then include it in the model as a set of dummies. The base R function as.factor, or as_factor (available in the forcats and haven packages), will do this.
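
A minimal sketch of what this looks like in practice, using simulated data rather than the actual survey. `GER_sim` and its values are made up; only the variable names mirror the question, and the other covariates are left out for brevity.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: the values are random, not WVS responses.
GER_sim <- data.frame(
  G_TOWNSIZE = rep(1:8, length.out = n),      # every category is present
  Q221R      = sample(1:3, n, replace = TRUE)
)

# Treat town size as categorical rather than as a 1:8 numeric scale.
GER_sim$G_TOWNSIZE <- as.factor(GER_sim$G_TOWNSIZE)

model_factor <- lm(Q221R ~ G_TOWNSIZE, data = GER_sim)

# One intercept plus seven dummy coefficients; category 1 is the
# reference level that the other categories are compared against.
names(coef(model_factor))
```

Each dummy coefficient is the estimated difference from the reference category, so no assumption about the spacing between categories is needed.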
