Matching observations based on a single variable or multiple variables in a single dataset - STATA

Posted 2025-01-18 16:52:42

I need to match observations based on an index variable that measures household conditions, on personal variables such as age, gender and education, and on year. My household index is numeric (ranging from 0 to 103), and the personal characteristics are either dummies or categorical variables. For my analysis I need to match each observation to the most similar one based on these variables. It is essentially a nearest-neighbor match, but without a control or treatment group.

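The dataset looks something like this (anio = year, mes = month, mujer = female dummy, nivel__educativo_cat = education level, trabaja = works dummy; directorio and orden identify the household and the person within it):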

indice_hogar  anio  mes  directorio  orden  mujer  nivel__educativo_cat  trabaja
0             2018  08   4700731     1      1      4                     1
0             2018  08   4700731     2      0      5                     1
0             2018  11   4777752     1      0      5                     1
37            2018  04   4605803     1      0      3                     1
42            2011  07   2735691     1      1      4                     1
42            2018  02   4545459     1      0      3                     1
43            2018  12   4803694     1      0      5                     1
44            2018  10   4747974     1      0      5                     1
46            2018  05   4610096     1      0      3                     1
47            2018  04   4598828     1      1      1                     0
47            2018  08   4687722     1      0      1                     0
48            2018  04   4592941     1      0      5                     0
48            2018  06   4636177     1      0      3                     1
50            2018  06   4645892     1      0      1                     1
50            2018  06   4645892     2      1      4                     1

For context, I am using an instrumental variable (IV): the ability of the most similar person according to the index and the personal characteristics. That means that for each person, say person A, I need to find the most similar other observation, take that match's ability, and use it in a regression.

I have not been able to write the code for this.

Comments (1)

把人绕傻吧 2025-01-25 16:52:42

Duplicate your dataset and match the first copy to the second using teffects nnmatch.

* Duplicate the data set: the copy saved to temp has treat == 1,
* the copy kept in memory has treat == 0
gen byte treat = 1
gen nobs = _N
save temp, replace
replace treat = 0
append using temp

* teffects nnmatch requires an outcome variable, so create a placeholder one
gen byte outcome = runiform() < .5

* Nearest-neighbor match; generate(nnid) stores the observation number of the
* matched case in nnid1 (plus nnid2, nnid3, ... if there are ties)
teffects nnmatch (outcome indice_hogar nivel__educativo_cat trabaja) (treat), generate(nnid)

* Unduplicate the data set; the treat == 0 copy occupies observations 1 to nobs
keep if treat == 0

* Shift nnid1 so it points to the kept copy of the data set, not the appended one
replace nnid1 = nnid1 - nobs
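A caveat with the duplication approach: each observation has an identical copy in the other group, so its nearest neighbor at distance zero is likely to be itself rather than the most similar other person. The sketch below is an alternative under stated assumptions, not the answerer's method. It loads the sample rows from the question, standardizes the matching variables (indice_hogar, anio, mujer, nivel__educativo_cat, trabaja) and uses a Mata loop to find, for every observation, the closest other observation. The distance is the sum of squared standardized differences with the education category treated as numeric, which is an assumption (teffects nnmatch uses a Mahalanobis metric by default). The new variable names match_obs, match_directorio and match_orden are hypothetical.

```stata
* Sample rows from the question
clear
input indice_hogar anio mes long directorio orden mujer nivel__educativo_cat trabaja
0 2018 08 4700731 1 1 4 1
0 2018 08 4700731 2 0 5 1
0 2018 11 4777752 1 0 5 1
37 2018 04 4605803 1 0 3 1
42 2011 07 2735691 1 1 4 1
42 2018 02 4545459 1 0 3 1
43 2018 12 4803694 1 0 5 1
44 2018 10 4747974 1 0 5 1
46 2018 05 4610096 1 0 3 1
47 2018 04 4598828 1 1 1 0
47 2018 08 4687722 1 0 1 0
48 2018 04 4592941 1 0 5 0
48 2018 06 4636177 1 0 3 1
50 2018 06 4645892 1 0 1 1
50 2018 06 4645892 2 1 4 1
end

* Find each person's nearest OTHER person (self-matches excluded)
mata:
// matching variables; the education category is treated as numeric (assumption)
X = st_data(., ("indice_hogar", "anio", "mujer", "nivel__educativo_cat", "trabaja"))
n = rows(X)
// standardize each column; constant columns are left unscaled
sd = sqrt(diagonal(variance(X)))'
sd = sd + (sd :== 0)
Z = (X :- mean(X)) :/ sd
match = J(n, 1, .)
for (i = 1; i <= n; i++) {
    // squared standardized distance from person i to every observation
    d = rowsum((Z :- Z[i, .]):^2)
    // set the person's own distance to missing so min() skips it
    d[i] = .
    // first observation at the minimum distance (ties go to the lower row)
    idx = selectindex(d :== min(d))
    match[i] = idx[1]
}
st_store(., st_addvar("long", "match_obs"), match)
end

* Pull the match's identifiers with explicit subscripting
gen long match_directorio = directorio[match_obs]
gen byte match_orden = orden[match_obs]
list indice_hogar mujer nivel__educativo_cat trabaja match_obs match_directorio match_orden, noobs
```

To build the instrument, the same subscripting applies to the ability measure, e.g. gen double habilidad_match = habilidad[match_obs], where habilidad is a hypothetical name because ability is not in the sample. match_obs is a row number in the current sort order, so create the matched variables before re-sorting the data. The loop compares every pair of observations, so its cost grows with the square of the sample size; with a large survey it may help to restrict candidates, for example by running it separately within each anio or mujer group.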