MatchIt question - how to access the distance between matched units when using Mahalanobis distance

Posted on 2025-01-09 14:51:53

Is it possible to get the distances between matched units using the MatchIt::matchit() function?

Here is a reproducible example. I can see the distances when I use distance = "glm" but not with distance = "mahalanobis".

If you have a recommendation for a different package I am also happy to try that. I am only looking to match to another unit and not, for example, to calculate an ATT. Thank you!

library(MatchIt)

# `df_example` is the asker's data (not shown in the post)
# Run nearest neighbor with "mahalanobis" distance
res_matchitmahalanobis <- matchit(
  data = df_example,
  formula = treat ~ age + male,
  method = "nearest", 
  distance = "mahalanobis",
  exact = ~ male,
  replace = TRUE 
)

# Note: No `distance` column
get_matches(res_matchitmahalanobis)

# Note: `distance` element is missing
res_matchitmahalanobis$distance


# Run nearest neighbor with "glm" distance
res_glm <- matchit(
  data = df_example,
  formula = treat ~ age + male,
  method = "nearest", 
  distance = "glm",
  exact = ~ male,
  replace = TRUE 
)

# Note: There is now a `distance` column
get_matches(res_glm)

# Note: `distance` element is now present
res_glm$distance

Comments (2)

葬﹪忆之殇 2025-01-16 14:51:53

It looks like matchit() doesn't return the distances if you use distance = "mahalanobis". It does use that metric to do the matching, though.

If you'd like to use Mahalanobis, you can use it along with another metric (like 'glm'). Alternatively, you can collect the distances separately.

I ran the matchit function with both the glm and Mahalanobis distances. Then I collected the Mahalanobis distances separately. (Really, I wanted to see if the distances were Mahalanobis or glm...but as expected, they were glm.)

To collect the Mahalanobis distances (even with factors and no extra work) you can use the package assertr and the function maha_dist. The base R function requires you to manually convert factors to values. Note that maha_dist returns each row's squared distance from the center of the data, not the distance between two matched units.

library(MatchIt)
library(tidyverse)
library(assertr)

data("lalonde")

# The glm propensity score is stored in `distance`; because `mahvars` is
# given, the pairing itself uses Mahalanobis distance on those covariates
m.out2 <- matchit(treat ~ age + educ + race, data = lalonde,
                  distance = "glm", method = "nearest",
                  exact = ~educ, replace = TRUE,
                  mahvars = ~age + educ + race)
summary(m.out2)

la2 <- lalonde %>% select(age, educ, race)
head(la2) # as expected
# collect squared Mahalanobis distances of each row from the data center
vals <- maha_dist(la2, robust = TRUE) # robust = TRUE uses a robust covariance estimate

# visualize it
plot(density(vals, bw = .5),
     main = "Mahal Sq Distances")
qqplot(qchisq(ppoints(100), df = 3), vals,
       main = "QQ Plot Mahal Sq Distances")
abline(0, 1, col = "gray") # reference line; `col` must be named
# definitely outside of the 'normal'

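
For the distance between two specific units, rather than from the data center, something like the following base R sketch may help. The toy data frame and the helper name pair_maha are hypothetical, and the covariance here is the plain sample covariance of the dummy-coded covariates, which is an assumption and may not match what matchit() uses internally.

```r
# Hypothetical toy data: one numeric covariate and one factor
toy <- data.frame(
  age   = c(25, 30, 41, 38, 52, 47, 33, 29),
  group = factor(c("a", "b", "c", "a", "b", "c", "a", "b"))
)

# Convert the factor to dummy columns by hand and drop the intercept column
X <- model.matrix(~ age + group, data = toy)[, -1]
S <- cov(X)

# Mahalanobis distance between rows i and j of X (hypothetical helper).
# stats::mahalanobis() returns the squared distance, hence the sqrt().
pair_maha <- function(i, j, X, S) {
  sqrt(mahalanobis(X[i, , drop = FALSE], center = X[j, ], cov = S))
}

d12 <- pair_maha(1, 2, X, S)
d12
```

Dropping the intercept and keeping one fewer dummy column than there are factor levels keeps the covariance matrix invertible.
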
抚笙 2025-01-16 14:51:53

As @Kat pointed out, matchit() does not return this value. It would be inappropriate to have this in the distance column; see here for why. The distance output in the matchit object is a misnomer; it refers to the propensity score, and each unit has one distance value. This is why it shows up with distance = "glm"; you are estimating a propensity score, which is then used to compute the distance between units. No methods in matchit() will actually return the distance between two paired units.
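
To illustrate the difference between a per-unit propensity score and a distance between two units, here is a minimal base R sketch. It assumes that distance = "glm" corresponds to a logistic regression fit, which I believe is the default for that option; the toy data and all object names are hypothetical.

```r
# Hypothetical toy data
toy <- data.frame(
  treat = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0),
  age   = c(30, 25, 45, 38, 52, 29, 41, 36, 27, 48)
)

# One propensity score per unit (this is what a `distance` vector would hold)
fit <- glm(treat ~ age, data = toy, family = binomial())
ps  <- fitted(fit)

# A distance between units only exists once you compare two scores:
# rows are treated units, columns are control units
ps_diff <- abs(outer(ps[toy$treat == 1], ps[toy$treat == 0], "-"))
dim(ps_diff)
```
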

It would take a fair bit of work to extract this information. matchit() does not provide the Mahalanobis distance matrix used in the matching (because this would be way too big for big datasets!). However, you can compute a distance matrix outside matchit(), supply it to the distance argument, and then access the distance between units by extracting those distances from the matrix after doing the pairing. You can compute the Mahalanobis distance using, e.g., optmatch::match_on(), though it is not guaranteed to be identical to the Mahalanobis distance matchit() uses internally. Here is how you would do this:

data("lalonde", package = "MatchIt")

#Create distance matrix
dist <- optmatch::match_on(treat ~ age + educ + race, data = lalonde,
                           method = "mahalanobis")

#Do matching on distance matrix
m <- MatchIt::matchit(treat ~ age + educ + race, data = lalonde,
                      distance = dist, exact = ~married,
                      replace = TRUE)

#Extract matched pairs
mm <- m$match.matrix

#Create data frame of pairs and distance
d <- data.frame(treated = rownames(mm), control = mm[,1],
                distance = dist[cbind(rownames(mm), mm[,1])])
head(d)
#>      treated control  distance
#> NSW1    NSW1 PSID368 0.3100525
#> NSW2    NSW2 PSID341 0.2067017
#> NSW3    NSW3  PSID99 0.2067017
#> NSW4    NSW4 PSID189 0.3900789
#> NSW5    NSW5 PSID400 0.4134033
#> NSW6    NSW6 PSID253 0.1033508

dist["NSW1", "PSID368"]
#> [1] 0.3100525

Created on 2022-02-24 by the reprex package (v2.0.1)

This works with replace = FALSE as well, but would take a bit more work with k:1 matching or full matching. Although you are not matching using matchit()'s own Mahalanobis distance, the distances in the output above do correspond to the distances used to form the pairs.
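
For k:1 matching, one possible approach is to reshape the match matrix to long format before indexing the distance matrix. The sketch below uses small hypothetical stand-ins for m$match.matrix and dist so that it runs on its own; it assumes the match matrix has one row per treated unit, one column per match, and NA where a unit received fewer matches.

```r
# Hypothetical stand-in for the distance matrix (treated rows, control columns)
dist_mat <- matrix(c(0.3, 0.9, 0.5, 0.7, 0.8, 0.2), nrow = 2,
                   dimnames = list(c("T1", "T2"), c("C1", "C2", "C3")))

# Hypothetical stand-in for a 2:1 match matrix; T2 has only one match
mm <- matrix(c("C1", "C3", "C2", NA), nrow = 2,
             dimnames = list(c("T1", "T2"), NULL))

# Long format: as.vector() reads column by column, so repeat the row names
# once per column to keep treated and control aligned
pairs <- data.frame(treated = rep(rownames(mm), times = ncol(mm)),
                    control = as.vector(mm),
                    stringsAsFactors = FALSE)
pairs <- pairs[!is.na(pairs$control), ]

# Look up each pair's distance by (row name, column name)
pairs$distance <- dist_mat[cbind(pairs$treated, pairs$control)]
pairs
```
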
