r ggplot2 tidy pairwise-distance

Compute the differences between all pairs of variables in R

Posted on 2025-02-07 19:08:38

I have a dataframe with 4 columns.

set.seed(123)
df <- data.frame(A = round(rnorm(1000, mean = 1)),
                 B = rpois(1000, lambda = 3),
                 C = round(rnorm(1000, mean = -1)),
                 D = round(rnorm(1000, mean = 0)))

I would like to compute the difference for every possible pair of columns (A-B, A-C, A-D, B-C, B-D, C-D) at every row of my dataframe.
This is equivalent to computing df$A - df$B for each pair.

Can we use the dist() function to compute this efficiently, as I have a very large dataset? I would then like to convert the dist object into a data.frame to plot the results with ggplot2.
Alternatively, is there a good tidy way of doing the above?

Many Thanks

The closest I got was the code below, but I am not sure what the column names refer to.

d <- apply(as.matrix(df), 1, function(e) as.vector(dist(e)))
t(d)

何处潇湘 2025-02-14 19:08:38

dist() compares every value in a vector with every other value in the same vector, so if you want to compare columns row by row, it is not what you are looking for.
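To make that concrete, here is a small sketch on a single row (the values match the first row of the output shown below; the names `row1`, `idx`, and `signed` are only illustrative). As far as I can tell, dist() treats each element as a one-dimensional point, so it gives the absolute difference for each pair and the sign is lost:

```r
# One row of the data as a named vector (hypothetical name: row1)
row1 <- c(A = 0, B = 1, C = -2, D = -1)

# dist() returns one distance per pair, ordered A-B, A-C, A-D, B-C, B-D, C-D
d <- as.vector(dist(row1))
d

# Signed differences for the same pairs, using combn() to build the pair indices
idx <- combn(length(row1), 2)
signed <- unname(row1[idx[1, ]] - row1[idx[2, ]])
signed

# The magnitudes agree; only the signs differ
all.equal(d, abs(signed))
```

So dist() can be made to work per row, but only if absolute differences are acceptable.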

If you just want to calculate the difference between all columns pairwise, you can do:

df <- cbind(df,
            do.call(cbind, lapply(asplit(combn(names(df), 2), 2), function(x) {
              # x holds one pair of column names, e.g. c("A", "B")
              setNames(data.frame(df[x[1]] - df[x[2]]), paste(x, collapse = ""))
            })))

head(df)
#>   A B  C  D AB AC AD BC BD CD
#> 1 0 1 -2 -1 -1  2  1  3  2 -1
#> 2 1 1 -1  1  0  2  0  2  0 -2
#> 3 3 1 -2 -1  2  5  4  3  2 -1
#> 4 1 3  0 -1 -2  1  2  3  4  1
#> 5 1 3  0  1 -2  1  0  3  2 -1
#> 6 3 3  1  0  0  2  3  2  3  1

Created on 2022-06-14 by the reprex package (v2.0.1)
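For the plotting part of the question, ggplot2 generally wants one row per observation, so the difference columns can be reshaped into long format first. The following is a minimal sketch using base R's stack(); it rebuilds the example data so that it runs on its own, and the names `m`, `pairs`, `diffs`, `long`, `pair`, and `difference` are illustrative rather than taken from the answer. The plot is only built if ggplot2 happens to be installed.

```r
set.seed(123)
df <- data.frame(A = round(rnorm(1000, mean = 1)),
                 B = rpois(1000, lambda = 3),
                 C = round(rnorm(1000, mean = -1)),
                 D = round(rnorm(1000, mean = 0)))

# Signed difference for every pair of columns, one column per pair
m     <- as.matrix(df)
pairs <- combn(colnames(m), 2)
diffs <- m[, pairs[1, ]] - m[, pairs[2, ]]
colnames(diffs) <- paste0(pairs[1, ], pairs[2, ])

# Long format: one row per (pair, difference) observation
long <- stack(as.data.frame(diffs))
names(long) <- c("difference", "pair")
head(long)

# Optional plot, assuming ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = pair, y = difference)) +
    ggplot2::geom_boxplot()
  # print(p) draws the plot
}
```

A boxplot is just one choice here; any geom that takes a grouping variable on one axis should work with the same long data frame.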

风月客 2025-02-14 19:08:38

Using base R:

df_dist <- t(apply(df, 1, dist))
colnames(df_dist) <- apply(combn(names(df), 2), 2, paste0, collapse = "_")

If you really want a tidy approach, you could use c_across(), but that also drops the names and is much slower if your data is huge.
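One thing to keep in mind with the dist() approach is that it yields absolute differences, so `A_B` holds |A - B| rather than A - B. If the sign matters, or if a row-wise apply() turns out to be slow on a large dataset, one alternative is to subtract whole columns at once. This is a sketch under those assumptions, not part of the answer above: the names `m`, `idx`, and `signed` are illustrative, and a three-row data frame stands in for the real data.

```r
# Small stand-in for the real data (hypothetical values)
df <- data.frame(A = c(0, 1, 3), B = c(1, 1, 1),
                 C = c(-2, -1, -2), D = c(-1, 1, -1))

m   <- as.matrix(df)
idx <- combn(ncol(m), 2)   # each column of idx is one pair of column indices

# Subtract every "second" column from its "first" column in a single step
signed <- m[, idx[1, ]] - m[, idx[2, ]]
colnames(signed) <- paste(colnames(m)[idx[1, ]], colnames(m)[idx[2, ]], sep = "_")
signed

# The dist() version from the answer, for comparison
df_dist <- t(apply(df, 1, dist))
colnames(df_dist) <- apply(combn(names(df), 2), 2, paste0, collapse = "_")

# Same magnitudes; only `signed` keeps the direction of each difference
all.equal(abs(signed), df_dist, check.attributes = FALSE)
```

Because the subtraction works on whole matrix columns, it avoids calling a function once per row, which is usually where apply() spends its time.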

About the author: 铜锣湾横着走
