Given an R data frame with a column A, how to create two new columns containing all ordered combinations of A

Posted on 2024-11-18 06:56:40

I have a data.frame with one id column (x below), and a number of variables (y1,y2 below).

    x y1 y2
1   1 43 55
2   2 51 53
[...]

What I would like to generate from this is a dataframe where the first two columns cover every ordered combination of x (except where they are equal) along with columns for each variable related to the order. The data frame header and first two rows would look like this (did this by hand, excuse errors):

xi xj y1i y1j y2i y2j
 1  2  43  51  55  53
 2  1  51  43  53  55
[...]

So each row would contain a source and destination (i and j), followed by the values of each variable (y1, y2) at the source and at the destination.

I'm slowly learning R data manipulation, but this one is stumping me. Kudos for the one line does-it-all answer, as well as a more readable didactic answer.

谎言月老 2024-11-25 06:56:40

This works (apart perhaps from the order of rows and columns):

firstdf  <- data.frame(x  = c( 1, 2, 4, 5), 
                       y1 = c(43,51,57,49), y2 = c(55,53,47,44)) 
co       <- combn(firstdf$x,2)
seconddf <- data.frame(xi = c(co[1,], co[2,]), xj = c(co[2,], co[1,]))
thirddf  <- merge(merge(seconddf, firstdf, by.x = "xj", by.y = "x" ),
                  firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

to produce

> thirddf
   xi xj y1j y2j y1i y2i
1   1  2  51  53  43  55
2   1  5  49  44  43  55
3   1  4  57  47  43  55
4   2  4  57  47  51  53
5   2  1  43  55  51  53
6   2  5  49  44  51  53
7   4  5  49  44  57  47
8   4  1  43  55  57  47
9   4  2  51  53  57  47
10  5  1  43  55  49  44
11  5  2  51  53  49  44
12  5  4  57  47  49  44

where the first and fifth rows match your example.

If you take firstdf as given and insist on one line then you can turn this into

merge(merge(data.frame(xi = c(combn(firstdf$x,2)[1,], combn(firstdf$x,2)[2,]), xj = c(combn(firstdf$x,2)[2,], combn(firstdf$x,2)[1,])), firstdf, by.x = "xj", by.y = "x" ), firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

but I don't really see the point
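
If the row and column order from the question matters, the merged result can be rearranged afterwards. This is a minimal sketch that rebuilds `thirddf` as above; the name `result` is only illustrative and not part of the original answer:

```r
# Rebuild the objects from the answer above
firstdf  <- data.frame(x  = c(1, 2, 4, 5),
                       y1 = c(43, 51, 57, 49), y2 = c(55, 53, 47, 44))
co       <- combn(firstdf$x, 2)
seconddf <- data.frame(xi = c(co[1, ], co[2, ]), xj = c(co[2, ], co[1, ]))
thirddf  <- merge(merge(seconddf, firstdf, by.x = "xj", by.y = "x"),
                  firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i"))

# Sort rows by (xi, xj) and put the columns in the order asked for
result <- thirddf[order(thirddf$xi, thirddf$xj),
                  c("xi", "xj", "y1i", "y1j", "y2i", "y2j")]
rownames(result) <- NULL
```

Selecting columns by name rather than by position means the reordering does not depend on how `merge` happens to arrange its output.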

紫罗兰の梦幻 2024-11-25 06:56:40

Two lines is the best I can do and still keep it sensible: (Edit: see bottom of answer for one-liner.)

Create some data:

n <- 4
a <- cbind(x=LETTERS[1:n], y=letters[1:n])
a

     x   y  
[1,] "A" "a"
[2,] "B" "b"
[3,] "C" "c"
[4,] "D" "d"

The code:

f <- function(x, i){cbind(i, x[i[,1],], x[i[,2],])}
f(a, t(combn(seq_len(nrow(a)), 2)))

The results:

             x   y   x   y  
[1,] "1" "2" "A" "a" "B" "b"
[2,] "1" "3" "A" "a" "C" "c"
[3,] "1" "4" "A" "a" "D" "d"
[4,] "2" "3" "B" "b" "C" "c"
[5,] "2" "4" "B" "b" "D" "d"
[6,] "3" "4" "C" "c" "D" "d"

EDIT

This can be turned into a one-liner by making use of anonymous functions:

(function(x, i=t(combn(seq_len(nrow(x)), 2))){cbind(i, x[i[,1],], x[i[,2],])})(a)

             x   y   x   y  
[1,] "1" "2" "A" "a" "B" "b"
[2,] "1" "3" "A" "a" "C" "c"
[3,] "1" "4" "A" "a" "D" "d"
[4,] "2" "3" "B" "b" "C" "c"
[5,] "2" "4" "B" "b" "D" "d"
[6,] "3" "4" "C" "c" "D" "d"
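
Note that `combn` returns each unordered pair once, so the output above has 6 rows, whereas the question asks for ordered pairs (12 rows for n = 4). One possible extension, offered as a sketch rather than as part of the original answer, is to stack the index matrix on top of a copy with its two columns swapped:

```r
n <- 4
a <- cbind(x = LETTERS[1:n], y = letters[1:n])

# Unordered index pairs, one pair per row
i <- t(combn(seq_len(nrow(a)), 2))
# Append the reversed pairs so that both (i, j) and (j, i) appear
i <- rbind(i, i[, 2:1])

res <- cbind(i, a[i[, 1], ], a[i[, 2], ])
```

The first six rows of `res` are the same as before, and the last six hold the reversed pairs.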

毁梦 2024-11-25 06:56:40

I'm not sure exactly what you want in general, but as far as I understand it, this may be close:

> library(combinat) # for permn
> library(plyr) # for llply
> 
> # sample data
> d <- data.frame(x = 1:3, y1 = rnorm(3), y2 = rnorm(3))
> d
  x          y1         y2
1 1 -0.17525893 -1.1660321
2 2 -0.05585689 -0.2059244
3 3  0.90500983 -1.3067601
> 
> # permutation of rows
> idx <- permn(nrow(d))
> idx
[[1]]
[1] 1 2 3

... snip ...

[[6]]
[1] 2 1 3

> 
> # a list of perm-ed data.frame
> d2 <- llply(idx, function(i)data.frame(idx = 1:nrow(d), d[i,]))
> d2
[[1]]
  idx x          y1         y2
1   1 1 -0.17525893 -1.1660321
2   2 2 -0.05585689 -0.2059244
3   3 3  0.90500983 -1.3067601

... snip ...

[[6]]
  idx x          y1         y2
2   1 2 -0.05585689 -0.2059244
1   2 1 -0.17525893 -1.1660321
3   3 3  0.90500983 -1.3067601

> 
> # merge them
> d3 <- subset(Reduce(function(df1, df2) merge(df1, df2, by="idx"), d2), select = -c(idx))
> d3
  x.x        y1.x       y2.x x.y        y1.y       y2.y x.x.1      y1.x.1     y2.x.1 x.y.1      y1.y.1     y2.y.1 x.x.2      y1.x.2     y2.x.2 x.y.2
1   1 -0.17525893 -1.1660321   1 -0.17525893 -1.1660321     3  0.90500983 -1.3067601     3  0.90500983 -1.3067601     2 -0.05585689 -0.2059244     2
2   2 -0.05585689 -0.2059244   3  0.90500983 -1.3067601     1 -0.17525893 -1.1660321     2 -0.05585689 -0.2059244     3  0.90500983 -1.3067601     1
3   3  0.90500983 -1.3067601   2 -0.05585689 -0.2059244     2 -0.05585689 -0.2059244     1 -0.17525893 -1.1660321     1 -0.17525893 -1.1660321     3
       y1.y.2     y2.y.2
1 -0.05585689 -0.2059244
2 -0.17525893 -1.1660321
3  0.90500983 -1.3067601
> 
> # and here is the one-liner version
> subset(Reduce(function(df1, df2) merge(df1, df2, by="idx"), llply(permn(nrow(d)), function(i)data.frame(idx=1:nrow(d), d[i,]))), select=-c(idx))
  x.x        y1.x       y2.x x.y        y1.y       y2.y x.x.1      y1.x.1     y2.x.1 x.y.1      y1.y.1     y2.y.1 x.x.2      y1.x.2     y2.x.2 x.y.2
1   1 -0.17525893 -1.1660321   1 -0.17525893 -1.1660321     3  0.90500983 -1.3067601     3  0.90500983 -1.3067601     2 -0.05585689 -0.2059244     2
2   2 -0.05585689 -0.2059244   3  0.90500983 -1.3067601     1 -0.17525893 -1.1660321     2 -0.05585689 -0.2059244     3  0.90500983 -1.3067601     1
3   3  0.90500983 -1.3067601   2 -0.05585689 -0.2059244     2 -0.05585689 -0.2059244     1 -0.17525893 -1.1660321     1 -0.17525893 -1.1660321     3
       y1.y.2     y2.y.2
1 -0.05585689 -0.2059244
2 -0.17525893 -1.1660321
3  0.90500983 -1.3067601

If you provide more detail, you will probably get better answers.
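
For comparison, `permn` enumerates all n! orderings of the rows, while the question asks for ordered pairs of rows, of which there are n(n-1). A base R sketch of the pairwise version, assuming deterministic sample values in place of `rnorm` and relying on `merge` returning a Cartesian product when `by` has length zero:

```r
# Sample data with fixed values (assumed for illustration)
d <- data.frame(x = 1:3, y1 = c(10, 20, 30), y2 = c(0.1, 0.2, 0.3))

# Cross join of d with itself; common column names get the suffixes i and j
cross <- merge(d, d, by = NULL, suffixes = c("i", "j"))

# Drop the pairs where source and destination are the same row
pairs <- cross[cross$xi != cross$xj,
               c("xi", "xj", "y1i", "y1j", "y2i", "y2j")]
rownames(pairs) <- NULL
```

This gives one row per ordered pair, with the columns named as in the question.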

铃予 2024-11-25 06:56:40

Well, it's nowhere close to a one-liner (which I kind of doubt is possible) but here's a 'naive' approach:

dat <- data.frame(x=1:5,y1=6:10,y2=11:15)

#Collect all ordered pairs of elements of x
tmp <- expand.grid(dat$x,dat$x)
tmp <- tmp[tmp[,1] != tmp[,2],]

#Init a matrix to hold the results
rs <- as.matrix(cbind(tmp,matrix(NA,nrow(tmp),4)))

#Loop through each ordered pair
for (i in 1:nrow(rs)){
    rs[i,3:6] <- c(dat$y1[rs[i,1:2]],dat$y2[rs[i,1:2]])
}

I didn't name the columns, but that's easily done after the fact.

Not very elegant, but maybe something to get you started...
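
One caveat: the loop above indexes `dat$y1` by the values of `x`, which works here only because `x` is `1:5` and so matches the row numbers. A vectorized sketch that avoids this assumption by looking up row positions with `match`, and names the columns as in the question (the sample values are taken from the first answer; `ri` and `rj` are illustrative names):

```r
dat <- data.frame(x  = c(1, 2, 4, 5),
                  y1 = c(43, 51, 57, 49), y2 = c(55, 53, 47, 44))

# All ordered pairs of x values, excluding pairs of equal values
pairs <- expand.grid(xi = dat$x, xj = dat$x)
pairs <- pairs[pairs$xi != pairs$xj, ]

# Row positions of the source (i) and destination (j) in dat
ri <- match(pairs$xi, dat$x)
rj <- match(pairs$xj, dat$x)

rs <- data.frame(xi  = pairs$xi,   xj  = pairs$xj,
                 y1i = dat$y1[ri], y1j = dat$y1[rj],
                 y2i = dat$y2[ri], y2j = dat$y2[rj])
```

Because the lookups are vectorized, no explicit loop or preallocated matrix is needed.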
