Cartesian product data frame

Posted on 2024-10-04 21:41:52

I have three or more independent variables represented as R vectors, like so:

A <- c(1,2,3)
B <- factor(c('x','y'))
C <- c(0.1,0.5)

and I want to take the Cartesian product of all of them and put the result into a data frame, like this:

A B C
1 x 0.1
1 x 0.5
1 y 0.1
1 y 0.5
2 x 0.1
2 x 0.5
2 y 0.1
2 y 0.5
3 x 0.1
3 x 0.5
3 y 0.1
3 y 0.5

I can do this by manually writing out calls to rep:

d <- data.frame(A = rep(A, each=length(B)*length(C)),
                B = rep(B, times=length(A), each=length(C)),
                C = rep(C, times=length(A)*length(B)))

but there must be a more elegant way to do it, yes? product in itertools does part of the job, but I can't find any way to absorb the output of an iterator and put it into a data frame. Any suggestions?

p.s. The next step in this calculation looks like

d$D <- f(d$A, d$B, d$C)

so if you know a way to do both steps at once, that would also be helpful.

无人接听 2024-10-11 21:42:02

I can never remember that standard function expand.grid. So here's another version.

crossproduct <- function(...,FUN='data.frame') {
  args <- list(...)
  n1 <- names(args)
  n2 <- sapply(match.call()[1+1:length(args)], as.character)
  nn <- if (is.null(n1)) n2 else ifelse(n1!='',n1,n2)
  dims <- sapply(args,length)
  dimtot <- prod(dims)
  reps <- rev(cumprod(c(1,rev(dims))))[-1]
  cols <- lapply(1:length(dims), function(j)
                 args[[j]][1+((1:dimtot-1) %/% reps[j]) %% dims[j]])
  names(cols) <- nn
  do.call(match.fun(FUN),cols)
}

A <- c(1,2,3)
B <- factor(c('x','y'))
C <- c(.1,.5)

crossproduct(A,B,C)

crossproduct(A,B,C, FUN=function(...) paste(...,sep='_'))
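
The index arithmetic inside crossproduct can be hard to read at a glance. The following is a minimal sketch of that same calculation on its own, assuming input lengths of 3, 2 and 2 as in the question; the name idx is introduced here only for illustration.

```r
# Mixed-radix indexing as used in crossproduct(), shown for dims = c(3, 2, 2).
dims   <- c(3, 2, 2)                          # lengths of A, B, C
dimtot <- prod(dims)                          # 12 rows in total
reps   <- rev(cumprod(c(1, rev(dims))))[-1]   # block sizes: 4, 2, 1

# Row r (0-based) picks element 1 + (r %/% reps[j]) %% dims[j] of input j,
# so the first input changes slowest and the last input changes fastest.
idx <- sapply(seq_along(dims), function(j)
  1 + ((seq_len(dimtot) - 1) %/% reps[j]) %% dims[j])

idx[1:5, ]   # one column of element positions per input
```

Each column of idx holds the positions used to subset the corresponding input vector, which is what the lapply call in crossproduct does.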

脱离于你 2024-10-11 21:41:59

Here's a way to do both, using Ramnath's suggestion of expand.grid:

f <- function(x,y,z) paste(x,y,z,sep="+")
d <- expand.grid(x=A, y=B, z=C)
d$D <- do.call(f, d)

Note that do.call works on d "as-is" because a data.frame is a list. But do.call expects the column names of d to match the argument names of f.
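
If the columns keep the names A, B and C from the question, they no longer match the arguments x, y and z of f, and do.call(f, d) fails with an "unused arguments" error. A minimal sketch of two ways around that, assuming the same f as above:

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

f <- function(x, y, z) paste(x, y, z, sep = "+")
d <- expand.grid(A = A, B = B, C = C)

# Drop the names so the columns are matched to f's arguments by position.
d$D <- do.call(f, unname(as.list(d)))

# Equivalent: evaluate the call inside the data frame and pass columns explicitly.
D_alt <- with(d, f(A, B, C))
```

Both forms rely on f being vectorized over its arguments, as paste is.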

最单纯的乌龟 2024-10-11 21:41:59

Using cross join in sqldf:

library(sqldf)

A <- data.frame(c1 = c(1,2,3))
B <- data.frame(c2 = factor(c('x','y')))
C <- data.frame(c3 = c(0.1,0.5))

result <- sqldf('SELECT * FROM (A CROSS JOIN B) CROSS JOIN C') 

柠栀 2024-10-11 21:41:58

Consider using the wonderful data.table library for expressiveness and speed. It handles many plyr use-cases (relational group by), along with transform, subset and relational join using a fairly simple uniform syntax.

library(data.table)
d <- CJ(x=A, y=B, z=C)  # Cross join
d[, w:=f(x,y,z)]  # Mutates the data.table

or in one line

d <- CJ(x=A, y=B, z=C)[, w:=f(x,y,z)]

风吹过旳痕迹 2024-10-11 21:41:56

The base function merge operates on data frames and is helpful in this case.

It can produce various joins (in SQL terminology), of which the Cartesian product is a special case.

You have to convert the variables to data frames first, because merge takes data frames as arguments.

Something like this will do:

A.B=merge(data.frame(A=A), data.frame(B=B),by=NULL);
A.B.C=merge(A.B, data.frame(C=C),by=NULL);

The only thing to care about is that rows are not sorted as you depicted.
You may sort them manually as you wish.

merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"),
      incomparables = NULL, ...)

"If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y"

see this url for detail: http://stat.ethz.ch/R-manual/R-patched/library/base/html/merge.html
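
The row-order caveat above can be handled in base R. This is a sketch only: it chains the two merge calls with Reduce, which is a choice made here rather than part of the answer, and then sorts with order so that A changes slowest and C fastest, as in the question.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Cross join the one-column data frames pairwise; by = NULL requests
# the Cartesian product.
parts <- list(data.frame(A = A), data.frame(B = B), data.frame(C = C))
d <- Reduce(function(x, y) merge(x, y, by = NULL), parts)

# Sort the rows so that A varies slowest and C fastest, then renumber them.
d <- d[order(d$A, d$B, d$C), ]
rownames(d) <- NULL
```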

木格 2024-10-11 21:41:56

With library tidyr one can use tidyr::crossing (order will be as in OP):

library(tidyr)
crossing(A,B,C)
# A tibble: 12 x 3
#        A B         C
#    <dbl> <fct> <dbl>
#  1     1 x       0.1
#  2     1 x       0.5
#  3     1 y       0.1
#  4     1 y       0.5
#  5     2 x       0.1
#  6     2 x       0.5
#  7     2 y       0.1
#  8     2 y       0.5
#  9     3 x       0.1
# 10     3 x       0.5
# 11     3 y       0.1
# 12     3 y       0.5 

The next step would be to use tidyverse and especially the purrr::pmap* family:

library(tidyverse)
crossing(A,B,C) %>% mutate(D = pmap_chr(.,paste,sep="_"))
# A tibble: 12 x 4
#        A B         C D      
#    <dbl> <fct> <dbl> <chr>  
#  1     1 x       0.1 1_1_0.1
#  2     1 x       0.5 1_1_0.5
#  3     1 y       0.1 1_2_0.1
#  4     1 y       0.5 1_2_0.5
#  5     2 x       0.1 2_1_0.1
#  6     2 x       0.5 2_1_0.5
#  7     2 y       0.1 2_2_0.1
#  8     2 y       0.5 2_2_0.5
#  9     3 x       0.1 3_1_0.1
# 10     3 x       0.5 3_1_0.5
# 11     3 y       0.1 3_2_0.1
# 12     3 y       0.5 3_2_0.5

贱人配狗天长地久 2024-10-11 21:41:55

You can use expand.grid(A, B, C)
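
Two details are worth knowing when using it: expand.grid varies its first argument fastest, and unnamed arguments get the column names Var1, Var2 and Var3. A minimal sketch that reproduces the row order and column names from the question by passing the vectors in reverse and then reordering the columns:

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# C is listed first so that it changes fastest and A changes slowest;
# the column subset then restores the A, B, C column order.
d <- expand.grid(C = C, B = B, A = A)[, c("A", "B", "C")]

head(d, 4)
```

If the row order does not matter, expand.grid(A = A, B = B, C = C) is enough.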


EDIT: An alternative to do.call for the second part is the function mdply from the package plyr:

library(plyr)

d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)

To illustrate its usage using a trivial function 'paste', you can try

d = mdply(d, 'paste', sep = '+');