Cartesian product data frame
I have three or more independent variables represented as R vectors, like so:
A <- c(1,2,3)
B <- factor(c('x','y'))
C <- c(0.1,0.5)
and I want to take the Cartesian product of all of them and put the result into a data frame, like this:
A B C
1 x 0.1
1 x 0.5
1 y 0.1
1 y 0.5
2 x 0.1
2 x 0.5
2 y 0.1
2 y 0.5
3 x 0.1
3 x 0.5
3 y 0.1
3 y 0.5
I can do this by manually writing out calls to rep:
# each= and times= chosen so that A varies slowest and C fastest, as in the table above
d <- data.frame(A = rep(A, each=length(B)*length(C)),
                B = rep(B, times=length(A), each=length(C)),
                C = rep(C, times=length(A)*length(B)))
But there must be a more elegant way to do it, yes? The product function in itertools does part of the job, but I can't find any way to absorb the output of an iterator and put it into a data frame. Any suggestions?
p.s. The next step in this calculation looks like
d$D <- f(d$A, d$B, d$C)
so if you know a way to do both steps at once, that would also be helpful.
Comments (7)
I can never remember that standard function expand.grid. So here's another version.
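The code for this version did not survive on this page. As a hedged sketch only (not the answerer's original), the rep pattern from the question can be generalized to any number of vectors: each column is repeated "each" times, where "each" is the product of the lengths of the vectors after it. The helper name cartesian is hypothetical.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Hypothetical helper: Cartesian product of named vectors as a data frame.
# The first vector varies slowest and the last one varies fastest.
cartesian <- function(...) {
  args <- list(...)
  n <- lengths(args)                 # length of each input vector
  total <- prod(n)                   # number of rows in the result
  cols <- lapply(seq_along(args), function(i) {
    each <- prod(n[-seq_len(i)])     # product of the lengths of the later vectors
    rep(args[[i]], times = total / (n[i] * each), each = each)
  })
  names(cols) <- names(args)
  as.data.frame(cols)
}

d <- cartesian(A = A, B = B, C = C)
```

Because rep() keeps the factor class, B stays a factor in the result.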
Here's a way to do both, using Ramnath's suggestion of expand.grid.
Note that do.call works on d "as-is" because a data.frame is a list. But do.call expects the column names of d to match the argument names of f.
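A minimal sketch of that approach, assuming a hypothetical f that simply pastes its three arguments together. Listing the vectors in reverse and then reordering the columns is an extra step, not part of the original suggestion, used here only to reproduce the row order from the question.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Hypothetical f: its argument names must match the column names of d,
# because do.call passes each column as a named argument.
f <- function(A, B, C) paste(A, B, C)

# expand.grid varies its first argument fastest, so list the vectors in
# reverse and then reorder the columns to get A, B, C as in the question.
d <- expand.grid(C = C, B = B, A = A)[3:1]

# A data.frame is a list, so this is f(A = d$A, B = d$B, C = d$C).
d$D <- do.call(f, d)
```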
Using a cross join in sqldf:
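A sketch of the cross join, assuming the sqldf package is installed. The one-column data frames dA, dB and dC are illustrative names, and B is a plain character column here to keep the sketch simple.

```r
library(sqldf)

# sqldf runs SQL against data frames, so wrap each vector in a
# one-column data frame first.
dA <- data.frame(A = c(1, 2, 3))
dB <- data.frame(B = c("x", "y"), stringsAsFactors = FALSE)
dC <- data.frame(C = c(0.1, 0.5))

# A CROSS JOIN returns every combination of rows; ORDER BY reproduces
# the row order shown in the question.
d <- sqldf("SELECT * FROM dA CROSS JOIN dB CROSS JOIN dC ORDER BY A, B, C")
```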
Consider using the wonderful data.table library for expressiveness and speed. It handles many plyr use-cases (relational group by), along with transform, subset and relational join using a fairly simple uniform syntax.
or in one line
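A hedged sketch using data.table's CJ() ("cross join") function, which is one way to get the product; the answer's original code was not preserved, and f is a hypothetical stand-in for the function from the question.

```r
library(data.table)

A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)
f <- function(A, B, C) paste(A, B, C)   # hypothetical f

# CJ() builds all combinations as a data.table, sorted and keyed by
# A, B, C, so the first column varies slowest.
d <- CJ(A = A, B = B, C = C)

# := adds column D by reference; inside [ ], A, B and C refer to columns.
d[, D := f(A, B, C)]

# The same two steps in one line; the trailing [] prints the result.
CJ(A = A, B = B, C = C)[, D := f(A, B, C)][]
```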
There's a function for manipulating data frames, merge, that is helpful in this case.
It can produce various joins (in SQL terminology), and the Cartesian product is a special case.
You have to convert the variables to data frames first, because it takes data frames as arguments.
So something like this will do:
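A sketch of the merge approach, assuming the vectors from the question. Passing by = NULL selects the zero-length "by" case from the merge documentation, and the order() call is one way to do the manual sorting.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# merge() takes data frames, so wrap each vector first. With by = NULL
# there are no key columns, and merge returns the Cartesian product.
d <- merge(merge(data.frame(A = A), data.frame(B = B), by = NULL),
           data.frame(C = C), by = NULL)

# Sort explicitly to get the order shown in the question.
d <- d[order(d$A, d$B, d$C), ]
rownames(d) <- NULL
```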
The only thing to watch out for is that the rows are not sorted as you depicted.
You can sort them manually as you wish.
"If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y"
See this URL for details: http://stat.ethz.ch/R-manual/R-patched/library/base/html/merge.html
With the tidyr library, one can use tidyr::crossing (the order will be as in the OP).
The next step would be to use the tidyverse, and especially the purrr::pmap* family.
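A sketch of both steps, assuming the tidyr and purrr packages are installed and using a hypothetical f that pastes its arguments together.

```r
library(tidyr)
library(purrr)

A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)
f <- function(A, B, C) paste(A, B, C)   # hypothetical f

# crossing() returns a tibble of all combinations, sorted with the first
# column varying slowest, which matches the order in the question.
d <- crossing(A = A, B = B, C = C)

# pmap_chr() calls f once per row; because d's column names match f's
# argument names, each row is passed as named arguments.
d$D <- pmap_chr(d, f)
```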
You can use expand.grid(A, B, C).
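A short illustration of expand.grid's defaults: unnamed arguments produce columns named Var1, Var2 and Var3, so naming the arguments helps if later code refers to columns by name.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

unnamed <- expand.grid(A, B, C)          # columns Var1, Var2, Var3
d <- expand.grid(A = A, B = B, C = C)    # columns A, B, C

# Here the first column varies fastest (the reverse of the order in the
# question), but every combination still appears exactly once.
head(d, 3)
```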
EDIT: an alternative to using do.call to achieve the second part is the function mdply from the package plyr.
To illustrate its usage with the trivial function paste, you can try:
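A sketch of what that might look like, assuming the plyr package is installed; the exact code from the original answer was not preserved.

```r
library(plyr)

A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# mdply() calls the function once per row, passing the columns as named
# arguments, and binds each result back onto that row's inputs.
# With paste, the named arguments simply become the pieces to join.
res <- mdply(expand.grid(A = A, B = B, C = C), paste)
head(res, 3)
```

The pasted strings end up in the last column of res, next to the input columns.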