Using Rcpp within parallel code via snow to make a cluster
I've written a function in Rcpp and compiled it with inline. Now I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example: the function funCPP1 compiles and runs well by itself, but it cannot be called by snow's clusterCall function. It runs fine as a single process, but gives the following error when run in parallel:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: NULL value passed as symbol address
And here is some code:
## Load and compile
library(inline)
library(Rcpp)
library(snow)
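src1 <- '
  // clone() so that the matrix passed in from R is not modified in place
  Rcpp::NumericMatrix xbem = Rcpp::clone(Rcpp::NumericMatrix(xbe));
  int nrows = xbem.nrow();
  Rcpp::NumericVector gv(g);
  for (int i = 1; i < nrows; i++) {
    // row i becomes g * (already updated row i-1) + row i
    xbem(i, _) = xbem(i - 1, _) * gv[0] + xbem(i, _);
  }
  return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g = "numeric"),
                       body = src1, plugin = "Rcpp")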
## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)
## Parallel
cl <- makeCluster(2, type = "SOCK")
clusterExport(cl, 'funCPP1')
clusterCall(cl, funCPP1, A, 0.5)
Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.
So you tried the right thing by shipping the R frontend calling that shared library to the other process (which has another temp directory !!), but that does not get the dll / so file there.
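You can observe this mechanism without starting a cluster. Mimic what clusterExport()/clusterCall() do to a function: serialize it and rebuild it in the same session. inline places an external pointer to the compiled routine in the function body. serialize() does not keep external pointer addresses, so the rebuilt copy should fail with the same message the question reports.

This is a minimal sketch that assumes inline and Rcpp are installed. The Rcpp::clone() call is an addition so that A is not modified in place.

```r
library(inline)  # cxxfunction(); the "Rcpp" plugin also needs Rcpp installed

# The question's kernel; Rcpp::clone() added so A is not modified in place.
src1 <- '
  Rcpp::NumericMatrix xbem = Rcpp::clone(Rcpp::NumericMatrix(xbe));
  Rcpp::NumericVector gv(g);
  for (int i = 1; i < xbem.nrow(); i++) {
    xbem(i, _) = xbem(i - 1, _) * gv[0] + xbem(i, _);
  }
  return xbem;
'
funCPP1 <- cxxfunction(methods::signature(xbe = "numeric", g = "numeric"),
                       body = src1, plugin = "Rcpp")

A <- matrix(rnorm(400), 20, 20)
ok <- funCPP1(A, 0.5)  # works: the pointer to the compiled routine is valid here

# Mimic shipping the function to another process: serialize it and rebuild it.
# serialize() does not keep external pointer addresses, so the copy's
# pointer to the native routine comes back as NULL.
shipped <- unserialize(serialize(funCPP1, NULL))
msg <- tryCatch({
  shipped(A, 0.5)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```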
Hence the advice is to create a local package, install it and have both snow processes load and call it.
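Here is a minimal sketch of that advice on a single machine, assuming a writable temporary directory. Several details are illustrative choices:

- the package name funcpp1pkg,
- the build paths,
- rewriting the kernel with Rcpp attributes instead of inline's body string,
- using base R's parallel package, which offers the same makeCluster()/clusterEvalQ()/clusterCall() calls as snow.

```r
library(Rcpp)      # Rcpp.package.skeleton(), compileAttributes()
library(parallel)  # base R; same makeCluster/clusterEvalQ/clusterCall API as snow

pkg  <- "funcpp1pkg"           # hypothetical package name
root <- tempfile("build")      # illustrative build directory
lib  <- file.path(root, "lib") # private library shared by master and workers
dir.create(lib, recursive = TRUE)

# The question's kernel rewritten with Rcpp attributes, so that
# compileAttributes() generates the R wrapper for it.
cpp <- file.path(root, "funCPP1.cpp")
writeLines(c(
  "#include <Rcpp.h>",
  "using namespace Rcpp;",
  "// [[Rcpp::export]]",
  "NumericMatrix funCPP1(NumericMatrix xbe, double g) {",
  "  NumericMatrix xbem = clone(xbe);",
  "  for (int i = 1; i < xbem.nrow(); i++) {",
  "    xbem(i, _) = xbem(i - 1, _) * g + xbem(i, _);",
  "  }",
  "  return xbem;",
  "}"
), cpp)

# Build a package skeleton around the .cpp file and install it into `lib`.
Rcpp.package.skeleton(pkg, path = root, cpp_files = cpp,
                      example_code = FALSE, attributes = TRUE)
compileAttributes(file.path(root, pkg))
install.packages(file.path(root, pkg), lib = lib, repos = NULL, type = "source")

# Master and workers each load the installed package, so every process
# dyn.load()s the shared library itself and holds a valid pointer to it.
library(pkg, lib.loc = lib, character.only = TRUE)
cl <- makeCluster(2)
clusterExport(cl, c("pkg", "lib"), envir = environment())
invisible(clusterEvalQ(cl, library(pkg, lib.loc = lib, character.only = TRUE)))

A <- matrix(rnorm(400), 20, 20)
res <- clusterCall(cl, function(x, g) funCPP1(x, g), A, 0.5)
stopCluster(cl)
```

On a multi-machine cluster, the package would have to be installed on every node instead of into a temporary library.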
(And as always: better quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)
Old question, but I stumbled across it while looking through the top Rcpp tags, so maybe this answer will still be of use.
I think Dirk's answer is the proper one when the code you've written is fully debugged and does what you want, but writing a new package for a piece of code as small as the one in the example can be a hassle. Instead, you can export the code block, export a "helper" function that compiles the source code, and run that helper on each node. That makes the CXX function available there; you then use another helper function to call it. For instance:
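Below is a minimal sketch of this idea. The helper names compile_here() and call_by_name() are hypothetical, not the ctools functions. Base R's parallel package stands in for snow.

The first helper compiles the inline code on whichever node runs it. The second looks up the compiled function by name on that node. As a result, the master's copy, whose native pointer is meaningless in another process, is never shipped.

```r
library(parallel)  # base R; same cluster API as snow

src1 <- '
  Rcpp::NumericMatrix xbem = Rcpp::clone(Rcpp::NumericMatrix(xbe));
  Rcpp::NumericVector gv(g);
  for (int i = 1; i < xbem.nrow(); i++) {
    xbem(i, _) = xbem(i - 1, _) * gv[0] + xbem(i, _);
  }
  return xbem;
'

# Hypothetical helper: compile `body` with inline on the node that runs it
# and store the result in that node's global environment under `name`.
compile_here <- function(name, body) {
  fn <- inline::cxxfunction(methods::signature(xbe = "numeric", g = "numeric"),
                            body = body, plugin = "Rcpp")
  assign(name, fn, envir = globalenv())
  TRUE
}

# Hypothetical helper: fetch a function by name on the node and call it, so
# the node's own compiled copy runs instead of a serialized master copy.
call_by_name <- function(name, ...) {
  fn <- get(name, envir = globalenv())
  fn(...)
}

cl <- makeCluster(2)
invisible(clusterCall(cl, compile_here, "funCPP1", src1))  # compiles once per worker
A <- matrix(rnorm(400), 20, 20)
res <- clusterCall(cl, call_by_name, "funCPP1", A, 0.5)
stopCluster(cl)
```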
I've written a package ctools (shameless self-promotion) which wraps up a lot of the functionality that is in the parallel and Rhpc packages for cluster computing, both with PSOCK and MPI. I already have a function called "c.sourceCpp" which calls "Rcpp::sourceCpp" on every node in much the same way as above. I'm going to add in a "c.inlineCpp" which does the above now that I see the usefulness of it.
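You can also call Rcpp::sourceCpp() on every node with plain clusterCall(). This is not the ctools implementation, only an illustration. It assumes all workers run on the same machine and can read the .cpp file written to the master's temporary directory. The file name is an illustrative choice.

```r
library(parallel)  # base R; same cluster API as snow

# Write the question's kernel, in Rcpp-attributes form, to a .cpp file.
cpp_file <- file.path(tempdir(), "funCPP1.cpp")
writeLines(c(
  "#include <Rcpp.h>",
  "using namespace Rcpp;",
  "// [[Rcpp::export]]",
  "NumericMatrix funCPP1(NumericMatrix xbe, double g) {",
  "  NumericMatrix xbem = clone(xbe);",
  "  for (int i = 1; i < xbem.nrow(); i++) {",
  "    xbem(i, _) = xbem(i - 1, _) * g + xbem(i, _);",
  "  }",
  "  return xbem;",
  "}"
), cpp_file)

cl <- makeCluster(2)
# Each worker compiles the file itself; sourceCpp() defines the exported
# functions in that worker's global environment by default.
invisible(clusterCall(cl, function(f) { Rcpp::sourceCpp(f); TRUE }, cpp_file))

A <- matrix(rnorm(400), 20, 20)
res <- clusterCall(cl, function(x, g) funCPP1(x, g), A, 0.5)
stopCluster(cl)
```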
Edit:
In light of Coatless' comments, Rcpp::cppFunction() in fact negates the need for the c.inline helper here, though the c.namecall helper is still needed.
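With Rcpp::cppFunction(), the compile step reduces to a single clusterEvalQ() call. cppFunction() defines the function in the environment it is called from, and clusterEvalQ() evaluates in each worker's global environment.

The call itself still goes through the name, here via do.call("funCPP1", ...). The string cpp_code is an illustrative rewrite of the question's kernel. Base R's parallel package again stands in for snow.

```r
library(parallel)  # base R; same cluster API as snow

# Illustrative Rcpp-attributes version of the kernel; cppFunction() adds
# #include <Rcpp.h> and using namespace Rcpp itself.
cpp_code <- '
NumericMatrix funCPP1(NumericMatrix xbe, double g) {
  NumericMatrix xbem = clone(xbe);
  for (int i = 1; i < xbem.nrow(); i++) {
    xbem(i, _) = xbem(i - 1, _) * g + xbem(i, _);
  }
  return xbem;
}'

cl <- makeCluster(2)
clusterExport(cl, "cpp_code", envir = environment())
# cppFunction() defines funCPP1 in each worker's global environment,
# so no separate compile helper is needed.
invisible(clusterEvalQ(cl, {
  Rcpp::cppFunction(cpp_code)
  NULL
}))

# Call through the name, so each worker uses its own compiled copy.
A <- matrix(rnorm(400), 20, 20)
res <- clusterCall(cl, function(x, g) do.call("funCPP1", list(x, g)), A, 0.5)
stopCluster(cl)
```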
I resolved it by sourcing, on each cluster node, an R file containing the wanted C inline function:
And your file your_C_func.R should contain the C function definition:
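Here is a self-contained sketch of this approach. It writes an illustrative your_C_func.R, holding the question's inline definition, to the master's temporary directory. It then sources that file on every node, so each worker compiles its own copy.

The location of the file and the use of parallel instead of snow are assumptions. On a multi-machine cluster, the file would need to be reachable from every node.

```r
library(parallel)  # base R; same cluster API as snow

# Illustrative file holding the inline definition each node will source.
c_file <- file.path(tempdir(), "your_C_func.R")
writeLines(c(
  "library(inline)",
  "src1 <- '",
  "  Rcpp::NumericMatrix xbem = Rcpp::clone(Rcpp::NumericMatrix(xbe));",
  "  Rcpp::NumericVector gv(g);",
  "  for (int i = 1; i < xbem.nrow(); i++) {",
  "    xbem(i, _) = xbem(i - 1, _) * gv[0] + xbem(i, _);",
  "  }",
  "  return xbem;",
  "'",
  "funCPP1 <- cxxfunction(methods::signature(xbe = 'numeric', g = 'numeric'),",
  "                       body = src1, plugin = 'Rcpp')"
), c_file)

cl <- makeCluster(2)
# source() with the default local = FALSE evaluates in each worker's
# global environment, so funCPP1 is compiled and defined there.
invisible(clusterCall(cl, function(f) { source(f); TRUE }, c_file))

A <- matrix(rnorm(400), 20, 20)
res <- clusterCall(cl, function(x, g) funCPP1(x, g), A, 0.5)
stopCluster(cl)
```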