Using Rcpp within parallel code via snow to make a cluster

Posted on 2024-11-08 21:06:29

I've written a function in Rcpp and compiled it with inline. Now I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 compiles and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when run in parallel:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: NULL value passed as symbol address

And here is some code:

## Load and compile
library(inline)
library(Rcpp)
library(snow)
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g="numeric"),body = src1, plugin="Rcpp")

## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK") 
clusterExport(cl, 'funCPP1') 
clusterCall(cl, funCPP1, A, 0.5)

Comments (3)

离线来电— 2024-11-15 21:06:29

Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.

So you tried the right thing by shipping the R frontend calling that shared library to the other process (which has another temp directory !!), but that does not get the dll / so file there.
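
The separate temp directories are easy to check. The following is a minimal sketch, not part of the original answer, and it assumes PSOCK workers started on the local machine with the parallel package:

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Temp directory of the master session; inline compiles its shared library here
master_tmp <- tempdir()

# Temp directory of each worker process
worker_tmp <- unlist(clusterCall(cl, tempdir))

print(master_tmp)
print(worker_tmp)

stopCluster(cl)
```

Each R process gets its own session temp directory, so a library built in the master's directory is not something a worker has loaded.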

Hence the advice is to create a local package, install it and have both snow processes load and call it.
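
One way this advice might look in practice is sketched below. The package name funcpp1pkg, the temporary build location, the use of Rcpp.package.skeleton, and the clone() call are all assumptions made for illustration; the answer itself does not prescribe how the package is built. Note that the sketch installs the package into the default library.

```r
library(parallel)

# Hypothetical package name and build location, chosen for this sketch only
pkg_name <- "funcpp1pkg"
work_dir <- tempfile("pkgsrc")
dir.create(work_dir)

# The loop from the question, written as an exported Rcpp function
cpp_file <- file.path(work_dir, "funCPP1.cpp")
writeLines('
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix funCPP1(NumericMatrix xbe, double g) {
    NumericMatrix xbem = clone(xbe);  // assumption: copy, so the input is not modified
    int nrows = xbem.nrow();
    for (int i = 1; i < nrows; i++) {
        xbem(i, _) = xbem(i - 1, _) * g + xbem(i, _);
    }
    return xbem;
}
', cpp_file)

# Build a skeleton package around that file and generate the R wrappers
Rcpp::Rcpp.package.skeleton(pkg_name, path = work_dir, cpp_files = cpp_file,
                            example_code = FALSE)
Rcpp::compileAttributes(file.path(work_dir, pkg_name))

# Install from source so that freshly started workers can find the package
install.packages(file.path(work_dir, pkg_name), repos = NULL, type = "source")

cl <- makeCluster(2, type = "PSOCK")
A <- matrix(rnorm(400), 20, 20)

# Each worker loads the installed package, so the shared library is found there
clusterEvalQ(cl, library(funcpp1pkg))
res <- clusterCall(cl, function(m, g) funcpp1pkg::funCPP1(m, g), A, 0.5)

stopCluster(cl)
```

Because the shared library now lives in the installed package rather than in a session temp directory, every worker resolves the compiled symbol on its own.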

(And as always: better quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)

帅冕 2024-11-15 21:06:29

Old question, but I stumbled across it while looking through the top Rcpp tags so maybe this answer will be of use still.

I think Dirk's answer is proper when the code you've written is fully debugged and does what you want, but it can be a hassle to write a new package for such a small piece of code as the one in the example. What you can do instead is export the code block, export a "helper" function that compiles the source code, and run the helper on every node. That makes the CXX function available on each node; a second helper function then calls it. For instance:

# The snow functionality used here is now in "parallel", which ships with base R.
# snow itself is only needed for other cluster types such as MPI.
library(parallel)

# Keep your source as an object
src1 <- '
     Rcpp::NumericMatrix xbem(xbe);
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
# Save the signature
sig <- signature(xbe = "numeric", g="numeric")

# make a function that compiles the source, then assigns the compiled function 
# to the global environment
c.inline <- function(name, sig, src){
    library(Rcpp)
    funCXX <- inline::cxxfunction(sig = sig, body = src, plugin="Rcpp")
    assign(name, funCXX, envir=.GlobalEnv)
}
# and the function which retrieves and calls this newly-compiled function 
c.namecall <- function(name,...){
    funCXX <- get(name)
    funCXX(...)
}

# Keep your example matrix
A <- matrix(rnorm(400), 20,20)

# What are we calling the compiled function?
fxname <- "TestCXX"

## Parallel
cl <- makeCluster(2, type = "PSOCK") 

# Export all the pieces
clusterExport(cl, c("src1","c.inline","A","fxname")) 

# Call the compiler function
clusterCall(cl, c.inline, name=fxname, sig=sig, src=src1)

# Notice how the function now named "TestCXX" is available in the environment
# of every node?
clusterCall(cl, ls, envir=.GlobalEnv)

# Call the function through our wrapper
clusterCall(cl, c.namecall, name=fxname, A, 0.5)
# Worked in my testing

I've written a package ctools (shameless self-promotion) which wraps up a lot of the functionality that is in the parallel and Rhpc packages for cluster computing, both with PSOCK and MPI. I already have a function called "c.sourceCpp" which calls "Rcpp::sourceCpp" on every node in much the same way as above. I'm going to add in a "c.inlineCpp" which does the above now that I see the usefulness of it.
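
The idea of calling Rcpp::sourceCpp on every node can be sketched roughly as follows. This is not the ctools implementation, which the answer does not show; the function name rowRecur and the clone() call are assumptions made for illustration.

```r
library(parallel)

# C++ source kept as a string so it can be shipped to the workers as plain data
cpp_src <- '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix rowRecur(NumericMatrix xbe, double g) {
    NumericMatrix xbem = clone(xbe);  // assumption: copy, so the input is not modified
    int nrows = xbem.nrow();
    for (int i = 1; i < nrows; i++) {
        xbem(i, _) = xbem(i - 1, _) * g + xbem(i, _);
    }
    return xbem;
}
'

cl <- makeCluster(2, type = "PSOCK")
A <- matrix(rnorm(400), 20, 20)

# Every worker compiles the source itself, inside its own temp directory
invisible(clusterCall(cl, Rcpp::sourceCpp, code = cpp_src))

# Look the compiled function up by name on the worker, then call it
res <- clusterCall(cl, function(name, ...) get(name)(...), "rowRecur", A, 0.5)

stopCluster(cl)
```

Only the source text crosses the process boundary, so each worker ends up with a shared library it built and loaded locally.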

Edit:

In light of Coatless' comments, Rcpp::cppFunction() in fact removes the need for the c.inline helper here, though c.namecall is still needed.

src2 <- '
 NumericMatrix TestCpp(NumericMatrix xbe, double g){
        // g is a double: declared as int, 0.5 would be truncated to 0
        NumericMatrix xbem(xbe);
        int nrows = xbem.nrow();
        for (int i = 1; i < nrows; i++) {
            xbem(i,_) = xbem(i-1,_) * g + xbem(i,_);
        }
        return xbem;
 }
'

clusterCall(cl, Rcpp::cppFunction, code=src2, env=.GlobalEnv)

# Call the function through our wrapper
clusterCall(cl, c.namecall, name="TestCpp", A, 0.5)

泪是无色的血 2024-11-15 21:06:29

I resolved it by sourcing, on each cluster node, an R file containing the wanted inline C function:

clusterEvalQ(cl, 
    {
     library(inline)
     invisible(source("your_C_func.R"))
    })

And your file your_C_func.R should contain the C function definition:

c_func <- cfunction(...)
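
A fuller version of this approach might look like the sketch below. The file contents are an assumption: the answer leaves the cfunction(...) call elided, so the sketch borrows the Rcpp body from the question and uses inline::cxxfunction instead. It also assumes all workers run on the same machine, so they can read a file in the master's temp directory.

```r
library(parallel)

# Hypothetical stand-in for your_C_func.R; every worker must be able to read it
r_file <- file.path(tempdir(), "your_C_func.R")
writeLines('
c_func <- inline::cxxfunction(
  signature(xbe = "numeric", g = "numeric"),
  body = "
    Rcpp::NumericMatrix xbem(xbe);
    int nrows = xbem.nrow();
    Rcpp::NumericVector gv(g);
    for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
    }
    return xbem;
  ",
  plugin = "Rcpp")
', r_file)

cl <- makeCluster(2, type = "PSOCK")
A <- matrix(rnorm(400), 20, 20)

# Each worker sources the file, compiling c_func into its own global environment
invisible(clusterCall(cl, function(f) { source(f); NULL }, r_file))

# Call the worker-side copy of c_func
res <- clusterCall(cl, function(m, g) c_func(m, g), A, 0.5)

stopCluster(cl)
```

The compilation happens once per worker, which costs some startup time but avoids building and installing a package.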