如何让 doSMP 与 plyr 完美配合?

发布于 2024-10-29 23:56:48 字数 2122 浏览 0 评论 0原文

此代码有效:

library(plyr)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE) 

虽然此代码失败:

library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopWorkers(workers)

>Error in do.ply(i) : task 3 failed - "subscript out of bounds"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

我正在使用 R 2.1.12、plyr 1.4 和 doSMP 1.0-1。有没有人想出办法解决这个问题?

编辑:为了回应Andrie,这里有一个进一步的说明:

system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #1
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #2
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #3
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #4
stopWorkers(workers)

前三个功能可以工作,但它们都需要大约3秒。函数 #2 发出警告,表明没有注册并行后端,因此按顺序执行。函数 #4 给出了我在原始帖子中引用的相同错误。

/编辑:好奇者和好奇者:在我的Mac上,以下工作:

library(plyr)
library(doMC)
registerDoMC()
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)

但这失败了:

library(plyr)
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopWorkers(workers)

这也失败了:

library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopCluster(cl)

所以我认为 foreach 的各种并行后端是不可互换的。

This code works:

library(plyr)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE) 

While this code fails:

library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopWorkers(workers)

>Error in do.ply(i) : task 3 failed - "subscript out of bounds"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

I am using R 2.1.12, plyr 1.4 and doSMP 1.0-1. Has anyone figured out a way around this?

edit: In response to Andrie, here is a further illustration:

system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #1
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #2
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #3
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #4
stopWorkers(workers)

The first three functions work, but they all take about 3 seconds. Function #2 gives a warning that no parallel backend is registered, and thus executes sequentially. Function #4 gives the same error I referenced in my original post.

/edit: curioser and curiouser: On my mac, the following works:

library(plyr)
library(doMC)
registerDoMC()
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)

But this fails:

library(plyr)
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopWorkers(workers)

And this fails too:

library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopCluster(cl)

So I suppose the various parallel back ends for foreach are not interchangeable.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

战皆罪 2024-11-05 23:56:48

虽然@hadley 很好地回答了这个问题,但我想补充一点,我认为 plyr 现在可以与其他 foreach 并行后端一起使用。这是 链接到包含 plyr 与 doSNOW 结合使用的示例的博客条目:

While the question has been answered well by @hadley, I want to add that I think plyr now works with other foreach parallel back-ends. Here is a link to a blog entry containing an example where plyr is used in conjunction with doSNOW:

懒猫 2024-11-05 23:56:48

只是为了确认@LeeZamparo的答案,plyr现在似乎可以与snow一起使用,至少在带有R版本2.15.0的Windows 7上。问题中的最后一段代码有效,尽管带有神秘的警告:

library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)

library(microbenchmark)
mb <- microbenchmark(

      PP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE),
      NP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE) 
                     )

stopCluster(cl)

神秘的警告:

> warnings()
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...

这并不快,我猜这就是开销...

> mb
Unit: milliseconds
                                                             expr
1 NP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)
2 PP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
        min        lq    median        uq       max
1  11.91518  15.74567  20.10944  23.30453  38.09237
2 314.58008 336.81160 348.42421 358.57337 575.11220

检查它是否给出了预期的结果

> PP
  V V1
1 X  4
2 Y  6
3 Z  5

有关此会话的额外详细信息:

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.1-3 doSNOW_1.0.6         iterators_1.0.6     
[4] foreach_1.4.0        plyr_1.7.1           snow_0.3-10          

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 tools_2.15.0

Just to confirm @LeeZamparo's answer, plyr does now seem to work with snow, at least on on Windows 7 with R version 2.15.0. The last chunk of code in the question works, though with cryptic warnings:

library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)

library(microbenchmark)
mb <- microbenchmark(

      PP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE),
      NP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE) 
                     )

stopCluster(cl)

Cryptic warnings:

> warnings()
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...

It's not quick, I guess that's the overhead...

> mb
Unit: milliseconds
                                                             expr
1 NP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)
2 PP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
        min        lq    median        uq       max
1  11.91518  15.74567  20.10944  23.30453  38.09237
2 314.58008 336.81160 348.42421 358.57337 575.11220

Check it gives the expected result

> PP
  V V1
1 X  4
2 Y  6
3 Z  5

Extra details about this session:

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.1-3 doSNOW_1.0.6         iterators_1.0.6     
[4] foreach_1.4.0        plyr_1.7.1           snow_0.3-10          

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 tools_2.15.0
很糊涂小朋友 2024-11-05 23:56:48

事实证明 plyr仅适用于doMC,但开发人员正在努力。

It turns out plyr only works with doMC, but the developer is working on it.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文