How to label child processes for logging when using multicore and doMC in R

Posted 2024-10-27


I have started using the doMC package for R as the parallel backend for parallelised plyr routines.

The parallelisation itself seems to be working fine (though I have yet to properly benchmark the speedup); my problem is that the logging is now asynchronous and messages from different cores are getting mixed in together. I could create different logfiles for each core, but I think a neater solution is to simply add a different label for each core. I am currently using the log4r package for my logging needs.

I remember that when using MPI each processor got a rank, which was a way of distinguishing the processes from one another, so is there a way to do this with doMC? I did have the idea of extracting the PID, but that seems messy, and it changes for every iteration.
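For reference, pulling the PID in the loop body only takes a `Sys.getpid()` call. A minimal sketch (not from the original post; it assumes a Unix-like system where doMC can fork, and the two-core count is arbitrary):

```r
library(foreach)
library(doMC)
registerDoMC(cores = 2)

## Tag each message with the worker's PID. Note that doMC forks
## workers per %dopar% call, so PIDs are only stable within one call,
## and iterations may land on either worker in any order.
res <- foreach(i = 1:4, .combine = c) %dopar% {
  sprintf("[pid %d] processing item %d", Sys.getpid(), i)
}
print(res)
```

As the question notes, the PID is not a persistent label across runs, but within a single parallel call it does distinguish the workers.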

I am open to ideas though, so any suggestions are welcome.

EDIT (2011-04-08): Going with the suggestion of one answer, I still have the issue of correctly identifying which subprocess I am currently inside, as I would either need separate closures for each log() call so that it writes to the correct file, or I would have a single log() function, but have some logic inside it determining which logfile to append to. In either case, I would still need some way of labelling the current subprocess, but I am not sure how to do this.

Is there an equivalent of the MPI library's mpi_rank() function?
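One way to avoid per-call closures is to build the logger inside the loop body, keyed on the worker's PID, so a single helper picks the right file on its own. A sketch along those lines (not from the original post; `worker_logger` and the `worker_<pid>.log` naming are illustrative choices, and the log4r calls follow its classic `create.logger()`/`info()` interface):

```r
library(log4r)
library(foreach)
library(doMC)
registerDoMC(cores = 2)

## One logger per worker process, keyed on the PID, so every info()
## call appends to the file belonging to the current subprocess.
worker_logger <- function() {
  create.logger(logfile = sprintf("worker_%d.log", Sys.getpid()),
                level = "INFO")
}

foreach(i = 1:4) %dopar% {
  logger <- worker_logger()
  info(logger, sprintf("iteration %d", i))
}
```

This sidesteps the closure question: there is only one logging path, and the PID lookup inside it decides which file receives the message.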


Comments (1)

无人问我粥可暖 2024-11-03 08:04:05


I think having multiple processes write to the same file is a recipe for disaster (it's just a log though, so maybe "disaster" is a bit strong).

Oftentimes I parallelize work over chromosomes. Here is an example of what I'd do (I've mostly been using foreach/doMC):

foreach(chr=chromosomes, ...) %dopar% {
  cat("+++", chr, "+++\n")
  ## ... some undoubtedly amazing code would then follow ...
}

And it wouldn't be unusual to get output that tramples over each other ... something like (not exactly) this:

+++chr1+++
+++chr2+++
++++chr3++chr4+++

... you get the idea ...

If I were in your shoes, I think I'd split the logs for each process and set their respective filenames to be unique with respect to something happening in that process's loop (like chr in my case above). Collate them later if you must ... i.e., map/reduce your log files :-)
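A concrete version of that split-then-collate approach might look like the sketch below (not from the original answer; the `log_<chr>.txt` and `combined.log` names, the four-chromosome vector, and the two-core count are all illustrative choices):

```r
library(foreach)
library(doMC)
registerDoMC(cores = 2)

chromosomes <- paste0("chr", 1:4)

## "Map": one logfile per task, named after the loop variable, so no
## two tasks ever write to the same file.
foreach(chr = chromosomes) %dopar% {
  con <- file(sprintf("log_%s.txt", chr), open = "a")
  cat("+++", chr, "+++\n", file = con)
  close(con)
}

## "Reduce": concatenate the per-task logs into a single file.
logs <- unlist(lapply(chromosomes, function(chr) {
  readLines(sprintf("log_%s.txt", chr))
}))
writeLines(logs, "combined.log")
```

Because each task owns its file, the interleaved `++++chr3++chr4+++` garbling above cannot occur; the combined file simply lists the per-task output in collation order.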
