Should I prefer Hadoop or Condor when using R?
I am looking for ways to send work to multiple computers on my university's computer grid.
Currently it runs Condor and also offers Hadoop.
My question is thus: should I try to interface R with Hadoop or with Condor for my projects?
For the sake of discussion, let's assume we are talking about embarrassingly parallel tasks.
P.S.: I've seen the resources described in the CRAN Task Views.
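For concreteness, here is roughly the shape of the workload I have in mind (the function and parameter grid below are just placeholders, not my real code): every task depends only on its own parameter, so the work can be split across machines in any way the grid likes.

    ## Toy example of an embarrassingly parallel workload (placeholders only):
    ## each parameter combination is evaluated completely independently.
    simulate_one <- function(param) {
      set.seed(param)                     # independent replicate per parameter
      mean(rnorm(1e6, mean = param))      # stand-in for the real computation
    }

    params  <- 1:100                      # 100 independent tasks
    results <- lapply(params, simulate_one)  # trivially splittable across machines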
1 Answer
You can do both.
You can use HDFS for your data sets and Condor for your job scheduling: use Condor to place executors on machines, and use HDFS plus Hadoop's Map-Reduce features to process your data (assuming your problem is map-reduce mappable). Then you're using the most appropriate tool for each job: Condor is a job scheduler, and as such it does that work better than Hadoop, while Hadoop's HDFS and Map-Reduce framework are things Condor doesn't have (but which are really helpful for jobs running on Condor to use).
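As a rough illustration of the Condor side, here is a minimal sketch of a per-task R worker script. It assumes each Condor job passes a task index on the command line (e.g. via $(Process) in the submit description); the file names and the process_chunk() function are placeholders, not anything Condor or Hadoop provide.

    #!/usr/bin/env Rscript
    ## Sketch of a worker script that Condor could launch once per task.
    ## Assumes the task index arrives as the first command-line argument;
    ## paths and process_chunk() are placeholders.
    args    <- commandArgs(trailingOnly = TRUE)
    task_id <- as.integer(args[1])

    process_chunk <- function(d) colMeans(d)   # stand-in for the real per-chunk work

    input  <- sprintf("chunk_%03d.csv", task_id)
    output <- sprintf("result_%03d.rds", task_id)

    d <- read.csv(input)
    saveRDS(process_chunk(d), output)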
I would personally look at using HDFS to share data among jobs that run discretely as Condor jobs. Especially in a university environment, where shared compute resources are not 100% reliable and can come and go at will, Condor's resilience in this type of setup is going to make getting work done a whole lot easier.
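If you do go the HDFS-for-shared-data route, a simple way to use it from R without extra packages is to shell out to the hadoop command-line client from inside each job. This sketch assumes the hadoop client is installed on the execute machines; all of the HDFS paths and the analysis step are made up for illustration.

    ## Sketch of staging shared data through HDFS from inside a Condor job,
    ## using the hadoop command-line client; all paths are placeholders.
    stage_in  <- function(hdfs_path, local_path)
      system2("hadoop", c("fs", "-get", hdfs_path, local_path))

    stage_out <- function(local_path, hdfs_path)
      system2("hadoop", c("fs", "-put", local_path, hdfs_path))

    stage_in("/shared/project/input/chunk_001.csv", "chunk_001.csv")
    d   <- read.csv("chunk_001.csv")
    res <- colMeans(d)                      # stand-in for the real analysis
    saveRDS(res, "result_001.rds")
    stage_out("result_001.rds", "/shared/project/output/result_001.rds")

Each job stages in only the chunk it needs and stages its result back out, so a job that is preempted and restarted by Condor just repeats its own small piece of work.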