Parallelization/Cluster Options for Code Execution
I come from a Java background and have a CPU-bound problem that I'm trying to parallelize to improve performance. I have broken my code up into modular units so that it can (hopefully) be distributed and run in parallel.
@Transactional(readOnly = false, propagation = Propagation.REQUIRES_NEW)
public void runMyJob(List<String> params) {
    // CPU-heavy work whose results are written to MySQL
    doComplexEnoughStuffAndWriteToMysqlDB();
}
Now, I have been considering the following options for parallelizing this problem, and I'd like to hear people's thoughts and experience in this area.
Options I am currently considering:
1) Use Java EE (e.g. JBoss) clustering and message-driven beans (MDBs). The MDBs run on the slave nodes in the cluster. Each MDB can pick up an event which kicks off a job as above. AFAIK, Java EE MDBs are multithreaded by the app server, so this should hopefully also be able to take advantage of multiple cores. Thus it should be both vertically and horizontally scalable.
2) I could look at using something like Hadoop and MapReduce. My concern here is that my job-processing logic is actually quite high level, so I'm not sure how well it translates to MapReduce. Also, I'm a total newbie to MR.
3) I could look at something like Scala, which I believe makes concurrent programming much simpler. However, while this is vertically scalable, it's not a cluster/horizontally scalable solution.
Anyway, I hope all that makes sense, and thank you very much for any help provided.
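For the vertical (multi-core) part alone, a plain-Java baseline may be enough before bringing in Scala or a cluster. The following is only a minimal sketch under stated assumptions: `ParallelJobRunner` and `runAll` are hypothetical names, and `runMyJob` here is a stand-in that returns a checksum instead of doing the real work and writing to MySQL. It assumes each batch of parameters is independent of the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: fan independent jobs out over a pool sized to the CPU count.
class ParallelJobRunner {

    // Stand-in for the real runMyJob(...); assumed to be independent per batch.
    static int runMyJob(List<String> params) {
        int checksum = 0;
        for (String p : params) {
            checksum += p.length();
        }
        return checksum;
    }

    // Submits one task per batch and waits for all of them, preserving input order.
    static List<Integer> runAll(List<List<String>> batches) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (List<String> batch : batches) {
                futures.add(CompletableFuture.supplyAsync(() -> runMyJob(batch), pool));
            }
            List<Integer> results = new ArrayList<>();
            for (CompletableFuture<Integer> future : futures) {
                results.add(future.join()); // blocks until that job has finished
            }
            return results;
        } finally {
            pool.shutdown(); // stop accepting work; already-submitted tasks have completed
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("a", "bb"),
                List.of("ccc"),
                List.of("dddd", "e"));
        System.out.println(runAll(batches));
    }
}
```

This only scales within one JVM. Spreading the same batches across machines still needs something like the messaging or cluster options listed above.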
Comments (2)
You should have a look at Spark.
It is a cluster computing framework written in Scala that aims to be a viable alternative to Hadoop.
It has a number of nice features.
If I understand your question correctly, Spark would combine your options 2) and 3).
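On the worry about whether high-level job logic translates to a map/reduce style: the shape these frameworks expect is roughly "apply an independent function to each record, then combine the partial results". The sketch below shows that shape with plain Java parallel streams on a single machine. It is not Spark's API, and `MapReduceShape`, `scoreRecord` and `totalScore` are hypothetical names chosen for illustration.

```java
import java.util.List;

// Hypothetical illustration of the map-then-reduce shape; not Spark or Hadoop code.
class MapReduceShape {

    // "Map" step: independent per-record work (here, a trivial sum of character codes).
    static long scoreRecord(String record) {
        long score = 0;
        for (char c : record.toCharArray()) {
            score += c;
        }
        return score;
    }

    // Maps every record in parallel, then reduces the partial results by summing them.
    static long totalScore(List<String> records) {
        return records.parallelStream()
                .mapToLong(MapReduceShape::scoreRecord)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalScore(List.of("ab", "c")));
    }
}
```

If a job can be written so that the per-record step has no shared mutable state, moving it to a cluster framework later is mostly a matter of swapping the collection type. If it cannot, that is a sign the MapReduce concern in option 2 is justified.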
The solution you are looking for is Akka. Clustering is a feature under development and is expected to be included in Akka 2.1.
Please get rid of J2EE if it is not too late. You are very welcome to join the Akka mailing list to ask your questions.