Parallelization/clustering options for code execution

Posted 2024-10-14 02:17:34 · 743 characters · 1 view · 0 comments

I come from a Java background and have a CPU-bound problem that I'm trying to parallelize to improve performance. I have broken my code up into modular units so that it can (hopefully) be distributed and run in parallel.

// REQUIRES_NEW: each job invocation runs in its own, separate transaction
@Transactional(readOnly = false, propagation = Propagation.REQUIRES_NEW)
public void runMyJob(List<String> someParams) {
  doComplexEnoughStuffAndWriteToMysqlDB();
}

Now, I have been thinking of the following options for parallelizing this problem and I'd like people's thoughts/experience in this area.

Options I am currently thinking of:

1) Use Java EE (e.g. JBoss) clustering and message-driven beans (MDBs). The MDBs live on the slave nodes in the cluster. Each MDB can pick up an event that kicks off a job as above. AFAIK Java EE MDBs are multithreaded by the app server, so this should hopefully also be able to take advantage of multiple cores. Thus it should be both vertically and horizontally scalable.

2) I could look at using something like Hadoop and MapReduce. My concern here is that my job-processing logic is actually quite high level, so I'm not sure how well it translates to MapReduce. Also, I'm a total newbie to MR.

3) I could look at something like Scala, which I believe makes concurrent programming much simpler. However, while this is vertically scalable, it's not a cluster/horizontally scalable solution.

Anyway, hope all that makes sense and thank you very much for any help provided.
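
For the single-machine (vertical) case, the idea of running independent jobs on all cores can be sketched in plain Java with a fixed thread pool. This is only an illustration under assumptions: the class name ParallelJobRunner and the method runAll are hypothetical, and runMyJob here is a stand-in that returns a number instead of writing to MySQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelJobRunner {

    // Hypothetical stand-in for the real job: it sums the parameter lengths
    // instead of doing the complex work and the database write.
    static long runMyJob(List<String> someParams) {
        long total = 0;
        for (String p : someParams) {
            total += p.length();
        }
        return total;
    }

    // Runs one job per batch on a pool sized to the number of cores and
    // returns the results in the same order as the input batches.
    static List<Long> runAll(List<List<String>> batches) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Long> task = () -> runMyJob(batch);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that job has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // stop accepting work; submitted jobs still complete
        }
    }
}
```

This only covers multiple cores on one node. Spreading the batches over several machines still needs something like the message queue in option 1 or a cluster framework.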

Comments (2)

以为你会在 2024-10-21 02:17:34

You should have a look at Spark.
It is a cluster computing framework written in Scala that aims to be a viable alternative to Hadoop.
It has a number of nice features:

  • In-Memory Computations: You can control the degree of caching
  • Hadoop Input/Output interoperability: Spark can read/write data from all the Hadoop input sources such as HDFS, EC2, etc.
  • The concept of "Resilient Distributed Datasets" (RDD) which allows you to directly execute most of MR style workloads in parallel on a cluster as you would do locally
  • Primary API = Scala, optional Python and Java APIs
  • It makes use of Akka :)

If I understand your question correctly, Spark would combine your options 2) and 3).
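
To give a feel for what "MR-style workloads written as you would do locally" means, here is a rough sketch using only the Java standard library. It is not Spark code: it uses parallel streams on a single JVM, and the class name LocalMapReduce and method wordCount are hypothetical. The shape (split into records, map, group and reduce by key) is the part that carries over to a cluster framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class LocalMapReduce {

    // Map phase: split each line into words.
    // Reduce phase: group identical words and count them.
    // parallelStream() lets the JVM spread the work over the available cores.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

Whether the asker's high-level job logic fits this map-then-reduce shape is exactly the open question raised in option 2, so treat this as a way to test the fit on a small scale rather than as a recommendation.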

节枝 2024-10-21 02:17:34

The solution you are looking for is Akka. Clustering is a feature under development and is expected to be included in Akka 2.1.

  • Excellent Scala and Java APIs, extremely complete
  • Purely message-oriented pattern, with no shared state
  • Fault resistant and scalable
  • Extremely easy to distribute jobs

Please get rid of J2EE if it is not too late. You are very welcome to join the Akka mailing list to ask your questions.
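
The message-oriented, no-shared-state pattern mentioned above can be illustrated without any framework. The sketch below is not the Akka API. It uses plain java.util.concurrent queues, and the names MessageWorkers and process are hypothetical. Each worker only sees messages from its inbox and replies through an outbox, and an empty Optional serves as the stop message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MessageWorkers {

    // Sends every job as a message to a pool of workers and collects the replies.
    // Workers share no mutable state; they communicate only through the queues.
    static List<String> process(List<String> jobs, int workerCount) {
        BlockingQueue<Optional<String>> inbox = new LinkedBlockingQueue<>();
        BlockingQueue<String> outbox = new LinkedBlockingQueue<>();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Optional<String> message = inbox.take();
                        if (!message.isPresent()) {
                            return; // empty message means "stop"
                        }
                        outbox.put("done:" + message.get()); // placeholder for real work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        try {
            for (String job : jobs) {
                inbox.put(Optional.of(job));
            }
            for (int i = 0; i < workerCount; i++) {
                inbox.put(Optional.empty()); // one stop message per worker
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < jobs.size(); i++) {
                results.add(outbox.take());
            }
            for (Thread worker : workers) {
                worker.join();
            }
            Collections.sort(results); // arrival order is nondeterministic
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

An actor library adds supervision, remoting, and location transparency on top of this basic idea, which is what makes the same message-passing code distributable across nodes.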
