Distributed job scheduling, management, and reporting

Posted on 2024-08-14 11:54:34

I recently had a play around with Hadoop and was impressed with its scheduling, management, and reporting of MapReduce jobs. It appears to make the distribution and execution of new jobs quite seamless, allowing developers to concentrate on the implementation of their jobs.

I am wondering if anything exists in the Java domain for the distributed execution of jobs that are not easily expressed as MapReduce problems? For example:

  • Jobs that require task co-ordination and synchronization. For example, they may involve sequential execution of tasks, yet some tasks could feasibly run concurrently:

                   .-- B --.
            .--A --|       |--.
            |      '-- C --'  |
    Start --|                 |-- Done
            |                 |
            '--D -------------'
    
  • CPU-intensive tasks that you'd like to distribute but that don't produce any outputs to reduce, such as image conversion or resizing.

So is there a Java framework/platform that provides such a distributed computing environment? Or is this sort of thing acceptable/achievable using Hadoop - and if so are there any patterns/guidelines for these sorts of jobs?

Comments (6)

留一抹残留的笑 2024-08-21 11:54:34

I have since found Spring Batch and Spring Batch Integration, which appear to address many of my requirements. I will let you know how I get on.

苏璃陌 2024-08-21 11:54:34

Take a look at Quartz. I think it supports stuff like managing jobs remotely and clustering several machines to run jobs.

罪#恶を代价 2024-08-21 11:54:34

I guess you are looking for a workflow engine for CPU-intensive tasks (also known as "scientific workflows", e.g. http://www.extreme.indiana.edu/swf-survey). But I'm not sure how distributed you want it to be. Usually, workflow engines have a "single point of failure".

寒冷纷飞旳雪 2024-08-21 11:54:34

I believe quite a few problems can be expressed as map-reduce problems.

For problems that you can't modify to fit that structure, you can look at setting up your own using Java's ExecutorService. It will be limited to one JVM and it will be quite low level. It will, however, allow for easy coordination and synchronization.
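
As a rough illustration of the single-JVM approach, here is a minimal sketch of the workflow drawn in the question (A, then B and C in parallel, with D running alongside, and Done once everything has finished). It layers CompletableFuture on top of an ExecutorService, which is just one way to wire up the dependencies and not something the answer above prescribes. The class name WorkflowSketch, the method runWorkflow, and the placeholder task bodies are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the task bodies only record their own name.
final class WorkflowSketch {

    /** Runs the Start -> (A -> B|C), D -> Done graph and returns task names in completion order. */
    static List<String> runWorkflow() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> finished = new CopyOnWriteArrayList<>();
        try {
            // A and D have no prerequisites, so both start immediately.
            CompletableFuture<Void> a = CompletableFuture.runAsync(() -> finished.add("A"), pool);
            CompletableFuture<Void> d = CompletableFuture.runAsync(() -> finished.add("D"), pool);

            // B and C each wait for A, then may run concurrently with each other and with D.
            CompletableFuture<Void> b = a.thenRunAsync(() -> finished.add("B"), pool);
            CompletableFuture<Void> c = a.thenRunAsync(() -> finished.add("C"), pool);

            // Done runs only after B, C and D have all completed; join() blocks until it has.
            CompletableFuture.allOf(b, c, d)
                    .thenRun(() -> finished.add("Done"))
                    .join();

            return new ArrayList<>(finished);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The order of B, C and D relative to each other can vary between runs.
        System.out.println(runWorkflow());
    }
}
```

The only ordering this guarantees is that A completes before B and C, and that Done comes last. Everything stays inside one JVM, so distributing the tasks across machines would still need one of the frameworks mentioned in the other answers.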

橘寄 2024-08-21 11:54:34

ProActive Scheduler seems to fit your requirements, especially the complex workflows with task coordination that you mentioned.
It is open source and Java based. You can use it to run anything: Hadoop jobs, scripts, Java code, and so on.

Disclaimer: I work for the company behind it

失眠症患者 2024-08-21 11:54:34

Try the Redisson framework. It provides an easy API to execute and schedule java.util.concurrent.Callable and java.lang.Runnable tasks. See its documentation on the distributed Executor service and Scheduler service.
