Pipelining Hadoop MapReduce jobs

Posted 2024-09-27 17:29:27 · 3 views · 0 comments

I have five MapReduce jobs that I am running separately. I want to pipeline them all together, so that the output of one job goes to the next job. Currently, I wrote a shell script to execute them all. Is there a way to write this in Java? Please provide an example.

Thanks


Comments (5)

冰葑 2024-10-04 17:29:27

You may find JobControl to be the simplest method for chaining these jobs together. For more complex workflows, I'd recommend checking out Oozie.
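To give a feel for what JobControl looks like, here is a minimal sketch of a two-job chain using `org.apache.hadoop.mapreduce.lib.jobcontrol`. The mapper classes and HDFS paths (`FirstMapper`, `SecondMapper`, `/input`, `/tmp/step1-out`, `/output`) are placeholders you would replace with your own:

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Pipeline {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job1 = Job.getInstance(conf, "step-1");
        job1.setMapperClass(FirstMapper.class);            // placeholder mapper
        FileInputFormat.addInputPath(job1, new Path("/input"));
        FileOutputFormat.setOutputPath(job1, new Path("/tmp/step1-out"));

        Job job2 = Job.getInstance(conf, "step-2");
        job2.setMapperClass(SecondMapper.class);           // placeholder mapper
        // job2 reads what job1 wrote
        FileInputFormat.addInputPath(job2, new Path("/tmp/step1-out"));
        FileOutputFormat.setOutputPath(job2, new Path("/output"));

        ControlledJob c1 = new ControlledJob(job1, null);
        // c2 declares a dependency on c1, so JobControl won't start it until c1 succeeds
        ControlledJob c2 = new ControlledJob(job2, Arrays.asList(c1));

        JobControl control = new JobControl("pipeline");
        control.addJob(c1);
        control.addJob(c2);

        // JobControl implements Runnable; run it in a thread and poll until done
        Thread t = new Thread(control);
        t.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        control.stop();
    }
}
```

The advantage over a hand-rolled loop is that JobControl tracks the dependency graph for you and can run independent jobs concurrently.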

£噩梦荏苒 2024-10-04 17:29:27

Hi,
I had a similar requirement. One way to do this is,

after submitting the first job, execute the following:

Job job1 = new Job( getConf() );
job1.waitForCompletion( true );

and then check its status using

if (job1.isSuccessful()) {
    // start another job with a different Mapper
    // change config
    Job job2 = new Job( getConf() );
}
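The piece this snippet leaves out is wiring job1's output directory in as job2's input. A sketch of that wiring, with a placeholder intermediate path, might look like:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// ... inside a driver extending Configured, so getConf() is available
Path intermediate = new Path("/tmp/job1-out");        // placeholder path
FileOutputFormat.setOutputPath(job1, intermediate);   // job1 writes here...
job1.waitForCompletion(true);

if (job1.isSuccessful()) {
    Job job2 = new Job(getConf());
    FileInputFormat.addInputPath(job2, intermediate); // ...and job2 reads it back
    FileOutputFormat.setOutputPath(job2, new Path("/final-out"));
    job2.waitForCompletion(true);
}
```

Note that `new Job(conf)` is the older constructor; newer Hadoop code would use `Job.getInstance(conf)` instead.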
爱的那么颓废 2024-10-04 17:29:27

Oozie is the solution for you. You can submit map-reduce types of jobs, hive jobs, pig jobs, system commands etc through Oozie's action tags.

It even has a co-ordinator which acts as a cron for your workflow.
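As a rough sketch of what such a workflow looks like, here is a minimal `workflow.xml` chaining two map-reduce actions. All names, paths, and properties are placeholders, and the exact property names depend on your Hadoop and Oozie versions:

```xml
<workflow-app name="two-step-pipeline" xmlns="uri:oozie:workflow:0.4">
    <start to="step1"/>

    <action name="step1">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- placeholder mapper class and paths -->
                <property>
                    <name>mapred.mapper.class</name>
                    <value>com.example.FirstMapper</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/data/input</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/data/step1-out</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="step2"/>
        <error to="fail"/>
    </action>

    <action name="step2">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- step2 reads what step1 wrote -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>/data/step1-out</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/data/final-out</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Pipeline failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The `ok`/`error` transitions are what express the chaining: each action names its successor, and a failure anywhere routes to the `kill` node.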

鲸落 2024-10-04 17:29:27

Another possibility is Cascading, which also provides an abstraction layer on top of Hadoop: it seems to provide a similar combination of working closely with Hadoop concepts while letting Hadoop do the M/R heavy lifting, much like what one gets using Oozie workflows calling Pig scripts.
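For flavor, a small pipe assembly in the Cascading 2.x-era API might look like the sketch below. Treat the class names and signatures as assumptions from memory, and the paths and field names as placeholders:

```java
// Sketch only: chains a split step and a group step into one Cascading flow,
// which Cascading plans into the necessary M/R jobs behind the scenes.
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class CascadingSketch {
    public static void main(String[] args) {
        // step 1: split each input line into words
        Pipe pipe = new Pipe("pipeline");
        pipe = new Each(pipe, new Fields("line"),
                new RegexSplitGenerator(new Fields("word"), "\\s+"));
        // step 2: group by word -- each stage's output feeds the next
        pipe = new GroupBy(pipe, new Fields("word"));

        Tap source = new Hfs(new TextLine(), "/input");   // placeholder path
        Tap sink = new Hfs(new TextLine(), "/output");    // placeholder path

        FlowDef flowDef = FlowDef.flowDef()
                .addSource(pipe, source)
                .addTailSink(pipe, sink);
        new HadoopFlowConnector().connect(flowDef).complete();
    }
}
```

The appeal is that you describe the whole pipeline as one assembly and never manage individual jobs or intermediate directories yourself.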

桃扇骨 2024-10-04 17:29:27

For your use case, I think Oozie will be good. Oozie is a workflow scheduler in which you can write different actions (map-reduce, java, shell, etc.) to perform some compute, transformation, enrichment, and so on. For this case:

action A: i/p input, o/p a

action B: i/p a, o/p b

action C: i/p b, o/p c (final output)

You can finally persist c in HDFS, and can decide to persist or delete the intermediate outputs.

If you want to do the computation done by all three actions in a single one, then you can use Cascading. You can learn more about Cascading from its official documentation, and you can also refer to my blog on the same: https://tech.flipkart.com/expressing-etl-workflows-via-cascading-192eb5e7d85d
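If you drive the same A → B → C chain from a plain Java driver instead, the persist-or-delete choice for the intermediates becomes a couple of HDFS calls at the end. A sketch, where `configureJob` and all paths are placeholders (you would also set mapper/reducer classes per action):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AbcPipeline {
    // placeholder helper: set mapper/reducer classes etc. for each action here
    static Job configureJob(Configuration conf, String name, Path in, Path out)
            throws Exception {
        Job job = Job.getInstance(conf, name);
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path("/data/input");   // placeholder paths
        Path a = new Path("/tmp/a");
        Path b = new Path("/tmp/b");
        Path c = new Path("/data/c");

        // each action's input is the previous action's output; bail out on failure
        if (!configureJob(conf, "A", input, a).waitForCompletion(true)) System.exit(1);
        if (!configureJob(conf, "B", a, b).waitForCompletion(true)) System.exit(1);
        if (!configureJob(conf, "C", b, c).waitForCompletion(true)) System.exit(1);

        // c is the final output; delete the intermediates (or keep them for debugging)
        FileSystem fs = FileSystem.get(conf);
        fs.delete(a, true);
        fs.delete(b, true);
    }
}
```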
