Running jobs in parallel in Hadoop

Posted 2024-12-05 15:05:10


I am new to hadoop.

I have set up a 2 node cluster.

How can I run 2 jobs in parallel in Hadoop?

When I submit jobs, they run one by one in FIFO order. I have to run the jobs in parallel. How can I achieve that?

Thanks
MRK

3 Answers

橙味迷妹 2024-12-12 15:05:10


Hadoop can be configured with a number of schedulers and the default is the FIFO scheduler.

The FIFO scheduler behaves like this.

Scenario 1: If the cluster has 10 Map Task capacity and job1 needs 15 Map Tasks, then job1 takes the complete cluster. As job1 makes progress and free slots become available that job1 does not use, job2 runs on the cluster.

Scenario 2: If the cluster has 10 Map Task capacity and job1 needs 6 Map Task, then job1 takes 6 slots and job2 takes 4 slots. job1 and job2 run in parallel.
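The two scenarios can be sketched with a toy slot-allocation model (a simplification for illustration only; `fifo_assign` is a made-up helper, not a Hadoop API, and real scheduling happens incrementally per TaskTracker heartbeat):

```python
def fifo_assign(capacity, job_demands):
    """Grant map slots to jobs strictly in FIFO (submission) order.

    capacity    -- total map-slot capacity of the cluster
    job_demands -- slots each job wants, in submission order
    Returns the slots granted to each job at a single point in time.
    """
    free = capacity
    granted = []
    for demand in job_demands:
        take = min(demand, free)  # a job gets what it asks for, if available
        granted.append(take)
        free -= take              # later jobs only see what is left over
    return granted

# Scenario 1: job1 wants 15 slots on a 10-slot cluster -> it takes everything.
print(fifo_assign(10, [15, 5]))  # -> [10, 0]

# Scenario 2: job1 wants 6 slots -> job2 gets the remaining 4 in parallel.
print(fifo_assign(10, [6, 5]))   # -> [6, 4]
```

In scenario 1, job2 only starts receiving slots once job1's map tasks begin finishing and free up capacity.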

To run jobs in parallel from the start, you can configure either the Fair Scheduler or the Capacity Scheduler, depending on your requirements. The mapreduce.jobtracker.taskscheduler property and the scheduler-specific parameters have to be set in mapred-site.xml for this to take effect.
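A minimal sketch of what that looks like for the Fair Scheduler (the scheduler class name and the allocation-file path vary by Hadoop version and installation, so treat both values as placeholders and check the docs for your release):

```xml
<!-- mapred-site.xml: replace the default FIFO scheduler with the Fair Scheduler -->
<property>
  <name>mapreduce.jobtracker.taskscheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

<!-- Pools and their shares are defined in a separate allocations file -->
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```

The JobTracker has to be restarted for the scheduler change to take effect.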

Edit: Updated the answer based on the comment from MRK.

童话里做英雄 2024-12-12 15:05:10


You have "Map Task Capacity" and "Reduce Task Capacity". Whenever slots are free, they pick up jobs in FIFO order. Each submitted job contains mappers and, optionally, reducers. If your job's mapper (and/or reducer) count is smaller than the cluster's capacity, the free slots take the next job's mappers (and/or reducers).

If you don't like FIFO, you can always give priority to your submitted jobs.
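For example, priority can be set at submission time or changed afterwards from the classic MapReduce CLI (the jar name, class name, and job ID below are illustrative placeholders):

```shell
# Submit with a higher priority (mapred.job.priority: VERY_LOW .. VERY_HIGH)
hadoop jar my-job.jar MyJob -D mapred.job.priority=HIGH input/ output/

# Or raise the priority of an already-submitted job by its job ID
hadoop job -set-priority job_201112121505_0002 VERY_HIGH
```

Note that under FIFO, priority only reorders the queue; it does not preempt tasks that are already running.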

Edit:

Sorry about the slight misinformation; Praveen's answer is the right one.
In addition to his answer, you can check the HOD scheduler as well.

悍妇囚夫 2024-12-12 15:05:10


With the default scheduler, only one job runs per user at a time. You can launch different jobs from different user IDs, and they will run in parallel. Of course, as mentioned by others, you need enough slot capacity.
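A sketch of launching the same jar under two accounts so each submission counts as a different user (the user names, jar, and paths are made up for illustration, and both accounts need HDFS permissions on their inputs):

```shell
# Submit each job under a separate Unix account; '&' lets both submit at once
sudo -u alice hadoop jar wordcount.jar WordCount in1/ out1/ &
sudo -u bob   hadoop jar wordcount.jar WordCount in2/ out2/ &
wait
```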
