Getting an org.apache.hadoop.mapreduce.Job from a completed job on the JobTracker
I'm using org.apache.hadoop.mapreduce.Job to create, submit, and run an MR job (Cloudera3, 20.2). After it completes, in a separate application, I'm trying to retrieve the Job so I can read its counters and work with them. That way I don't have to re-run the entire MR job every time I want to test the code that processes the counters.
I can get a RunningJob from a JobClient, but not an org.apache.hadoop.mapreduce.Job. RunningJob gives me Counters from the mapred package, while Job gives me counters from the mapreduce package. I tried using new Job(conf, "job_id"), but that just creates a blank Job in state DEFINE, not FINISHED.
Comments (1)
Here is how I do it, using JobSubmissionProtocol directly. The class has to live in the package org.apache.hadoop.mapred (don't change it), since JobSubmissionProtocol is not a public interface and is only accessible from that package. The problem with this method is that you can't retrieve jobs that are "retired". So I prefer not to rely on it, and instead push the counters somewhere as soon as the job completes.
Hope this helps.
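As an illustration of the "push the counters as soon as the job completes" idea, here is a minimal sketch that saves counter values to a file and reads them back in a separate application. It is not the answerer's code. The class name CounterSnapshot, the "group.counter" key format, and the choice of a properties file are all assumptions made for the example. It uses only the Java standard library and expects the counter values to have already been copied into a plain Map.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: stores counter values in a
// properties file so another application can read them later.
class CounterSnapshot {

    // Write one "name=value" entry per counter. The "group.counter" naming
    // used in main() is an assumption for this sketch.
    static void save(Map<String, Long> counters, Path file) throws IOException {
        Properties props = new Properties();
        for (Map.Entry<String, Long> entry : counters.entrySet()) {
            props.setProperty(entry.getKey(), Long.toString(entry.getValue()));
        }
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(out, "MR job counters");
        }
    }

    // Read the entries back into a sorted map of counter name to value.
    static Map<String, Long> load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        Map<String, Long> counters = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            counters.put(name, Long.parseLong(props.getProperty(name)));
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        // Made-up counter names and values, standing in for a real job's counters.
        Map<String, Long> counters = new TreeMap<>();
        counters.put("MyGroup.RECORDS_SEEN", 1200L);
        counters.put("MyGroup.RECORDS_SKIPPED", 7L);

        Path file = Files.createTempFile("counters", ".properties");
        save(counters, file);
        System.out.println("round trip ok: " + load(file).equals(counters));
        Files.delete(file);
    }
}
```

In the job driver, the map would presumably be filled by iterating over job.getCounters() once the job has finished, before calling save. The separate test application then only needs load and never has to contact the JobTracker, so retired jobs stop being a problem.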