How can I report progress to a Hadoop job so the task is not killed on timeout?
1) I have a map-only Hadoop job that streams data to a Cassandra cluster.
2) Sometimes streaming takes more than 10 minutes, and because no progress is reported to the job during that time, Hadoop kills the task.
3) I tried reporting progress with the context.progress() method, but it did not help.
Is anything else needed to report progress to the Hadoop job?
To simulate the issue, I wrote the following sample code:
Thread.sleep(360000);
context.progress();
Thread.sleep(360000);
It fails with the following error message:
12/02/06 11:40:25 INFO mapred.JobClient: Task Id :
attempt_201202061119_0001_m_000001_1, Status : FAILED Task
attempt_201202061119_0001_m_000001_1 failed to report status for 601
seconds. Killing!
Comments (2)
Please see this question:
How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."
Setting the mapred.task.timeout property to a higher value is the easiest way to fix this problem. The value is given in milliseconds, and the default of 600000 corresponds to the 10-minute limit described in the question.

context.progress() should work, but you may be hitting the following issue, which is fixed in later versions: https://issues.apache.org/jira/browse/MAPREDUCE-1905
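
On a Hadoop version where context.progress() is honored, one possible way to keep reporting during a long blocking call is a background heartbeat. The sketch below is a minimal illustration, not the asker's code: it uses only the Java standard library, and the class name ProgressHeartbeat, the method runWithHeartbeat, and the intervals are hypothetical. In a real mapper you would presumably pass context::progress as the reportProgress argument; here a counter stands in for it, and the sleep is scaled down from minutes to milliseconds. It would not help on a version affected by MAPREDUCE-1905, where the progress call itself is ignored.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressHeartbeat {

    /**
     * Runs longTask on the calling thread while invoking reportProgress every
     * intervalMillis on a background thread. The reporter is stopped when the
     * task returns or throws.
     */
    static void runWithHeartbeat(Runnable longTask, Runnable reportProgress, long intervalMillis)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true); // do not keep the JVM alive just for heartbeats
            return t;
        });
        // Note: if reportProgress throws, later executions are suppressed.
        scheduler.scheduleAtFixedRate(reportProgress, 0, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            longTask.run();
        } finally {
            scheduler.shutdownNow();
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for context::progress.
        AtomicInteger reports = new AtomicInteger();
        // Hypothetical stand-in for the slow streaming step.
        Runnable slowStreaming = () -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        runWithHeartbeat(slowStreaming, reports::incrementAndGet, 100);
        System.out.println("progress reported " + reports.get() + " times");
    }
}
```

The heartbeat interval should stay well below the task timeout. One trade-off of this design is that a task that hangs forever keeps reporting progress and is never killed, so raising mapred.task.timeout may be the safer choice when the streaming time has a known upper bound.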