How do I report progress to a Hadoop job so that tasks are not killed on timeout?

Posted on 2025-01-02 20:38:57

1) I have a map-only Hadoop job that streams data to a Cassandra cluster.

2) Sometimes the streaming takes more than 10 minutes, and because no progress is reported to the job in that time, Hadoop kills the task.

3) I tried reporting progress with the context.progress() method, but it did not help.

Is anything else needed to report progress to the Hadoop job?

I wrote the following sample code to simulate the issue:

Thread.sleep(360000);

context.progress();

Thread.sleep(360000);

It fails with the following error message:

12/02/06 11:40:25 INFO mapred.JobClient: Task Id :
attempt_201202061119_0001_m_000001_1, Status : FAILED Task
attempt_201202061119_0001_m_000001_1 failed to report status for 601
seconds. Killing!

源来凯始玺欢你 2025-01-09 20:38:57

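See this question:
How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."

Setting the mapred.task.timeout property to a higher value is the easiest way to fix this.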
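
The property takes a duration in milliseconds, and the stock default of 600000 (ten minutes) matches the roughly 600 seconds in the error message. As a minimal sketch, the helper below converts a timeout in minutes to that unit and renders it as a `-D` generic option. It assumes the job is launched through ToolRunner so that generic options are parsed; `TaskTimeoutOption` and `asGenericOption` are hypothetical names used only for illustration, not part of Hadoop.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: renders a task timeout as a "-D" generic option. */
public class TaskTimeoutOption {
    // Property name taken from the answer above.
    public static final String TIMEOUT_KEY = "mapred.task.timeout";

    /** Converts minutes to the millisecond value the property expects. */
    public static String asGenericOption(long minutes) {
        long millis = TimeUnit.MINUTES.toMillis(minutes);
        return "-D " + TIMEOUT_KEY + "=" + millis;
    }

    public static void main(String[] args) {
        // prints: -D mapred.task.timeout=1800000
        System.out.println(asGenericOption(30));
    }
}
```

Raising the timeout only hides a task that never reports; it is a reasonable stopgap when the streaming call is known to finish eventually.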

等待我真够勒 2025-01-09 20:38:57

context.progress() should work, but you may be running into this issue: https://issues.apache.org/jira/browse/MAPREDUCE-1905 , which is fixed in later releases.
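
On a release where the call is honored, one common pattern is to fire the progress callback from a background thread while a single long blocking call runs. The sketch below is stdlib-only and is not Hadoop code: `ProgressHeartbeat` is a hypothetical helper, and the `Runnable` stands in for something like `context::progress`, assuming that callback is safe to invoke from another thread. It would not help on a release affected by the linked issue.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: pings a progress callback at a fixed interval until closed. */
public class ProgressHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    public ProgressHeartbeat(Runnable reportProgress, long interval, TimeUnit unit) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true); // do not keep the JVM alive just for heartbeats
            return t;
        });
        // Note: if the callback throws, later runs are suppressed by the executor.
        scheduler.scheduleAtFixedRate(reportProgress, interval, interval, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stop pinging once the slow work is done
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger pings = new AtomicInteger();
        // In a mapper, the callback would be the real progress call instead of a counter.
        try (ProgressHeartbeat hb =
                 new ProgressHeartbeat(pings::incrementAndGet, 50, TimeUnit.MILLISECONDS)) {
            Thread.sleep(400); // stands in for the slow streaming call
        }
        System.out.println("progress reported " + pings.get() + " times");
    }
}
```

The interval should sit well below the task timeout. A heartbeat like this also keeps a genuinely hung task alive, so it is best scoped to the one call that is known to be slow.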
