Amazon Elastic MapReduce - keep the server alive?

Posted 2024-08-26 15:27:55


I am testing jobs in EMR, and every test takes a long time to start up. Is there a way to keep the server/master node alive in Amazon EMR? I know this can be done with the API, but I wanted to know whether it can also be done in the AWS console.


3 Answers

白色秋天 2024-09-02 15:27:55


You cannot do this from the AWS console. To quote the developer guide:

The Amazon Elastic MapReduce tab in the AWS Management Console does not support adding steps to a job flow.

You can only do this via the CLI or API, by creating a job flow and then adding steps to it:

$ ./elastic-mapreduce --create --alive --stream
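The elastic-mapreduce Ruby CLI shown above has since been retired. Assuming the current AWS CLI, a rough equivalent would be the following sketch (cluster name, release label, instance settings, and the cluster ID are placeholders, not values from this thread):

```shell
# Create a cluster that keeps running after its steps finish;
# --no-auto-terminate is the keep-alive flag in the modern CLI.
aws emr create-cluster \
    --name "my-long-running-cluster" \
    --release-label emr-6.15.0 \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --no-auto-terminate

# Later, submit additional steps to the still-running cluster by ID:
aws emr add-steps \
    --cluster-id j-XXXXXXXXXXXXX \
    --steps Type=STREAMING,Name="My step",Args=[-input,s3://mybucket/in,-output,s3://mybucket/out,-mapper,mapper.py,-reducer,reducer.py]
```

The second command is what replaces re-creating a cluster for every test run.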
请帮我爱他 2024-09-02 15:27:55


You can't do this with the web console - but through the API and programming tools, you will be able to add multiple steps to a long-running job, which is what I do. That way you can fire off jobs one after the other on the same long-running cluster, without having to re-create a new one each time.

If you are familiar with Python, I highly recommend the Boto library. The other AWS API tools let you do this as well.

If you follow the Boto EMR tutorial, you'll find worked examples.

Just to give you an idea, this is what I do (with streaming jobs):

import sys
import time

import boto
from boto.emr.step import StreamingStep

# Connect to EMR
conn = boto.connect_emr()

# Start a long-running job flow; keep_alive=True stops EMR from
# terminating the cluster once its steps have finished.
jobid = conn.run_jobflow(name='My jobflow',
                         log_uri='s3://<my log uri>/jobflow_logs',
                         keep_alive=True)

# Create your streaming job
step = StreamingStep(...)

# Add the step to the job flow
conn.add_jobflow_steps(jobid, [step])

# Poll until the step reaches a terminal state
while True:
    state = conn.describe_jobflow(jobid).steps[-1].state
    if state == "COMPLETED":
        break
    if state in ("FAILED", "TERMINATED", "CANCELLED"):
        sys.stderr.write("EMR job failed! State = %s!\n" % state)
        sys.exit(1)
    time.sleep(60)

# Create your next job here and add it to the same cluster
step = StreamingStep(...)
conn.add_jobflow_steps(jobid, [step])

# Repeat :)
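The busy-wait loop in that answer is a pattern worth factoring out. A minimal sketch of a reusable poller follows; the `get_state` callable stands in for `conn.describe_jobflow(jobid).steps[-1].state`, and the function and parameter names here are illustrative, not part of Boto:

```python
import time

# States in which a step will never complete.
TERMINAL_FAILURES = {"FAILED", "TERMINATED", "CANCELLED"}

def wait_for_step(get_state, poll_seconds=60, sleep=time.sleep):
    """Poll get_state() until the step finishes.

    Returns normally on "COMPLETED"; raises RuntimeError if the
    step lands in any failure state. The sleep argument is
    injectable so the loop can be exercised without real waiting.
    """
    while True:
        state = get_state()
        if state == "COMPLETED":
            return state
        if state in TERMINAL_FAILURES:
            raise RuntimeError("EMR step ended in state %s" % state)
        sleep(poll_seconds)

# Exercise it with a stubbed state sequence instead of a live cluster:
states = iter(["PENDING", "RUNNING", "COMPLETED"])
result = wait_for_step(lambda: next(states), sleep=lambda _: None)
```

Injecting `sleep` keeps the polling cadence out of the control-flow logic, which makes the terminal-state handling easy to test.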
终陌 2024-09-02 15:27:55


To keep the machine alive, start an interactive Pig session; then the machine won't shut down. You can then exercise your map/reduce logic from the command line using:

cat infile.txt | yourMapper | sort | yourReducer > outfile.txt
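To make that pipeline concrete, here is a hedged word-count sketch of what `yourMapper` and `yourReducer` might look like as Hadoop Streaming scripts (the script name and stage arguments are illustrative, not from this thread). Each stage reads lines on stdin and writes tab-separated key/value pairs on stdout, which is exactly the contract the shell pipeline above mimics:

```python
import sys

def mapper(lines):
    # Emit one "word\t1" pair per word, as a streaming mapper would.
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(sorted_pairs):
    # Input arrives sorted by key (the `sort` in the pipeline), so
    # equal words are adjacent; sum counts over each run of keys.
    current, total = None, 0
    for pair in sorted_pairs:
        word, count = pair.split("\t")
        if word != current:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)

if __name__ == "__main__":
    # Run as: cat infile.txt | python wordcount.py map | sort | python wordcount.py reduce
    stage = sys.argv[1]
    stream = mapper(sys.stdin) if stage == "map" else reducer(sys.stdin)
    for out in stream:
        print(out)
```

Because both stages are plain filters over stdin/stdout, the same script works unchanged whether it is driven by the shell pipeline on the master node or submitted as a real streaming step.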