让 Spark 应用程序使用所有可用的 YARN 资源

发布于 2025-01-16 11:39:55 字数 3894 浏览 2 评论 0原文

我目前使用的是 5 个 Raspberry Pi 4 (4GB) 的集群,并且安装了 Hadoop 来管理资源。不幸的是,我无法正确配置设置以使用 Apache Spark 应用程序的完整资源(4 个工作节点、1 个主节点),该应用程序是我在 Hadoop 框架之上提交的。

有人知道,我必须如何配置设置才能仅为 1 个应用程序使用完整资源(16 核,14 GB RAM)?

我当前的设置是: 如果

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.memory-mb</name>
    <value>3584</value> <!--512-->
  </property>
  <property>
    <name>mapreduce.map.resource.memory-mb</name>
    <value>3584</value> <!--256-->
  </property>
  <property>
    <name>mapreduce.reduce.resource.memory-mb</name>
    <value>3584</value> <!--256-->
  </property>
</configuration>

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>pi1</value>
  </property>  
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3584</value> <!--1536-->
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3584</value> <!--1536-->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>64</value> <!--128-->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value> <!--128-->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>8</value> <!--128-->
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>true</value>
  </property>
</configuration>

有人有

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.master yarn
spark.driver.memory 2048m
spark.yarn.am.memory 512m
spark.executor.memory 1024m
spark.executor.cores 4
#spark.driver.memory 512m
#spark.yarn.am.memory 512m
#spark.executor.memory 512m

spark.eventLog.enabled true
spark.eventLog.dir hdfs://pi1:9000/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://pi1:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080

建议,我将非常感激 :)

PS:如果需要更多信息,请告诉我。

I am currently using a cluster of 5 Raspberry Pi 4 (4GB) and I installed Hadoop to manage the resources. Unfortunately I am not able to config the settings right to use the full resources (4 worker nodes, 1 master node) for the Apache Spark Application, which I submit on top of the Hadoop Framework.

Does somebody knows, how I have to config the settings right to use the full resources (16 cores, 14 GB RAM) for only 1 application?

My current settings are:
mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.memory-mb</name>
    <value>3584</value> <!--512-->
  </property>
  <property>
    <name>mapreduce.map.resource.memory-mb</name>
    <value>3584</value> <!--256-->
  </property>
  <property>
    <name>mapreduce.reduce.resource.memory-mb</name>
    <value>3584</value> <!--256-->
  </property>
</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>pi1</value>
  </property>  
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3584</value> <!--1536-->
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3584</value> <!--1536-->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>64</value> <!--128-->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value> <!--128-->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>8</value> <!--128-->
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>true</value>
  </property>
</configuration>

spark-defaults.config

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.master yarn
spark.driver.memory 2048m
spark.yarn.am.memory 512m
spark.executor.memory 1024m
spark.executor.cores 4
#spark.driver.memory 512m
#spark.yarn.am.memory 512m
#spark.executor.memory 512m

spark.eventLog.enabled true
spark.eventLog.dir hdfs://pi1:9000/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://pi1:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080

If somebody has a suggestion, I would be really thankful. :)

P.s: If more information are required, just tell me.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文