Make the Spark application use all available YARN resources
I am currently using a cluster of 5 Raspberry Pi 4 (4 GB) and I installed Hadoop to manage the resources. Unfortunately, I am not able to configure the settings correctly so that the Apache Spark application, which I submit on top of the Hadoop framework, uses the full resources of the cluster (4 worker nodes, 1 master node).
Does anybody know how I have to configure the settings so that a single application can use the full resources (16 cores, 14 GB RAM)?
My current settings are:
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.memory-mb</name>
    <value>3584</value> <!--512-->
  </property>
  <property>
    <name>mapreduce.map.resource.memory-mb</name>
    <value>3584</value> <!--256-->
  </property>
  <property>
    <name>mapreduce.reduce.resource.memory-mb</name>
    <value>3584</value> <!--256-->
  </property>
</configuration>
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>pi1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3584</value> <!--1536-->
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3584</value> <!--1536-->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>64</value> <!--128-->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value> <!--128-->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>8</value> <!--128-->
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>true</value>
  </property>
</configuration>
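As a sanity check, the resources each NodeManager actually reports with this configuration can be inspected via the standard YARN CLI (the exact output depends on the Hadoop version, and the node ID below is only a placeholder):
# List all NodeManagers registered with the ResourceManager on pi1.
yarn node -list -all
# Show the memory/vCore capacity a single node reports
# (replace pi2:42001 with a real Node-Id from the listing above).
yarn node -status pi2:42001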
spark-defaults.conf
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.master yarn
spark.driver.memory 2048m
spark.yarn.am.memory 512m
spark.executor.memory 1024m
spark.executor.cores 4
#spark.driver.memory 512m
#spark.yarn.am.memory 512m
#spark.executor.memory 512m
spark.eventLog.enabled true
spark.eventLog.dir hdfs://pi1:9000/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://pi1:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
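For illustration, a submission that fits within the YARN limits above would look roughly like the sketch below; the class name and jar path are placeholders, not my actual application:
# One executor per worker node: 2 g heap + the default ~384 m overhead = 2432 MB,
# which stays below the 3584 MB yarn.scheduler.maximum-allocation-mb limit, and
# 4 executor cores match the physical cores of a Raspberry Pi 4 (4 x 4 = 16 cores).
# In client mode the driver runs on the submitting machine, so only the small
# AM container (spark.yarn.am.memory 512m) is needed in addition to the executors.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 4 \
  --executor-cores 4 \
  --executor-memory 2g \
  --class com.example.MyApp \
  /path/to/my-app.jar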
If anybody has a suggestion, I would be really grateful. :)
P.S.: If more information is required, just tell me.