What should hadoop.tmp.dir be?

Posted 2024-08-23 03:30:41

Hadoop has a configuration parameter, hadoop.tmp.dir, which, per the documentation, is "A base for other temporary directories." I presume this path refers to the local file system.

I set this value to /mnt/hadoop-tmp/hadoop-${user.name}. After formatting the namenode and starting all services, I see exactly the same path created on HDFS.

Does this mean hadoop.tmp.dir refers to a temporary location on HDFS?
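
For reference, a minimal sketch of this setting in core-site.xml (the property name and description are the standard ones from core-default.xml; the value is the path I described above):

<!-- core-site.xml: the setting described above (minimal sketch) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hadoop-tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>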

4 Answers

谎言月老 2024-08-30 03:30:41

It's confusing, but hadoop.tmp.dir is used as the base for temporary directories both locally and in HDFS. The documentation isn't great, but mapred.system.dir is set by default to "${hadoop.tmp.dir}/mapred/system", and this defines the path on HDFS where the Map/Reduce framework stores system files.

If you don't want these tied together, you can edit your mapred-site.xml so that the definition of mapred.system.dir is no longer tied to ${hadoop.tmp.dir}, as in the sketch below.
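
For instance, a minimal sketch of such an override in mapred-site.xml (the property name is as discussed; the HDFS path below is illustrative, not from the original answer):

<!-- mapred-site.xml: define mapred.system.dir independently of
     ${hadoop.tmp.dir}; the HDFS path is illustrative -->
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>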

緦唸λ蓇 2024-08-30 03:30:41

Let me add a bit more to kkrugler's answer:

There are three HDFS properties whose values contain hadoop.tmp.dir:

  1. dfs.name.dir: the directory where the namenode stores its metadata, with default value ${hadoop.tmp.dir}/dfs/name.
  2. dfs.data.dir: the directory where HDFS data blocks are stored, with default value ${hadoop.tmp.dir}/dfs/data.
  3. fs.checkpoint.dir: the directory where the secondary namenode stores its checkpoints, with default value ${hadoop.tmp.dir}/dfs/namesecondary.

This is why you saw /mnt/hadoop-tmp/hadoop-${user.name} in your HDFS after formatting the namenode.
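
If you would rather not have these follow hadoop.tmp.dir, a minimal sketch of explicit overrides in hdfs-site.xml (property names as in the list above; the /data paths are illustrative):

<!-- hdfs-site.xml: set these directories explicitly so they no longer
     inherit ${hadoop.tmp.dir}; the /data paths are illustrative -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data/dfs/data</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/dfs/namesecondary</value>
</property>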

深陷 2024-08-30 03:30:41

Had a look around for information on this one. The only thing I could come up with was this post in the Amazon Elastic MapReduce Dev Guide:

In hadoop-site.xml, we set hadoop.tmp.dir to /mnt/var/lib/hadoop/tmp. /mnt is where we mount the "extra" EC2 volumes, which can contain a lot more data than the default volume. (The exact amount depends on instance type.) Hadoop's RunJar.java (the module that unpacks the input JARs) interprets hadoop.tmp.dir as a Hadoop file system path rather than a local path, so it writes to the path in HDFS instead of a local path. HDFS is mounted under /mnt (specifically /mnt/var/lib/hadoop/dfs/). So, you can write lots of data to it.
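
A quick way to check where such a path actually ended up (a minimal sketch using standard commands; the path is the one from the question):

# Does the path exist on HDFS? (hadoop fs -ls lists HDFS paths)
hadoop fs -ls /mnt/hadoop-tmp
# And on the local filesystem, for comparison:
ls -ld /mnt/hadoop-tmp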

只有一腔孤勇 2024-08-30 03:30:41

hadoop.tmp.dir is Hadoop's base temporary directory. It is a local directory (not HDFS), and as of Hadoop 3.4.0 its default value in core-default.xml is:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

Different processes/services use subfolders of hadoop.tmp.dir for their temporary data.

# cd $HADOOP_HOME; grep -lrH --include="*.xml" "hadoop.tmp.dir"
share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

All properties that depend directly on hadoop.tmp.dir can be extracted with:

for f in $(grep -lrH --include="*.xml" "hadoop.tmp.dir" $HADOOP_HOME); do
  basename "$f"
  xmllint --xpath '/configuration/property[contains(value,"hadoop.tmp.dir")]' "$f"
  echo
done

hdfs-default.xml

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage).  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices. The directories should be tagged
  with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]/[NVDIMM]) for HDFS
  storage policies. The default storage type will be DISK if the directory does
  not have a storage type tagged explicitly. Directories that do not exist will
  be created if local filesystem permission allows.
  </description>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary
      name node should store the temporary images to merge.
      If this is a comma-delimited list of directories then the image is
      replicated in all of the directories for redundancy.
  </description>
</property>

core-default.xml

<property>
  <name>io.seqfile.local.dir</name>
  <value>${hadoop.tmp.dir}/io/local</value>
  <description>The local directory where sequence file stores intermediate
  data files during merge.  May be a comma-separated list of
  directories on different devices in order to spread disk i/o.
  Directories that do not exist are ignored.
  </description>
</property>
<property>
  <name>fs.s3a.buffer.dir</name>
  <value>${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a</value>
  <description>Comma separated list of directories that will be used to buffer file
    uploads to.
    Yarn container path will be used as default value on yarn applications,
    otherwise fall back to hadoop.tmp.dir
  </description>
</property>
<property>
    <name>fs.azure.buffer.dir</name>
    <value>${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/abfs</value>
    <description>Directory path for buffer files needed to upload data blocks
      in AbfsOutputStream.
      Yarn container path will be used as default value on yarn applications,
      otherwise fall back to hadoop.tmp.dir </description>
  </property>

mapred-default.xml

<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
  <description>
      The local directory where MapReduce stores intermediate
      data files.  May be a comma-separated list of
      directories on different devices in order to spread disk i/o.
      Directories that do not exist are ignored.
  </description>
</property>
<property>
  <name>mapreduce.jobhistory.recovery.store.fs.uri</name>
  <value>${hadoop.tmp.dir}/mapred/history/recoverystore</value>
  <!--value>hdfs://localhost:9000/mapred/history/recoverystore</value-->
  <description>The URI where history server state will be stored if
  HistoryServerFileSystemStateStoreService is configured as the recovery
  storage class.</description>
</property>
<property>
  <name>mapreduce.jobhistory.recovery.store.leveldb.path</name>
  <value>${hadoop.tmp.dir}/mapred/history/recoverystore</value>
  <description>The URI where history server state will be stored if
  HistoryServerLeveldbSystemStateStoreService is configured as the recovery
  storage class.</description>
</property>

yarn-default.xml

<property>
    <description>URI pointing to the location of the FileSystem path where
    RM state will be stored. This must be supplied when using
    org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
    as the value for yarn.resourcemanager.store.class</description>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>${hadoop.tmp.dir}/yarn/system/rmstore</value>
    <!--value>hdfs://localhost:9000/rmstore</value-->
  </property>
<property>
    <description>Local path where the RM state will be stored when using
    org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore
    as the value for yarn.resourcemanager.store.class</description>
    <name>yarn.resourcemanager.leveldb-state-store.path</name>
    <value>${hadoop.tmp.dir}/yarn/system/rmstore</value>
  </property>
<property>
    <description>List of directories to store localized files in. An 
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
   </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nm-local-dir</value>
  </property>
<property>
    <description>The local filesystem directory in which the node manager will
    store state when recovery is enabled.</description>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>${hadoop.tmp.dir}/yarn-nm-recovery</value>
  </property>
<property>
    <description>Store file name for leveldb timeline store.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.path</name>
    <value>${hadoop.tmp.dir}/yarn/timeline</value>
  </property>
<property>
    <description>Store file name for leveldb state store.</description>
    <name>yarn.timeline-service.leveldb-state-store.path</name>
    <value>${hadoop.tmp.dir}/yarn/timeline</value>
  </property>
<property>
    <description>
      The storage path for LevelDB implementation of configuration store,
      when yarn.scheduler.configuration.store.class is configured to be
      "leveldb".
    </description>
    <name>yarn.scheduler.configuration.leveldb-store.path</name>
    <value>${hadoop.tmp.dir}/yarn/system/confstore</value>
  </property>
<property>
    <description>
      The file system directory to store the configuration files. The path
      can be any format as long as it follows hadoop compatible schema,
      for example value "file:///path/to/dir" means to store files on local
      file system, value "hdfs:///path/to/dir" means to store files on HDFS.
      If resource manager HA is enabled, recommended to use hdfs schema so
      it works in fail-over scenario.
    </description>
    <name>yarn.scheduler.configuration.fs.path</name>
    <value>file://${hadoop.tmp.dir}/yarn/system/schedconf</value>
  </property>

In addition, there are second-level dependencies, such as dfs.namenode.checkpoint.edits.dir, which depends on dfs.namenode.checkpoint.dir:

<property>
  <name>dfs.namenode.checkpoint.edits.dir</name>
  <value>${dfs.namenode.checkpoint.dir}</value>
  <description>Determines where on the local filesystem the DFS secondary
      name node should store the temporary edits to merge.
      If this is a comma-delimited list of directories then the edits is
      replicated in all of the directories for redundancy.
      Default value is same as dfs.namenode.checkpoint.dir
  </description>
</property>

The default value of each of these properties can be overridden in the corresponding *-site.xml file.
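
For instance, a minimal sketch of overriding one of the derived defaults in yarn-site.xml (the property name is from yarn-default.xml above; the /data path is illustrative):

<!-- yarn-site.xml: decouple the NodeManager local dirs from
     ${hadoop.tmp.dir}; the /data path is illustrative -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/yarn/nm-local-dir</value>
</property>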
