What should hadoop.tmp.dir be?
Hadoop has a configuration parameter, hadoop.tmp.dir, which, according to the documentation, is "A base for other temporary directories." I presume this path refers to the local file system.

I set this value to /mnt/hadoop-tmp/hadoop-${user.name}. After formatting the namenode and starting all services, I see exactly the same path created on HDFS.

Does this mean hadoop.tmp.dir refers to a temporary location on HDFS?

It's confusing, but hadoop.tmp.dir is used as the base for temporary directories locally, and also in HDFS. The documentation isn't great, but mapred.system.dir is set by default to "${hadoop.tmp.dir}/mapred/system", and this defines the path on HDFS where the Map/Reduce framework stores system files.

If you don't want these to be tied together, you can edit your mapred-site.xml so that the definition of mapred.system.dir is not tied to ${hadoop.tmp.dir}.
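
As a rough sketch of that decoupling, a mapred-site.xml override might look like the fragment below. The path /hadoop/mapred/system is a made-up example, not a recommended or default value; pick whatever location suits your cluster.

```xml
<!-- mapred-site.xml: illustrative override only; the path is a hypothetical example -->
<configuration>
  <property>
    <name>mapred.system.dir</name>
    <!-- No ${hadoop.tmp.dir} here, so changing the temp base no longer moves this HDFS path -->
    <value>/hadoop/mapred/system</value>
  </property>
</configuration>
```

With an explicit value like this, hadoop.tmp.dir can point at a local scratch disk without the same path also appearing on HDFS for the Map/Reduce system files.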

Let me add a bit more to kkrugler's answer.

There are three HDFS properties which contain hadoop.tmp.dir in their values:

- dfs.name.dir: the directory where the namenode stores its metadata, with default value ${hadoop.tmp.dir}/dfs/name.
- dfs.data.dir: the directory where HDFS data blocks are stored, with default value ${hadoop.tmp.dir}/dfs/data.
- fs.checkpoint.dir: the directory where the secondary namenode stores its checkpoints, with default value ${hadoop.tmp.dir}/dfs/namesecondary.

This is why you saw /mnt/hadoop-tmp/hadoop-${user.name} in your HDFS after formatting the namenode.
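
If you would rather not have these three locations follow hadoop.tmp.dir, they can be set explicitly. The fragment below is only a sketch: it uses the property names from the answer above, and the /data/... paths are hypothetical placeholders. Newer Hadoop releases use renamed equivalents (for example dfs.namenode.name.dir), so check the names against your version.

```xml
<!-- hdfs-site.xml: illustrative overrides; all paths are hypothetical examples -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- Namenode metadata, no longer derived from ${hadoop.tmp.dir} -->
    <value>/data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- Datanode block storage -->
    <value>/data/hadoop/dfs/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <!-- Secondary namenode checkpoints -->
    <value>/data/hadoop/dfs/namesecondary</value>
  </property>
</configuration>
```
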
I had a look around for information on this one. The only thing I could come up with was this post on the Amazon Elastic MapReduce Dev Guide:

hadoop.tmp.dir is Hadoop's temporary directory. It is a local (non-HDFS) directory, and as of Hadoop 3.4.0 its default value is defined in core-default.xml.

Different processes and services use subfolders of hadoop.tmp.dir for their temporary data. All properties depending directly on hadoop.tmp.dir can be extracted from the following default configuration files:

- hdfs-default.xml
- core-default.xml
- mapred-default.xml
- yarn-default.xml

In addition, there are second-level dependencies, such as dfs.namenode.checkpoint.edits.dir, which depends on dfs.namenode.checkpoint.dir.

All properties' default values can be overridden in the corresponding -site.xml files.
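
For instance, overriding the base directory itself would go in core-site.xml. The fragment below is a minimal sketch that reuses the value from the question; it assumes /mnt/hadoop-tmp exists on every node and is writable by the users running the Hadoop daemons.

```xml
<!-- core-site.xml: overrides the default taken from core-default.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- ${user.name} is expanded per user, giving each one a separate temp tree -->
    <value>/mnt/hadoop-tmp/hadoop-${user.name}</value>
  </property>
</configuration>
```

Any property whose default references ${hadoop.tmp.dir}, directly or through another property, picks up this new base unless it is overridden separately in its own -site.xml file.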