hadoop dfs -copyFromLocal src dest

Posted on 2024-12-24 21:47:04

My question is why we need to specify a dest at all. The file I am putting into HDFS does not necessarily reside entirely on the local machine, so what is the use of specifying dest in the command?

When I run the command from the command line and then run hadoop dfs -ls, I can see my file listed in HDFS, but when I create the file programmatically using

FileSystem fs      = FileSystem.get(conf);
Path filenamePath  = new Path("hello.txt");
fs.create(filenamePath);

and then run hadoop dfs -ls, I can't find the file.

In my core-site.xml I have the following...

<!-- In: conf/core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/apurv/hadoop/hdfs</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

Intuitively, it is also unclear to me where the copied file actually resides, since it might be too large to fit on a single machine.

Comments (2)

高跟鞋的旋律 2024-12-31 21:47:04

We chatted about this on Talk, and I now have a bit more time to explain it.

If you use this snippet in your code:

FileSystem fs      = FileSystem.get(conf);
// stuff to create

then what matters is what the conf object contains. A new Configuration() picks up core-site.xml only if that file is on the classpath; if nothing in conf sets fs.default.name, the FileSystem returned is the local one.

If you put this in your conf:

conf.set("fs.default.name", "hdfs://localhost:54310");

then you should be connected to HDFS through the namenode at that address, and you can write to HDFS.

If you want the configuration to read the XML files, use the addResource() methods.
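
To make the role of that value concrete, here is a small sketch that takes the URI apart the way the description in the question's core-site.xml explains it: the scheme names the property fs.SCHEME.impl, and the authority carries the host and port. It uses only java.net.URI from the JDK, not Hadoop's own classes; the class name FsUriDemo and its helper methods are made up for this illustration.

```java
import java.net.URI;

// Sketch only: plain JDK URI parsing, not Hadoop's FileSystem lookup code.
class FsUriDemo {

    // Builds the property name that the description calls fs.SCHEME.impl.
    static String implKey(String defaultFs) {
        return "fs." + URI.create(defaultFs).getScheme() + ".impl";
    }

    // Returns the authority ("host:port") of the default file system URI.
    static String authority(String defaultFs) {
        return URI.create(defaultFs).getAuthority();
    }

    public static void main(String[] args) {
        // Value taken from the question's core-site.xml.
        String defaultFs = "hdfs://localhost:54310";
        URI uri = URI.create(defaultFs);
        System.out.println("scheme   = " + uri.getScheme());
        System.out.println("host     = " + uri.getHost());
        System.out.println("port     = " + uri.getPort());
        System.out.println("impl key = " + implKey(defaultFs));
    }
}
```

For the value in the question, the scheme is hdfs, the authority is localhost:54310, and the derived property name is fs.hdfs.impl. A local default such as file:/// would yield fs.file.impl instead, which is why the contents of conf decide which file system you end up writing to.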

Look into the documentation here:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/conf/Configuration.html

A sample usage could be:

Configuration conf = new Configuration();
conf.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));

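Then all the mappings from hdfs-site.xml will be inside your conf. In the question, fs.default.name is defined in core-site.xml, so that file has to be added in the same way for the setting to take effect.

Play around with it a bit; it really feels intuitive. At least for me ;)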

零度℉ 2024-12-31 21:47:04

FileSystem#create(Path) opens an output stream to the indicated path. The stream has to be closed before the file is visible.
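
As a rough analogy only, the same "nothing shows up until you close" effect can be seen with an ordinary buffered stream on a local temporary file. The sketch below uses java.io and java.nio.file from the JDK, not the HDFS client, so it illustrates the idea rather than HDFS's actual behaviour; the class name CloseBeforeVisibleDemo and its method are invented for this example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-file analogy only: JDK buffering, not the HDFS client.
class CloseBeforeVisibleDemo {

    // Writes a few bytes through a buffered stream and returns
    // {file size before close, file size after close}.
    static long[] sizesAroundClose() {
        try {
            Path file = Files.createTempFile("hello", ".txt");
            try {
                long before;
                try (OutputStream out =
                         new BufferedOutputStream(Files.newOutputStream(file))) {
                    out.write("hello".getBytes(StandardCharsets.UTF_8));
                    // The bytes are still in the in-memory buffer here.
                    before = Files.size(file);
                } // try-with-resources flushes and closes the stream here
                long after = Files.size(file);
                return new long[] {before, after};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = sizesAroundClose();
        System.out.println("before close: " + sizes[0] + " bytes");
        System.out.println("after close:  " + sizes[1] + " bytes");
    }
}
```

With the JDK's default buffer size, the file is still empty before the stream is closed and holds the five written bytes afterwards. The snippet in the question calls fs.create(filenamePath) and never closes the returned stream, so wrapping that call in try-with-resources in the same way is the natural fix on the Hadoop side.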

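"My question is that why do we need to specify a dest. The file which I am putting into hdfs does not necessarily lie entirely on the local machine, so what is the use of specifying dest in the command."

I'm not sure exactly what you mean, but dest specifies the target location: the path under which the file will appear in HDFS. That path is a name in the HDFS namespace, not a place on any one machine; the file's blocks are spread across the datanodes.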
