How to read a file from HDFS in a non-Java client
So my MR job generates a report file, and an end user needs to be able to download that file: they click a button on a normal web reporting interface, and it downloads the output. According to this O'Reilly book excerpt, there is an HTTP read-only interface. It says it's XML-based, but it appears to be just the normal web interface intended for viewing in a browser, not something that can be programmatically queried, listed, and downloaded from. Is my only recourse to write my own servlet-based interface, or to execute the hadoop CLI tool?
2 Answers
The way to access HDFS programmatically from something other than Java is by using Thrift.

There are pre-generated client classes for several languages (Java, Python, PHP, ...) included in the HDFS source tree.

See http://wiki.apache.org/hadoop/HDFS-APIs
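As a rough illustration of what that looks like from Python, here is a minimal sketch. It assumes you have started the contrib "thriftfs" gateway that ships in the HDFS source tree and have its generated Python classes on your path; the module, service, and struct names (`hadoopfs`, `ThriftHadoopFileSystem`, `Pathname`), the method signatures, and the host/port are assumptions based on that contrib IDL and may differ in your Hadoop version.

```python
# Hedged sketch: read a file from HDFS through the contrib Thrift gateway.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol

# Generated client code from the thriftfs IDL -- module/class names assumed.
from hadoopfs import ThriftHadoopFileSystem
from hadoopfs.ttypes import Pathname

# Placeholder host/port of the Thrift gateway you started.
transport = TTransport.TBufferedTransport(TSocket.TSocket("thrift-gateway-host", 10000))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHadoopFileSystem.Client(protocol)

transport.open()
handle = client.open(Pathname("/user/reports/output/part-00000"))  # path is a placeholder
data = client.read(handle, 0, 1024 * 1024)  # read up to 1 MB starting at offset 0 (assumed signature)
client.close(handle)
transport.close()
```

A web application in Python, PHP, etc. could call something like this directly from its download handler instead of shelling out to the CLI.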
I'm afraid you will probably have to settle for the CLI, AFAIK.

Not sure if it fits your situation, but I think it would be reasonable to have whatever script kicks off the MR job do a

hadoop dfs -get ...

after job completion, copying the output into a known directory that is already served. Sorry, I don't know of an easier solution.
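A minimal sketch of that wrapper, assuming the `hadoop` binary is on the PATH; the HDFS output path and the web-served destination below are placeholders to adjust for your job and web root.

```python
import subprocess

# Placeholders: your job's actual output file and a directory your web server serves.
HDFS_REPORT = "/user/reports/output/part-00000"
SERVED_PATH = "/var/www/reports/report.txt"

# After the MR job finishes, copy the report out of HDFS into the served
# directory, so the "download" button can point at a plain static file.
subprocess.check_call(["hadoop", "dfs", "-get", HDFS_REPORT, SERVED_PATH])
```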