Hadoop: uploading files from the local machine to Amazon S3
I am working on a Java MapReduce app that has to provide an upload service for pictures from the user's local machine to an S3 bucket.
The thing is, the app must run on an EC2 cluster, so I am not sure how to refer to the user's local machine when copying the files. The copyFromLocalFile(..) method needs a path on the local machine, which in this case will be the EC2 cluster...
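Roughly what I have in mind (a minimal sketch; the bucket name and paths are made up, and credential configuration is omitted):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3Upload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FileSystem for the target bucket (name made up).
        FileSystem s3 = FileSystem.get(URI.create("s3n://my-bucket"), conf);
        // The "local" side of this call is resolved on the machine where
        // the code runs -- an EC2 node -- not on the user's own computer.
        s3.copyFromLocalFile(new Path("/tmp/picture.jpg"),
                             new Path("s3n://my-bucket/pics/picture.jpg"));
    }
}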
I'm not sure if I stated the problem correctly; does anyone understand what I mean?
Thanks
2 Answers
You might also investigate s3distcp: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
Apache DistCp is an open-source tool you can use to copy large amounts of data. DistCp uses MapReduce to copy in a distributed manner—sharing the copy, error handling, recovery, and reporting tasks across several servers. S3DistCp is an extension of DistCp that is optimized to work with Amazon Web Services, particularly Amazon Simple Storage Service (Amazon S3). Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by your Amazon Elastic MapReduce (Amazon EMR) job flow. You can also use S3DistCp to copy data between Amazon S3 buckets or from HDFS to Amazon S3.
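As a rough illustration, a plain DistCp run from HDFS into S3 can be launched like this (the bucket and paths are placeholders; the exact s3distcp invocation on EMR is described in the guide linked above):

hadoop distcp hdfs:///user/hadoop/pics s3n://my-bucket/pics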
You will need to get the files from the userMachine to at least one node before you will be able to use them through MapReduce.
The FileSystem and FileUtil functions refer to paths either on HDFS or on the local disk of one of the nodes in the cluster. They cannot reference the user's local system. (Maybe if you did some ssh setup... maybe?)
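For illustration, here is a minimal sketch of that second step, assuming the picture has already reached one node's disk somehow (e.g. via scp or an HTTP upload handled by your app; all paths are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class NodeToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The "local" file system here is the disk of the node running
        // this code, not the user's machine.
        FileSystem local = FileSystem.getLocal(conf);
        // The cluster's default file system (HDFS).
        FileSystem hdfs = FileSystem.get(conf);
        FileUtil.copy(local, new Path("/tmp/picture.jpg"),
                      hdfs, new Path("/user/hadoop/pics/picture.jpg"),
                      false /* keep the source file */, conf);
    }
}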