Hadoop: uploading files from the local machine to Amazon S3

Posted 2024-10-06 11:15:07

I am working on a Java MapReduce app that has to provide an upload service for pictures from the user's local machine to an S3 bucket.

The thing is, the app must run on an EC2 cluster, so I am not sure how I can refer to the local machine when copying the files. The method copyFromLocalFile(..) needs a path on the local machine, which in this case would be the EC2 cluster...
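For reference, this is roughly the kind of call in question (a minimal sketch; the bucket name, credentials, and paths below are placeholders, and the old s3n scheme is only one option):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadToS3 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials for the s3n filesystem (placeholders).
        conf.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY");
        conf.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY");

        FileSystem s3 = FileSystem.get(URI.create("s3n://my-bucket"), conf);

        // The "local" path is resolved on whatever machine runs this code --
        // here an EC2 node, not the end user's own computer.
        s3.copyFromLocalFile(new Path("/home/user/picture.jpg"),
                             new Path("s3n://my-bucket/pictures/picture.jpg"));
        s3.close();
    }
}
```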

I'm not sure if I stated the problem correctly, can anyone understand what I mean?

Thanks

2 Answers

蒲公英的约定 2024-10-13 11:15:07

You might also investigate s3distcp: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html

Apache DistCp is an open-source tool you can use to copy large amounts of data. DistCp uses MapReduce to copy in a distributed manner—sharing the copy, error handling, recovery, and reporting tasks across several servers. S3DistCp is an extension of DistCp that is optimized to work with Amazon Web Services, particularly Amazon Simple Storage Service (Amazon S3). Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by your Amazon Elastic MapReduce (Amazon EMR) job flow. You can also use S3DistCp to copy data between Amazon S3 buckets or from HDFS to Amazon S3.
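As an illustration only (not part of the original answer), the same distributed-copy idea can also be driven programmatically from Java; the sketch below assumes the Hadoop 2.x DistCpOptions constructor, a placeholder bucket name, and the old s3n scheme:

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

public class CopyHdfsToS3 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Source is on the cluster's HDFS; destination is an S3 bucket.
        Path src = new Path("hdfs:///user/hadoop/pictures");
        Path dst = new Path("s3n://my-bucket/pictures");

        // Hadoop 2.x API; newer releases build options via DistCpOptions.Builder.
        DistCpOptions options = new DistCpOptions(Collections.singletonList(src), dst);

        // execute() submits a MapReduce job that performs the copy in parallel.
        new DistCp(conf, options).execute();
    }
}
```

On an EMR cluster the same thing is normally done with the s3-dist-cp command-line tool rather than custom code.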

忆梦 2024-10-13 11:15:07

You will need to get the files from the user's machine to at least one node before you will be able to use them in a MapReduce job.

The FileSystem and FileUtil functions refer to paths either on HDFS or on the local disk of one of the nodes in the cluster.
They cannot reference the user's local system. (Maybe if you did some ssh setup... maybe?)
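To make that concrete, here is a minimal sketch of the second step, assuming the picture has already been staged onto one of the nodes somehow (for example via an HTTP upload or scp); the paths are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class StageThenCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Both filesystems are resolved on the node running this code:
        // its own local disk and the cluster's HDFS. Neither can see the
        // end user's machine, which is the limitation described above.
        FileSystem local = FileSystem.getLocal(conf);
        FileSystem hdfs = FileSystem.get(conf);

        // The file must already be on this node's disk before this runs.
        Path staged = new Path("/tmp/uploads/picture.jpg");
        Path target = new Path("/user/hadoop/pictures/picture.jpg");

        FileUtil.copy(local, staged, hdfs, target, false, conf);
    }
}
```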
