Distcp from S3 to HDFS
I'm trying to copy data from S3 to HDFS using the distcp tool. The problem is that the S3 cluster uses a VPC endpoint, and I don't know how to configure distcp properly. I have tried several configurations, but none has worked. Currently I'm using the following command:
hadoop distcp \
  -Dfs.s3a.access.key=[KEY] \
  -Dfs.s3a.secret.key=[SECRET] \
  -Dfs.s3a.region=eu-west-1 \
  -Dfs.s3a.bucket.[BUCKET NAME].endpoint=https://bucket.vpce-[vpce id].s3.eu-west-1.vpce.amazonaws.com \
  s3a://[BUCKET NAME]/[FILE] \
  hdfs://[DESTINATION]/[FILE]
But I'm getting this error:
22/03/16 09:14:39 ERROR tools.DistCp: Exception encountered org.apache.hadoop.fs.s3a.AWSBadRequestException: doesBucketExistV2 on [BUCKET NAME]: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; the region 'vpce' is wrong; expecting 'eu-west-1'
Any ideas on how distcp should be configured with VPC endpoints?
Thanks in advance
Comments (1)
Use the storediag command to debug this before going near distcp.
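As a rough illustration of that advice, the sketch below prints (rather than runs) two commands: a storediag check of the s3a connection, followed by a distcp call with the signing region set explicitly. Everything in it is an assumption rather than something confirmed in this thread: the bucket name, VPC endpoint id and paths are made-up placeholders, storediag is assumed to come from a separately downloaded cloudstore JAR at a hypothetical local path, and the fs.s3a.endpoint.region option is assumed to be available (Hadoop 3.3.1 or later). Credentials are left out and would have to be supplied the usual way.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of executing them.
# All values below are hypothetical placeholders, not taken from the question.
BUCKET="example-bucket"
REGION="eu-west-1"
VPCE_ID="vpce-0123456789abcdef0"     # hypothetical VPC endpoint id
CLOUDSTORE_JAR="cloudstore.jar"      # assumed local path to the cloudstore JAR
ENDPOINT="https://bucket.${VPCE_ID}.s3.${REGION}.vpce.amazonaws.com"

# Step 1: diagnose the s3a connection on its own, before involving distcp.
storediag_cmd() {
  printf '%s ' hadoop jar "$CLOUDSTORE_JAR" storediag "s3a://${BUCKET}/"
  printf '\n'
}

# Step 2: distcp with the signing region stated explicitly, so it does not
# have to be inferred from the VPC endpoint hostname.
# Assumption: fs.s3a.endpoint.region is supported (Hadoop 3.3.1 or later).
distcp_cmd() {
  printf '%s ' hadoop distcp \
    "-Dfs.s3a.bucket.${BUCKET}.endpoint=${ENDPOINT}" \
    "-Dfs.s3a.endpoint.region=${REGION}" \
    "s3a://${BUCKET}/path/to/file" \
    "hdfs:///tmp/destination/file"
  printf '\n'
}

storediag_cmd
distcp_cmd
```

If the printed storediag command succeeds when run by hand against the real bucket, the same endpoint and region settings are a reasonable starting point for distcp; if it fails, the problem lies in the s3a configuration rather than in distcp itself.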