Certificate for Amazon bucket doesn't match when accessing S3 from PySpark
I have an EC2 instance where I'm trying to configure PySpark to read from S3.
I attached a full-access IAM role to the EC2 instance and used the following packages in Spark:
com.amazonaws:aws-java-sdk-bundle:1.11.563,org.apache.hadoop:hadoop-aws:3.3.1
However, I'm getting a new error, and I'm not sure what it means:
: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on
s3a://bucket_name.stuff/mycsv.csv: com.amazonaws.SdkClientException: Unable to execute HTTP
request: Certificate for <bucket_name.stuff.s3.amazonaws.com> doesn't
match any of the subject alternative names: [*.s3.amazonaws.com,
s3.amazonaws.com]
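For context, a minimal sketch of the kind of setup being described might look like the following. The package versions, bucket name, and file path are the ones from the question; the builder options and the header flag are assumptions on my part, not the asker's exact code.

from pyspark.sql import SparkSession

# Minimal sketch: pull in the S3A connector and the AWS SDK bundle at startup.
# Versions and the s3a:// path are taken from the question above.
spark = (
    SparkSession.builder
    .appName("s3a-read-example")
    .config(
        "spark.jars.packages",
        "com.amazonaws:aws-java-sdk-bundle:1.11.563,"
        "org.apache.hadoop:hadoop-aws:3.3.1",
    )
    # With an IAM role attached to the EC2 instance, the default credential
    # provider chain picks up credentials automatically, so no access keys
    # need to be configured here.
    .getOrCreate()
)

df = spark.read.csv("s3a://bucket_name.stuff/mycsv.csv", header=True)
df.show()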
1 Answer
So the issue turned out to be a version mismatch between pyspark, hadoop-aws, and java-sdk (I was getting all kinds of different errors until I found a proper version setup).
The combination that worked for me was:
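(The answer's specific version list isn't preserved above. As a general rule, hadoop-aws should match the Hadoop version your PySpark build ships with, and aws-java-sdk-bundle should match the version that hadoop-aws release was built against, which is listed in its POM. The sketch below is only an illustration of how one might check the bundled Hadoop version; it is not the answerer's combination.)

# Illustrative only: check which Hadoop version the local PySpark bundle was
# built against, then pick org.apache.hadoop:hadoop-aws with the same version
# and the aws-java-sdk-bundle version that release of hadoop-aws depends on.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hadoop_version = spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion()
print("Bundled Hadoop version:", hadoop_version)
# e.g. if this prints 3.3.1, use org.apache.hadoop:hadoop-aws:3.3.1 together
# with the aws-java-sdk-bundle version listed in that artifact's dependencies.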