An error occurred while calling o91.getDynamicFrame. No such file or directory
Using the AWSGlueServiceRole, I created a Glue job to map a parquet file from one S3 bucket to another. When I attempt to run the job, I receive the following exception:
"An error occurred while calling o91.getDynamicFrame. No such file or directory 's3://BUCKET/PATH/TO/FILE.parquet'"
According to my check, the file exists. My first thought was that it might be a permissions issue, so I tried getting the object using boto3 on my local machine and got the file back:
import io
import boto3
import pandas as pd
# Fetch the object directly to confirm that it exists and is readable
s3_client = boto3.client('s3')
obj = s3_client.get_object(Bucket='BUCKET', Key='2022/05/24/08/PATH/TO/FILE.parquet')
print(pd.read_parquet(io.BytesIO(obj['Body'].read())))
------------------ Output below ------------------
id val1 val2 val3
0 model2 0.612707 None [[2.1931596, 1.5204412, 1.4174217, 1.6540076, ...
1 model2 0.972054 None [[1.8610013, 2.1553798], [1.8610013, 2.1553798...
2 model2 0.526641 None [[1.3793343, 1.430331, 2.1639223], [1.3793343,...
3 model2 0.927919 None [[2.10741, 1.5591071, 2.1414866, 2.920107], [2...
4 model2 0.243281 None [[1.2257551, 1.515327, 2.0952048, 1.1441619], ...
What am I missing?
You need these permissions on the bucket and its objects:
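A minimal sketch of what such a policy could look like, assuming the permissions in question are `s3:ListBucket` on the bucket and `s3:GetObject` on its objects. That pairing is an assumption on my part, and `build_read_policy` is a hypothetical helper, not part of any AWS SDK:

```python
import json


def build_read_policy(bucket):
    """Build an IAM policy document that allows listing and reading one bucket.

    Assumption: the Glue job role needs ListBucket on the bucket itself
    and GetObject on the objects inside it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so the resource is the bucket ARN
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is an object-level action, so the resource is bucket/*
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


print(json.dumps(build_read_policy("BUCKET"), indent=2))
```

The target bucket would presumably also need a write permission such as `s3:PutObject`, since the job writes its output there.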
I agree that the error message is confusing and misleading. The reason is that, as a managed service, Glue first does a "List" action on the S3 location and then feeds the results to getDynamicFrame. So when it receives the AccessDenied error (which, for some reason, it does not show us) and an empty "list", getDynamicFrame simply raises the No such file or directory error. All in all, they should have done a better job on the error handling.
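If that explanation is right, a successful `get_object` call does not prove that listing works. A rough way to test the "List" step on its own is sketched below. `check_list_access` is a hypothetical helper written for this answer; it only assumes that the client exposes boto3's `list_objects_v2` and that failures carry a botocore-style `response` dict:

```python
def check_list_access(s3_client, bucket, prefix):
    """Report whether the caller can list keys under s3://bucket/prefix."""
    try:
        resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    except Exception as exc:
        # botocore's ClientError carries the service error code in exc.response
        error = getattr(exc, "response", {}).get("Error", {})
        return "list failed: " + error.get("Code", type(exc).__name__)
    if resp.get("KeyCount", 0) == 0:
        return "list ok, but no keys under this prefix"
    return "list ok"
```

For example, `check_list_access(boto3.client('s3'), 'BUCKET', '2022/05/24/08/PATH/TO/')`. The result only says something about the Glue job if the client uses the same role as the job, rather than your local user credentials.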