An error occurred while calling o91.getDynamicFrame. No such file or directory

Posted on 2025-02-01 12:28:17

Using the AWSGlueServiceRole, I created a Glue job to map a parquet file from one S3 bucket to another. When I attempt to run the job, I receive the following exception:

"An error occurred while calling o91.getDynamicFrame. No such file or directory 's3://BUCKET/PATH/TO/FILE.parquet'"

I checked, and the file exists. My first thought was that it might be a permissions issue, so I tried getting the object with boto3 on my local machine, and I got the file back:

import io

import boto3
import pandas as pd

# Fetch the object with the local credentials and parse it as parquet
s3_client = boto3.client('s3')
obj = s3_client.get_object(Bucket='BUCKET', Key='2022/05/24/08/PATH/TO/FILE.parquet')
print(pd.read_parquet(io.BytesIO(obj['Body'].read())))

------------------ Output below ------------------
          id  val1           val2  val3
0     model2    0.612707     None  [[2.1931596, 1.5204412, 1.4174217, 1.6540076, ...
1     model2    0.972054     None  [[1.8610013, 2.1553798], [1.8610013, 2.1553798...
2     model2    0.526641     None  [[1.3793343, 1.430331, 2.1639223], [1.3793343,...
3     model2    0.927919     None  [[2.10741, 1.5591071, 2.1414866, 2.920107], [2...
4     model2    0.243281     None  [[1.2257551, 1.515327, 2.0952048, 1.1441619], ...

What am I missing?

Comments (1)

最后的乘客 2025-02-08 12:28:17

You need these permissions on the bucket and its objects:

  statement {
    effect = "Allow"
    actions = [
      "s3:Get*",
      "s3:Put*"
    ]
    resources = [
      "arn:aws:s3:::<BUCKET>",
      "arn:aws:s3:::<BUCKET>/*",
    ]
  }

I agree that the error message is confusing and misleading. The reason is that Glue, as a managed service, first performs a "List" action on the S3 location and then feeds the results to getDynamicFrame. When that step hits an AccessDenied error (which is somehow not surfaced to us) and yields an empty listing, getDynamicFrame simply raises the "No such file or directory" error. All in all, they should have done a better job on the error handling.
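
One way to check this outside of Glue is to run a list call and a read call separately and look at how each one ends. The following is only a minimal sketch: it assumes a boto3-style S3 client is passed in, `s3_error_code` and `diagnose_s3_access` are hypothetical helper names (not part of boto3 or Glue), and whether Glue issues exactly these calls internally is an assumption.

```python
def s3_error_code(exc):
    """Return the AWS error code carried by a botocore-style ClientError, else None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def _run(call):
    """Run one S3 call; return (result, None) on success or (None, error_code)."""
    try:
        return call(), None
    except Exception as exc:
        code = s3_error_code(exc)
        if code is None:
            # Not an AWS service error (e.g. a network problem): re-raise it.
            raise
        return None, code


def diagnose_s3_access(s3_client, bucket, key):
    """Report separately whether the key can be listed and whether it can be read."""
    results = {}

    # Listing needs permission on the bucket itself.
    listing, code = _run(
        lambda: s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    )
    if code is not None:
        results["list"] = code
    else:
        results["list"] = "found" if listing.get("KeyCount", 0) > 0 else "empty"

    # Reading needs permission on the object. A HEAD request has no response
    # body, so the code is usually a bare status such as "403" or "404".
    _, code = _run(lambda: s3_client.head_object(Bucket=bucket, Key=key))
    results["read"] = code if code is not None else "ok"

    return results
```

With real credentials this could be called as `diagnose_s3_access(boto3.client('s3'), 'BUCKET', '2022/05/24/08/PATH/TO/FILE.parquet')`. Note that it only tells you something about the Glue job if the session uses the job's role; running it with your own user credentials, as in the question, tests your permissions rather than the job's.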
