Correctly accessing a Data Catalog table in Glue

Posted 2025-01-12 12:42:44

I created a table in Athena from an S3 source, without using a crawler, and it shows up in my Data Catalog. However, when I try to access it from a Python job in Glue ETL, the table appears to have no columns and no data. Accessing a column raises the following error: AttributeError: 'DataFrame' object has no attribute '<COLUMN-NAME>'.
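A table created with Athena DDL is an ordinary Data Catalog entry, so one way to see what a Glue ETL read will work from is to inspect that entry directly. The sketch below assumes boto3 is installed with credentials for the same account and region; fetch_table and summarize_table are hypothetical helper names, and the projection check only looks for the projection.enabled table property that Athena partition projection uses.

```python
"""Inspect a Data Catalog table definition (sketch; helper names are hypothetical)."""
from typing import Any, Dict


def fetch_table(database: str, table: str) -> Dict[str, Any]:
    """Return the 'Table' part of a Glue GetTable response."""
    # Imported lazily so summarize_table can be used without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    return glue.get_table(DatabaseName=database, Name=table)["Table"]


def summarize_table(table: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields that decide what a catalog-based Glue read will see."""
    sd = table.get("StorageDescriptor", {})
    params = table.get("Parameters", {})
    return {
        "location": sd.get("Location"),
        "input_format": sd.get("InputFormat"),
        "serde": sd.get("SerdeInfo", {}).get("SerializationLibrary"),
        "columns": [c["Name"] for c in sd.get("Columns", [])],
        "partition_keys": [k["Name"] for k in table.get("PartitionKeys", [])],
        # Athena partition projection is configured through table properties.
        "projection_enabled": params.get("projection.enabled", "").lower() == "true",
    }
```

Usage, with the database and table names from the question: summarize_table(fetch_table("datacatalog_database", "table_name")). An empty column list, an unexpected location, or partition keys combined with projection_enabled would each point at the catalog entry rather than at the job code.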

I am reading the table as a DynamicFrame, the standard Glue way:

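from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Standard Glue job setup: wrap the SparkContext in a GlueContext
glueContext = GlueContext(SparkContext.getOrCreate())

# Read the Athena-created table through the Data Catalog
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="datacatalog_database",
    table_name="table_name",
    transformation_ctx="datasource",
)

print(f"Count: {datasource.count()}")
print(f"Schema: {datasource.schema()}")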

The code above logs Count: 0 and Schema: StructType([], {}), even though querying the same table in Athena returns around 800,000 rows.
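One possible explanation for zero rows in Glue alongside data in Athena is a partitioned table whose partitions are not registered in the catalog. Athena can still resolve them through partition projection, but the Athena documentation states that projection applies only to queries run through Athena; other engines use the partitions stored in the catalog. The sketch below counts registered partitions with the boto3 get_partitions paginator and combines that with the table definition; count_partitions and explain_empty_read are hypothetical names, and the messages are heuristics, not a diagnosis.

```python
"""Heuristic check for a partitioned table with no catalog partitions (sketch)."""
from typing import Any, Dict, List


def count_partitions(database: str, table: str) -> int:
    """Count partitions registered in the Data Catalog for one table."""
    # Imported lazily so explain_empty_read can be used without boto3 installed.
    import boto3

    paginator = boto3.client("glue").get_paginator("get_partitions")
    total = 0
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        total += len(page["Partitions"])
    return total


def explain_empty_read(table: Dict[str, Any], partition_count: int) -> List[str]:
    """Return possible reasons a catalog-based Glue read could come back empty."""
    reasons: List[str] = []
    keys = [k["Name"] for k in table.get("PartitionKeys", [])]
    params = table.get("Parameters", {})
    if keys and partition_count == 0:
        reasons.append(f"table is partitioned by {keys} but has no partitions in the catalog")
        if params.get("projection.enabled", "").lower() == "true":
            reasons.append("partition projection is enabled; it is honored by Athena only")
    if not table.get("StorageDescriptor", {}).get("Location"):
        reasons.append("table has no S3 location")
    return reasons
```

Usage: pass the GetTable "Table" dict together with count_partitions("datacatalog_database", "table_name"). An empty result means neither heuristic applies, not that the table is fine.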

Sidenotes:

The IAM role used by the ETL job has the AWSGlueServiceRole policy attached.
I also tried the Glue Visual Editor: it lists the Data Catalog database and table, but I get the same error.
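On the IAM side, the AWS managed AWSGlueServiceRole policy appears to grant S3 object reads only on resources following the aws-glue-* naming convention, so data stored in other buckets typically needs an additional S3 statement on the job role. The sketch below checks an S3 URI against the two object patterns assumed here (aws-glue-*/* and */*aws-glue-*/*, with the arn:aws:s3::: prefix stripped); those patterns should be verified against the current policy version in the IAM console, and this local match is only a rough approximation of IAM evaluation, not a substitute for the IAM policy simulator.

```python
"""Rough check of an S3 URI against assumed AWSGlueServiceRole object patterns (sketch)."""
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Assumed resource patterns (ARN prefix "arn:aws:s3:::" stripped); verify against the live policy.
GLUE_SERVICE_ROLE_OBJECT_PATTERNS = ("aws-glue-*/*", "*/*aws-glue-*/*")


def covered_by_glue_service_role(s3_uri: str) -> bool:
    """True if bucket/key matches an assumed pattern; like IAM, '*' here also spans '/'."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {s3_uri}")
    resource = f"{parsed.netloc}/{parsed.path.lstrip('/')}"
    return any(fnmatchcase(resource, p) for p in GLUE_SERVICE_ROLE_OBJECT_PATTERNS)
```

If the table's S3 location (for example, the location field from the catalog entry) is not covered, adding s3:GetObject and s3:ListBucket for that bucket to the job role is the usual next thing to check.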