I have Data Catalog tables generated by crawlers: one is a data source from MongoDB, and the second is a data source from PostgreSQL (RDS). The crawlers run successfully and the connection tests pass.
I am trying to define an ETL job from MongoDB to PostgreSQL (a simple transform).
In the job I defined the source as AWS Glue Data Catalog (MongoDB) and the target as Data Catalog PostgreSQL.
When I run the job I get this error:
IllegalArgumentException: Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property
It looks like this is related to the MongoDB part. I tried setting the 'database' and 'collection' parameters on the Data Catalog table, but it didn't help.
The script generated for the source is:
AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    transformation_ctx="AWSGlueDataCatalog_node1653400663056",
)
What could be missing?
Comments (1)
I had the same problem; just add the parameter below.
Additional parameters can be found on the AWS documentation page:
https://docs.aws.amazon.com/glue/latest/dg/connection-mongodb.html
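A minimal sketch of the fix, using the catalog names from the question: with a MongoDB source, `create_dynamic_frame.from_catalog` needs the MongoDB database and collection passed explicitly through `additional_options` — the crawler's catalog entry alone does not supply them, which is what triggers the "Missing collection name" error. The values `"my_mongo_database"` and `"my_mongo_collection"` below are hypothetical placeholders for your actual MongoDB database and collection names:

```python
# Sketch, assuming the catalog database/table names from the question.
# For a MongoDB source, pass the MongoDB database and collection via
# additional_options; otherwise Glue raises "Missing collection name".
AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    additional_options={
        "database": "my_mongo_database",      # hypothetical: your MongoDB database
        "collection": "my_mongo_collection",  # hypothetical: your MongoDB collection
    },
    transformation_ctx="AWSGlueDataCatalog_node1653400663056",
)
```

The linked AWS page lists further MongoDB connection options (for example SSL and partitioning settings) that can go in the same `additional_options` dictionary.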