Unable to create a dynamic frame after successfully crawling a MongoDB table into the AWS Data Catalog
I created a MongoDB connection successfully. The connection test succeeds, and I was able to use a crawler to create metadata in the Glue Data Catalog. However, when I run the code below, adding my MongoDB database name and collection name to the additional_options parameter, I get an error:
```python
data_catalog_database = 'tinkerbell'
data_catalog_table = 'tinkerbell_funds'

glueContext.create_dynamic_frame_from_catalog(
    database=data_catalog_database,
    table_name=data_catalog_table,
    additional_options={"database": "tinkerbell", "collection": "funds"},
)
```
The error is:

```
An error was encountered:
An error occurred while calling o177.getDynamicFrame.
: java.lang.NoSuchMethodError: com.mongodb.internal.connection.DefaultClusterableServerFactory.<init>(Lcom/mongodb/connection/ClusterId;Lcom/mongodb/connection/ClusterSettings;Lcom/mongodb/connection/ServerSettings;Lcom/mongodb/connection/ConnectionPoolSettings;Lcom/mongodb/connection/StreamFactory;Lcom/mongodb/connection/StreamFactory;Lcom/mongodb/MongoCredential;Lcom/mongodb/event/CommandListener;Ljava/lang/String;Lcom/mongodb/MongoDriverInformation;Ljava/util/List;)V
```
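A `java.lang.NoSuchMethodError` is thrown at run time when code compiled against one version of a class calls a method (here a `DefaultClusterableServerFactory` constructor) that the version actually loaded does not define. A common cause, assumed here and not confirmed in this thread, is that two different MongoDB Java driver versions end up on the job's classpath. The sketch below is a hypothetical helper, `find_version_conflicts`, that groups jar file names by artifact and flags artifacts present in more than one version; the jar names are made up for illustration.

```python
import re
from collections import defaultdict

# Splits names like "mongodb-driver-core-4.0.5.jar" into artifact and version.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d+(?:\.\d+)*)\.jar$")


def find_version_conflicts(jar_names, prefix="mongo"):
    """Hypothetical helper: return {artifact: [versions]} for artifacts whose
    names start with `prefix` and that appear in more than one version."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match and match.group("artifact").startswith(prefix):
            versions[match.group("artifact")].add(match.group("version"))
    # Versions are sorted as plain strings, which is only meant for display.
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}


# Hypothetical jar list; in practice it could come from os.listdir() on the
# directory holding the job's extra jars.
jars = [
    "mongo-spark-connector_2.12-3.0.1.jar",
    "mongodb-driver-core-4.0.5.jar",
    "mongodb-driver-core-3.12.10.jar",
]
print(find_version_conflicts(jars))
```

Only identically named artifacts are compared, so a driver bundled under a different artifact name (for example an all-in-one driver jar) would still need to be checked by eye.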
When I call it without the additional_options parameter:
```python
glueContext.create_dynamic_frame_from_catalog(
    database=data_catalog_database,
    table_name=data_catalog_table,
)
```
I get the following error:
```
An error was encountered:
Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property
Traceback (most recent call last):
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/context.py", line 179, in create_dynamic_frame_from_catalog
    return source.getFrame(**kwargs)
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
    jframe = self._jsource.getDynamicFrame()
  File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property
```
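The message names the two places the MongoDB Spark connector looks for the collection: the path of `spark.mongodb.input.uri` (in the `mongodb://host/<database>.<collection>` form) or the `spark.mongodb.input.collection` property. The hypothetical helper below, `mongo_input_properties`, builds either form as a plain dict of Spark properties. The property names assume the 3.x connector that produced this message; `localhost:27017` and the credentials are placeholders, and how the properties are passed into the Glue job's Spark session is left open.

```python
from urllib.parse import quote_plus


def mongo_input_properties(host, database, collection,
                           username=None, password=None, split=False):
    """Hypothetical helper: build MongoDB Spark connector (3.x) input properties.

    With split=False the collection goes into the URI path as
    "<database>.<collection>"; with split=True it is given through the
    separate spark.mongodb.input.database / .collection properties.
    """
    auth = ""
    if username is not None and password is not None:
        # Percent-encode credentials so characters such as '@' or ':' are safe.
        auth = f"{quote_plus(username)}:{quote_plus(password)}@"
    if split:
        return {
            "spark.mongodb.input.uri": f"mongodb://{auth}{host}/",
            "spark.mongodb.input.database": database,
            "spark.mongodb.input.collection": collection,
        }
    return {
        "spark.mongodb.input.uri": f"mongodb://{auth}{host}/{database}.{collection}",
    }


props = mongo_input_properties("localhost:27017", "tinkerbell", "funds")
print(props["spark.mongodb.input.uri"])
```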
Can someone please help me pass these parameters correctly?
I have explained above what I tried; I was expecting the dynamic frame to be created from the catalog table.
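One alternative, not confirmed in this thread, is to skip the catalog table and read with `glueContext.create_dynamic_frame.from_options(connection_type="mongodb", ...)`, passing the connection details directly. The sketch assumes the `uri`, `database`, `collection`, `username` and `password` keys described in the AWS Glue documentation for MongoDB connections (key names may differ between Glue versions). The helper name and host are hypothetical, and the Glue call is left as a comment because `awsglue` is only importable inside a Glue environment.

```python
def mongodb_connection_options(host, port, database, collection,
                               username=None, password=None):
    """Hypothetical helper: build a connection_options dict for Glue's
    MongoDB source (key names assumed from the Glue documentation)."""
    options = {
        "uri": f"mongodb://{host}:{port}",
        "database": database,
        "collection": collection,
    }
    # Credentials are optional; include them only when provided.
    if username is not None:
        options["username"] = username
    if password is not None:
        options["password"] = password
    return options


opts = mongodb_connection_options("mongo.example.internal", 27017,
                                  "tinkerbell", "funds")
print(opts)

# Inside a Glue job the dict would go to the options-based reader:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="mongodb", connection_options=opts)
```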
Comments (1)
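You are getting that error because the MongoDB Spark connector expects its connection to be configured in Spark and needs the input and output properties ('spark.mongodb.input.uri' and 'spark.mongodb.output.uri').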
Please refer to the link below:
https://www.mongodb.com/docs/spark-connector/master/python-api/#std-label-pyspark-shell
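Following the linked page's PySpark shell section, the input and output URIs can be supplied as `--conf` options when launching `pyspark`. The hypothetical helper below only renders that command line; the property names are the 3.x ones from the error message in the question, and the `--packages` coordinate and `127.0.0.1` host are assumptions to adapt to your Spark/Scala and connector versions.

```python
import shlex


def pyspark_launch_args(input_uri, output_uri,
                        package="org.mongodb.spark:mongo-spark-connector_2.12:3.0.1"):
    """Hypothetical helper: argv for starting pyspark with both MongoDB
    connector URIs set (package coordinate is an assumption)."""
    return [
        "pyspark",
        "--conf", f"spark.mongodb.input.uri={input_uri}",
        "--conf", f"spark.mongodb.output.uri={output_uri}",
        "--packages", package,
    ]


args = pyspark_launch_args("mongodb://127.0.0.1/tinkerbell.funds",
                           "mongodb://127.0.0.1/tinkerbell.funds")
# shlex.join (Python 3.8+) quotes any argument that needs it for a POSIX shell.
print(shlex.join(args))
```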