Using a custom connector in an AWS Glue ETL script
I am working on an AWS Glue ETL script using the DynamicFrame abstraction and writing the code in Python.
I created a JDBC connection resource named sap-lpr-connection in the Glue Data Catalog and would like to use it to retrieve the connection options from the code.
As per this link (and other sources), I should be using a "custom.jdbc" connection_type to access the connection resource I created.
This is what my code looks like:
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# DATABASE
database = 'sap_lpr'
table = 'bsim'

# GLUE CONTEXT
glue_context = GlueContext(SparkContext.getOrCreate())

# CONNECTION OPTIONS
connection_options = {
    "connectionName": f"{database.replace('_', '-')}-connection",
    "dbTable": table
}

# READ DATA
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options=connection_options
)
But when I run the code I get this error:
An error occurred while calling o81.getSource. Glue ETL Marketplace: Can not retrieve required field CONNECTOR_TYPE.
I know an alternative would be to specify a "jdbc" connection_type and pass the various connection options such as the JDBC URL, username and password, but I would prefer to retrieve that information from the Glue connection resource I created for this purpose.
Also, I would really like to stick to the glue_context API as opposed to the standard Spark API.
Any idea what I might be doing wrong?
Comments (1)
OK, it turns out that I misunderstood the type of connector I was using.
I created the connection resource in the AWS Glue Data Catalog using a "standard" connector, the JDBC one. This is not considered a custom connector type in the connection_type field, but rather a standard JDBC connection, which you specify like so, for example: connection_type='sqlserver'.
So if you create a connection using one of the standard connectors, such as JDBC, you have to use the .extract_jdbc_conf() method of the Glue context to extract the configuration from the connection resource.
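A minimal sketch of that approach, not the answerer's exact code. In the awsglue library the method on GlueContext is named extract_jdbc_conf; it returns a dict that includes the keys user, password, vendor and url, plus fullUrl on newer Glue versions. The helper names build_jdbc_options and read_table are hypothetical, and the sketch assumes the URL stored on the connection already points at the right database.

```python
def build_jdbc_options(jdbc_conf, table):
    """Build connection_options for a standard JDBC read from the dict
    returned by GlueContext.extract_jdbc_conf()."""
    # "fullUrl" (the URL as entered on the connection) is only returned by
    # newer Glue versions; fall back to "url" when it is absent.
    # Assumption: the chosen URL already identifies the target database.
    url = jdbc_conf.get("fullUrl") or jdbc_conf["url"]
    return {
        "url": url,
        "user": jdbc_conf["user"],
        "password": jdbc_conf["password"],
        "dbtable": table,
    }


def read_table(glue_context, connection_name, table):
    """Read one table through a catalog connection made with a standard connector."""
    # Look up the connection resource by name in the Glue Data Catalog.
    jdbc_conf = glue_context.extract_jdbc_conf(connection_name)
    return glue_context.create_dynamic_frame.from_options(
        # "vendor" holds the engine name, e.g. "sqlserver" or "mysql".
        connection_type=jdbc_conf["vendor"],
        connection_options=build_jdbc_options(jdbc_conf, table),
    )
```

Inside a Glue job this could be called as read_table(glue_context, 'sap-lpr-connection', 'bsim'), with glue_context created as in the question. Keeping the option-building step in a plain function makes it possible to check it locally without a Glue runtime.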