Using a custom connector in an AWS Glue ETL script

Posted 2025-01-19 01:18:42

I am writing an AWS Glue ETL script in Python using the Glue DynamicFrame abstraction.

I created a JDBC connection resource named sap-lpr-connection in the Glue Data Catalog and would like to use it to retrieve the connection options from my code.

As per this link (and other sources), I should be using a "custom.jdbc" connection_type to access the connection resource I created.

This is what my code looks like:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

# DATABASE
database = 'sap_lpr'
table = 'bsim'

# GLUE CONTEXT
glue_context = GlueContext(SparkContext.getOrCreate())

# CONNECTION OPTIONS
connection_options = {
    "connectionName": f"{database.replace('_', '-')}-connection",
    "dbTable": table
}

# READ DATA
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options=connection_options
)

But when I run the code I get this error:

An error occurred while calling o81.getSource. Glue ETL Marketplace: Can not retrieve required field CONNECTOR_TYPE.

I know an alternative would be to specify a "jdbc" connection_type and pass the individual connection options, such as the JDBC URL, username and password, but I would prefer to retrieve that information from the Glue connection resource I created for this purpose.

Also, I would really like to stick to the glue_context API rather than the standard Spark API.

Any idea what I might be doing wrong?


昔日梦未散 2025-01-26 01:18:42

OK, it turns out that I misunderstood the type of connector I was using.

I created the connection resource in the AWS Glue Data Catalog using a "standard" connector, the JDBC one. Glue does not treat this as a custom connector type in the connection_type field. It is a standard JDBC connection, which you specify by database type, for example: connection_type='sqlserver'.

So if you create a connection using one of the standard connectors, such as JDBC, you have to use the extract_jdbc_conf() method to extract the configuration from the connection resource:

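# Name of the Glue connection resource (the one from the question)
connection_name = "sap-lpr-connection"

# Read the JDBC settings stored in the Glue connection resource
configuration = glue_context.extract_jdbc_conf(
    connection_name,
    catalog_id=None
)

# Build the connection options from the extracted configuration
connection_options = {
    "url": configuration["url"],
    "user": configuration["user"],
    "password": configuration["password"]
}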
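
To tie the two pieces together, here is a minimal sketch of how the extracted configuration might be fed back into create_dynamic_frame.from_options. The helper names build_jdbc_options and read_table are hypothetical (they are not part of the Glue API), and the sketch assumes the caller passes the database type, such as 'sqlserver', as the connection_type and the table name under the "dbtable" key. Whether the extracted URL already contains the database name is something to check against your own connection.

```python
def build_jdbc_options(configuration, table):
    """Map the dict returned by extract_jdbc_conf to JDBC read options.

    Hypothetical helper: only the "url", "user" and "password" keys
    shown in the answer are used, plus the table to read.
    """
    return {
        "url": configuration["url"],
        "user": configuration["user"],
        "password": configuration["password"],
        "dbtable": table,
    }


def read_table(glue_context, connection_name, table, connection_type):
    """Read one table as a DynamicFrame via a Glue JDBC connection resource.

    Hypothetical helper. connection_type is the database type of the
    standard connector (for example "sqlserver"), not "custom.jdbc".
    """
    configuration = glue_context.extract_jdbc_conf(connection_name)
    return glue_context.create_dynamic_frame.from_options(
        connection_type=connection_type,
        connection_options=build_jdbc_options(configuration, table),
    )
```

Inside a Glue job this could be called as read_table(glue_context, "sap-lpr-connection", "bsim", "sqlserver"), reusing the names from the question. Keeping the option mapping in its own function means it can be checked without a running Glue environment.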
