How to fix the retryWrite error raised by an AWS Glue PySpark script with DocumentDB

Posted on 2025-01-16 01:32:10

I am running the code below in AWS Glue. The job is able to read data from the database, but it fails while writing.

An error occurred while calling o102.pyWriteDynamicFrame. Command failed with error 301: 'Retryable writes are not supported' on server :. The full response is {"ok": 0.0, "code": 301, "errmsg": "Retryable writes are not supported", "operationTime": {"$timestamp": {"t": 1647921685, "i": 1}}}

I used the catalog DocumentDB connection in the Job details section.

I tried using retryWrite=false in the connection string, but I still get the error.


documentdb_uri = "mongodb://<host name>:27017"
documentdb_write_uri = "mongodb://<host name>:27017"

read_docdb_options = {
    "uri": documentdb_uri,
    "database": "test",
    "collection": "profiles",
    "username": "<username>",
    "password": "<password>",
    "ssl": "true",
    "ssl.domain_match": "false"
}

write_documentdb_options = {
    "uri": documentdb_write_uri,
    "database": "test",
    "collection": "collection1",
    "username": "<username>",
    "password": "<password>",
    "ssl": "true",
    "ssl.domain_match": "false"
}

# Get DynamicFrame from DocumentDB
dynamic_frame2 = glueContext.create_dynamic_frame.from_options(connection_type="documentdb",
                                                               connection_options=read_docdb_options)

# Write DynamicFrame to DocumentDB
glueContext.write_dynamic_frame.from_options(dynamic_frame2, connection_type="documentdb",
                                             connection_options=write_documentdb_options)

job.commit()

Comments (3)

烟酒忠诚 2025-01-23 01:32:10

The correct option is retryWrites=false (with a trailing "s"), and it needs to be at the end of the URI.

In your case: documentdb_write_uri = "mongodb://<host name>:27017/?retryWrites=false"
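The URI change above can also be applied programmatically. The following is a minimal sketch, assuming only the Python standard library; the helper name `with_retry_writes_disabled` and the host `docdb.example.com` are hypothetical and not part of AWS Glue. It rewrites a connection string so that retryWrites=false is the last query parameter, and it drops a misspelled `retryWrite` entry if one is present.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_retry_writes_disabled(uri: str) -> str:
    """Return `uri` with retryWrites=false as its last query parameter.

    Hypothetical helper for illustration; not part of AWS Glue.
    """
    parts = urlsplit(uri)
    # Drop existing retryWrites / retryWrite entries so the option appears once.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ("retrywrites", "retrywrite")
    ]
    params.append(("retryWrites", "false"))
    # Keep a "/" between the host list and the "?" when no database is given.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(params), parts.fragment)
    )


# Assumed example host; substitute the real cluster endpoint.
documentdb_write_uri = with_retry_writes_disabled("mongodb://docdb.example.com:27017")
```

Whether Glue honors the option may still depend on the Glue version, as the other comments discuss.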

嘿咻 2025-01-23 01:32:10

I solved it by downgrading the Glue version from 3.0 to 2.0.
In 3.0 there is no way to set the retryWrites setting when using a dynamic frame.

A ticket has been opened on the AWS issue tracker, and it has not been resolved yet.
Issue for reference: https://github.com/awslabs/aws-glue-libs/issues/111 [An error occurred while calling o365.pyWriteDynamicFrame. Command failed with error 301: 'Retryable writes are not supported' on server ****.*****.docdb.amazonaws.com:27017.]

停顿的约定 2025-01-23 01:32:10

As stated in the other answers, the option is retryWrites=false and it needs to be at the end of the URI. However, doing so interferes with how other connection parameters are handled, specifically the database name and the SSL configuration. As a result, these also need to be added as parameters to the URI and removed from the write options, like so:

documentdb_uri = "mongodb://<host name>:27017"
documentdb_write_uri = "mongodb://<host name>:27017/test?retryWrites=false&ssl=true&sslInvalidHostNameAllowed=true"

read_docdb_options = {
    "uri": documentdb_uri,
    "database": "test",
    "collection": "profiles",
    "username": "<username>",
    "password": "<password>",
    "ssl": "true",
    "ssl.domain_match": "false"
}

write_documentdb_options = {
    "uri": documentdb_write_uri,
    "collection": "collection1",
    "username": "<username>",
    "password": "<password>"
}

# Get DynamicFrame from DocumentDB
dynamic_frame2 = glueContext.create_dynamic_frame.from_options(connection_type="documentdb",
                                                               connection_options=read_docdb_options)

# Write DynamicFrame to DocumentDB
glueContext.write_dynamic_frame.from_options(dynamic_frame2, connection_type="documentdb",
                                             connection_options=write_documentdb_options)

job.commit()

For the code snippet to work, I would also expect partitioner options to be required as part of the write options, though that may be straying outside the scope of the initial question, e.g.

write_documentdb_options = {
    "uri": documentdb_write_uri,
    "collection": "collection1",
    "username": "<username>",
    "password": "<password>",
    "partitioner": "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner",
    "partitionerOptions.partitionSizeMB": "10",
    "partitionerOptions.partitionKey": "_id",
}
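
The URI layout described in this answer (database name in the path, retryWrites and SSL settings in the query string) can be assembled from its parts rather than typed by hand. This is a rough sketch, assuming only the Python standard library; `build_write_options` and the host `docdb.example.com` are hypothetical names used for illustration, and the partitioner options mentioned above are left out.

```python
from urllib.parse import quote, urlencode


def build_write_options(host, database, collection, username, password, port=27017):
    """Build write options with the database and SSL settings carried in the URI.

    Hypothetical helper for illustration; not part of AWS Glue.
    """
    query = urlencode({
        "retryWrites": "false",
        "ssl": "true",
        "sslInvalidHostNameAllowed": "true",
    })
    uri = f"mongodb://{host}:{port}/{quote(database, safe='')}?{query}"
    # "database", "ssl" and "ssl.domain_match" are omitted on purpose:
    # following the answer above, the URI carries them instead.
    return {
        "uri": uri,
        "collection": collection,
        "username": username,
        "password": password,
    }


# Assumed example host; substitute the real cluster endpoint and credentials.
write_documentdb_options = build_write_options(
    "docdb.example.com", "test", "collection1", "<username>", "<password>"
)
```

The resulting dictionary can be passed as connection_options to glueContext.write_dynamic_frame.from_options in the same way as the hand-written one.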