Getting an extra \ character after importing data from S3

Posted 2025-02-09 10:25:20

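I'm getting extra \ characters after importing data into a new DynamoDB table from an S3 backup. I'm using an AWS Glue ETL job for this, and the job itself runs fine, but after the import I see this issue.

I need to import the data into DynamoDB without the \ characters.

AWS Glue ETL script: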

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
Source = glueContext.create_dynamic_frame.from_catalog(database = "Dev-UserProgram", table_name = "data", transformation_ctx = "Source")
# Script generated for node S3 bucket
S3bucket_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="dev-userprogram",
    table_name="data",
    transformation_ctx="S3bucket_node1",
)
Mapped = ApplyMapping.apply(frame = Source, mappings = [
    ("Item.UserProgramId", "string", "UserProgramId", "string"),
    ("Item.UserProgram", "string", "UserProgram", "string")],
    transformation_ctx = "Mapped")
glueContext.write_dynamic_frame_from_options(
    frame = Mapped,
    connection_type = "dynamodb",
    connection_options = {
        "dynamodb.region": "ap-southeast-2",
        "dynamodb.output.tableName": "Dev-UserProgram",
        "dynamodb.throughput.write.percent": "1.0",
    },
)
job.commit()

Output:

{
  "UserProgramId": {
    "S": "{\"S\": \"d135a9a8163d486d9398622e4301ab1b\"}"
  },
  "UserProgram": {
    "S": "{\"M\": {\"EmailAddress\": {\"S\": \"[email protected]\"}, \"EndDateUTC\": {\"S\": \"2021-02-12T11:12:55.543Z\"}, \"FixedDuration\": {\"BOOL\": true}, \"HasCompleted\": {\"BOOL\": false}, \"PlanType\": {\"N\": \"1\"}, \"RowVersion\": {\"S\": \"637516535873894162\"}, \"ScheduleId\": {\"S\": \"4f4fe32cd8424cc190bbcfa3cdc8f2c1\"}, \"StartDateRangeKey\": {\"S\": \"2021-03-18T08:39:47\"}, \"StartDateUTC\": {\"S\": \"2021-03-18T08:39:47.3851194Z\"}, \"UserProgramId\": {\"S\": \"d135a9a8163d486d9398622e4301ab1b\"}, \"UserProgress\": {\"L\": []}, \"__typename\": {\"S\": \"UserProgram\"}}}"
  }
}
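
One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: the backup stores each attribute in DynamoDB's typed JSON form (`{"S": ...}`, `{"M": ...}`), and mapping `Item.UserProgram` to `string` causes that whole structure to be written as JSON text inside a single `S` attribute. The backslashes would then simply be the escaped quotes of that embedded text. The plain-Python sketch below illustrates the idea by parsing such a string and converting the typed JSON back to ordinary values. The `unmarshal` helper and the shortened sample item are hypothetical, written for illustration only; they are not part of the Glue API or of the original job.

```python
import json
from decimal import Decimal


def unmarshal(attr):
    """Convert one DynamoDB-typed attribute (e.g. {"S": "x"}) to a plain Python value.

    Hypothetical helper for illustration; binary types (B, BS) are not handled.
    """
    # A typed attribute is a dict with exactly one key: the type tag.
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)  # DynamoDB sends numbers as strings
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unmarshal(v) for v in value]
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in value.items()}
    if type_tag == "SS":
        return set(value)
    if type_tag == "NS":
        return {Decimal(v) for v in value}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


# Shortened version of the item from the question: each attribute is an "S"
# whose content is itself typed JSON serialized as text.
stored = {
    "UserProgramId": {"S": "{\"S\": \"d135a9a8163d486d9398622e4301ab1b\"}"},
    "UserProgram": {
        "S": "{\"M\": {\"PlanType\": {\"N\": \"1\"}, "
             "\"HasCompleted\": {\"BOOL\": false}, "
             "\"UserProgress\": {\"L\": []}}}"
    },
}

# Parse the embedded JSON text, then strip the type tags.
item = {name: unmarshal(json.loads(attr["S"])) for name, attr in stored.items()}
print(item)
```

If this reading is right, the escaping is not added by DynamoDB itself; the values would need to be converted to native types (or mapped with their nested types rather than `string`) before the write step.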

