AWS Glue: An error occurred while calling o100.pyWriteDynamicFrame. Failed to find data source: UNKNOWN
I'm getting the following error when attempting to run a Glue pipeline that uploads a JSON file stored in S3 to Redshift:
An error occurred while calling o100.pyWriteDynamicFrame. Failed to find data source: UNKNOWN. Please find packages at http://spark.apache.org/third-party-projects.html
I have an output log file that includes the following errors, in this order:
InvocationTargetException java.lang.reflect.InvocationTargetException
Exception in User Class java.lang.reflect.UndeclaredThrowableException
My script is the following:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
# Script generated for node S3 bucket
S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={"multiline": False},
    connection_type="s3",
    format="json",
    connection_options={"paths": ["s3://numbeo-bucket/results.json"], "recurse": True},
    transformation_ctx="S3bucket_node1",
)
# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=S3bucket_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
)
# Script generated for node Redshift Cluster
RedshiftCluster_node3 = glueContext.write_dynamic_frame.from_catalog(
    frame=ApplyMapping_node2,
    database="redshift-cluster-1",
    table_name="results_json",
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="RedshiftCluster_node3",
)
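For context, one avenue sometimes tried when a catalog-based Redshift write fails with "Failed to find data source: UNKNOWN" is to write through a named Glue connection via `write_dynamic_frame.from_jdbc_conf` instead of `from_catalog`. A minimal sketch, assuming a Glue connection named "redshift-connection" and a target table `public.results_json` in a Redshift database `dev` (all three names are hypothetical, not from the script above):

```python
# Hypothetical sketch, not the original poster's code: write to Redshift through
# a named Glue connection rather than a Data Catalog table.
# "redshift-connection", "dev", and "public.results_json" are assumed names.
connection_options = {
    "dbtable": "public.results_json",  # assumed target schema.table
    "database": "dev",                 # assumed Redshift database name
}

# With the glueContext, ApplyMapping_node2, and args objects defined in the
# script above, the write would look like:
# RedshiftCluster_node3 = glueContext.write_dynamic_frame.from_jdbc_conf(
#     frame=ApplyMapping_node2,
#     catalog_connection="redshift-connection",
#     connection_options=connection_options,
#     redshift_tmp_dir=args["TempDir"],
#     transformation_ctx="RedshiftCluster_node3",
# )
```

The difference is that `from_jdbc_conf` resolves the JDBC endpoint and credentials from the Glue connection itself, so it does not depend on the catalog table carrying that information.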