AWS Glue job using awsglueml.transforms.FindMatches errors out seemingly at random
I have a Glue ETL job (using PySpark) that, seemingly at random, gives a timeout error when trying to access the awsglueml.transforms.FindMatches library. The error shown on the Glue dashboard is:
An error occurred while calling z:com.amazonaws.services.glue.ml.FindMatches.apply. The target server failed to respond
Basically, if I run this Glue ETL job late at night, it succeeds most of the time. But if I run it in the middle of the day, it fails with this error. Sometimes just retrying enough times causes it to succeed, but that doesn't seem like a good solution. It seems like the AWS FindMatches library does not have enough bandwidth to support everyone who wants to use it, but I could be wrong here.
The Glue ETL job was set up using the option "A proposed script generated by AWS Glue".
The line of code that times out was provided by Glue when I created this job:
from awsglueml.transforms import FindMatches
...
findmatches2 = FindMatches.apply(frame = datasource0, transformId = "<redacted>", computeMatchConfidenceScores = True, transformation_ctx = "findmatches2")
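For reference, the manual retrying mentioned above could be automated with a small wrapper. This is only a minimal sketch, assuming the failure is transient and that calling FindMatches.apply again is safe. The helper name retry_with_backoff and its parameters are hypothetical and are not part of AWS Glue. It retries on any exception, which is deliberately broad and may need narrowing in a real job.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=2.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts calls have failed.

    Hypothetical helper, not part of AWS Glue. The wait doubles after each
    failed attempt (capped at max_delay) and gets up to 1 second of jitter.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # Broad on purpose for this sketch; out of attempts means re-raise.
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, 1))


# Assumed usage inside the Glue script, wrapping the generated call:
# findmatches2 = retry_with_backoff(
#     lambda: FindMatches.apply(
#         frame=datasource0,
#         transformId="<redacted>",
#         computeMatchConfidenceScores=True,
#         transformation_ctx="findmatches2",
#     )
# )
```

This only hides the symptom by spacing out attempts; it does not explain why the call fails more often in the middle of the day.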
Welcoming any information on this elusive issue.
Comments (1)
This was solved by including this configuration in the Glue job: