I am trying to execute this GitHub project on an AWS EMR Spark cluster:
https://github.com/pran4ajith/spark-twitter-streaming.git
I have successfully run the first two scripts:
- tweet_stream_producer.py
- sparkml_train_model.py
But when I run the consumer part with the command
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0,io.delta:delta-core_2.12:0.7.0 tweet_stream_consumer.py
I get a file path error:
Py4JJavaError: An error occurred while calling o137.partitions.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ip-10-0-0-61.ec2.internal:8020/home/hadoop/spark-twitter-streaming/TwitterStreaming/src/app/models/metadata
It seems that the problem lies in the mapping between the local file system path and the HDFS path (a hedged sketch of possible workarounds follows the snippet below). The model is loaded with:
model_path = str(SRC_DIR / 'models')
pipeline_model = PipelineModel.load(model_path)
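
For reference, here is a minimal sketch (not from the project) of two common ways to make the filesystem explicit in this situation; the directory is taken from the error message above and the HDFS destination is illustrative. Prefixing the path with file:// forces Spark to read from the local filesystem, which only works if the directory exists on every node that reads it, while copying the model into HDFS and loading it from there is usually the safer option on a cluster.

# Hedged sketch, assuming the repository layout shown in the error message.
from pathlib import Path
from pyspark.ml import PipelineModel

# Local directory where sparkml_train_model.py saved the pipeline (adjust as needed).
SRC_DIR = Path('/home/hadoop/spark-twitter-streaming/TwitterStreaming/src/app')

# Option 1: force the local filesystem scheme.
# Only reliable if this directory is present on every node that loads the model.
local_model_path = 'file://' + str(SRC_DIR / 'models')
# pipeline_model = PipelineModel.load(local_model_path)

# Option 2 (usually safer on EMR): copy the model into HDFS first, e.g.
#   hdfs dfs -put /home/hadoop/spark-twitter-streaming/TwitterStreaming/src/app/models /user/hadoop/models
# and then load it via an HDFS path that all executors can reach.
hdfs_model_path = 'hdfs:///user/hadoop/models'
pipeline_model = PipelineModel.load(hdfs_model_path)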