Spark spark-sql-kafka - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

Posted 2025-01-19 16:08:25


I am experimenting with Spark reading from a Kafka topic, following the "Structured Streaming + Kafka Integration Guide".

Spark version: 3.2.1
Scala version: 2.12.15

Following the guide's instructions for including the dependencies with spark-shell, I start my shell:

spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1

However, once I run something like the following in my shell:

val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers","http://HOST:PORT").option("subscribe", "my-topic").load()

I get the following exception:

java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

Any ideas how to overcome this issue?

My assumption was that with --packages, all transitive dependencies would be loaded as well. But this does not seem to be the case. From the logs I assume the package gets loaded successfully, including the kafka-clients dependency:

org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
resolving dependencies :: org.apache.spark#spark-submit-parent-3b04f646-471c-4cc8-88fb-7e32bc3226ed;1.0
confs: [default]
found org.apache.spark#spark-sql-kafka-0-10_2.12;3.2.1 in central
found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.2.1 in central
found org.apache.kafka#kafka-clients;2.8.0 in central
found org.lz4#lz4-java;1.7.1 in central
found org.xerial.snappy#snappy-java;1.1.8.4 in central
found org.slf4j#slf4j-api;1.7.30 in central
found org.apache.hadoop#hadoop-client-runtime;3.3.1 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.apache.hadoop#hadoop-client-api;3.3.1 in central
found org.apache.htrace#htrace-core4;4.1.0-incubating in central
found commons-logging#commons-logging;1.1.3 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found org.apache.commons#commons-pool2;2.6.2 in central
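
The resolver output shows kafka-clients being fetched, so the jar itself is downloaded; whether the class is actually visible to the shell's classloader is a separate question. One quick diagnostic (a sketch, run inside the same spark-shell session) is:

// Throws ClassNotFoundException if kafka-clients is not on the driver classpath
Class.forName("org.apache.kafka.common.serialization.ByteArraySerializer")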


Comments (1)

梦明 2025-01-26 16:08:25


The logs seem fine, but you can try including the kafka-clients dependency in the --packages argument as well.
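
For example (kafka-clients 2.8.0 matches what the resolver reported above; adjust the version to your setup):

spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1,org.apache.kafka:kafka-clients:2.8.0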

Otherwise, I'd suggest creating an uber jar instead of downloading the libraries every time you submit the app.
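
A minimal sketch of the uber-jar approach with sbt-assembly (project name and plugin version are illustrative; Spark itself is marked Provided so that only the Kafka connector and its kafka-clients dependency get bundled):

// build.sbt -- assumes addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0") in project/plugins.sbt
name := "spark-kafka-app"
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.2.1" % Provided,  // supplied by the cluster at runtime
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.2.1"   // bundled; pulls in kafka-clients
)

assembly / assemblyMergeStrategy := {
  case PathList("META-INF", "services", _*) => MergeStrategy.concat  // keep DataSourceRegister entries
  case PathList("META-INF", _*)             => MergeStrategy.discard
  case _                                    => MergeStrategy.first
}

Running sbt assembly then produces a single jar you can pass to spark-submit without --packages.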
