Spark 3.2 on Kubernetes keeps throwing okhttp3/okio EOFException

Posted on 2025-01-25 08:21:17

I'm using a Spark 3.2.1 image built from the official distribution via `docker-image-tool.sh`, on a Kubernetes 1.18 cluster. Everything works fine, except for this error message that appears every 90 seconds:

 WARN WatcherWebSocketListener: Exec Failure
 java.io.EOFException
    at okio.RealBufferedSource.require(RealBufferedSource.java:61)
    at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
    at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
    at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
    at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

This error message does not affect the application, but it's really annoying, especially for Jupyter users, and the lack of detail makes it very hard to debug.
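
One way to get a little more detail is to check whether the warnings really arrive on a fixed interval in a captured driver log. The following is only a sketch: it assumes the log lines carry Spark's default `yy/MM/dd HH:mm:ss` timestamp prefix (the excerpt above has it trimmed), and the function name `warning_gaps` and the sample lines are hypothetical, made up for illustration rather than taken from a real log.

```python
import re
from datetime import datetime

# Assumption: default Spark log layout, e.g.
# "22/03/01 10:00:00 WARN WatcherWebSocketListener: Exec Failure".
# Adjust the pattern if your log4j layout differs.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) WARN WatcherWebSocketListener: Exec Failure"
)


def warning_gaps(lines):
    """Return the gaps, in seconds, between consecutive Exec Failure warnings."""
    stamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S"))
    # Pair each timestamp with the next one and take the difference.
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    # Hypothetical sample lines, not real log output.
    sample = [
        "22/03/01 10:00:00 WARN WatcherWebSocketListener: Exec Failure",
        "java.io.EOFException",
        "22/03/01 10:01:30 WARN WatcherWebSocketListener: Exec Failure",
        "22/03/01 10:03:00 WARN WatcherWebSocketListener: Exec Failure",
    ]
    print(warning_gaps(sample))  # [90.0, 90.0]
```

A constant gap suggests something is closing the watch connection on a timer rather than the failure being random, which narrows down where to look.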

It appears with every submit variant (spark-submit, pyspark, spark-shell), regardless of whether dynamic allocation is enabled or disabled.

I've found traces of it on the internet, but all occurrences were from older versions of Spark and were resolved by using a "newer" version of fabric8 (4.x). Spark 3.2.1 already uses fabric8 5.4.1.

I wonder if anyone else still sees this error in Spark 3.x, and has a resolution.

Thanks.


Update:
This seems to be related to the Kubernetes cluster itself. After migrating to a new cluster, the error disappeared.
