Possible causes of a Groovy program, run as a Kubernetes Job, dumping threads during execution

I have a simple Groovy script that leverages the GPars library's withPool functionality to launch HTTP GET requests against two internal API endpoints in parallel.

The script runs fine locally, both directly and inside a Docker container.

When I deploy it as a Kubernetes Job (in our internal EKS cluster: 1.20), it runs there as well, but the moment it hits the first withPool call I see a giant thread dump; execution nevertheless continues and completes successfully.

NOTE: Containers in our cluster run with the following pod security context:

      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000

Environment

# From the k8s job container
groovy@app-271df1d7-15848624-mzhhj:/app$ groovy --version
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Groovy Version: 4.0.0 JVM: 17.0.2 Vendor: Eclipse Adoptium OS: Linux


groovy@app-271df1d7-15848624-mzhhj:/app$ ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
groovy       1     0  0 21:04 ?        00:00:00 /bin/bash bin/run-script.sh
groovy      12     1 42 21:04 ?        00:00:17 /opt/java/openjdk/bin/java -Xms3g -Xmx3g --add-modules=ALL-SYSTEM -classpath /opt/groovy/lib/groovy-4.0.0.jar -Dscript.name=/usr/bin/groovy -Dprogram.name=groovy -Dgroovy.starter.conf=/opt/groovy/conf/groovy-starter.conf -Dgroovy.home=/opt/groovy -Dtools.jar=/opt/java/openjdk/lib/tools.jar org.codehaus.groovy.tools.GroovyStarter --main groovy.ui.GroovyMain --conf /opt/groovy/conf/groovy-starter.conf --classpath . /tmp/script.groovy
groovy     116     0  0 21:05 pts/0    00:00:00 bash
groovy     160   116  0 21:05 pts/0    00:00:00 ps -ef

Script (relevant parts)

@Grab('org.codehaus.gpars:gpars:1.2.1')
import static groovyx.gpars.GParsPool.withPool

import groovy.json.JsonSlurper
final def jsl = new JsonSlurper()


//...

while (!(nextBatch = getBatch(batchSize)).isEmpty()) {
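    // Two worker threads per batch: one hits the dev endpoint, the other stg;
    // each fans its keywords out over a GPars pool of poolSize threads.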
    def devThread = Thread.start {
        withPool(poolSize) {
            nextBatch.eachParallel { kw ->
                String url = dev + "&" + "query=$kw"
                try {
                    def response = jsl.parseText(url.toURL().getText(connectTimeout: 10.seconds, readTimeout: 10.seconds,
                            useCaches: true, allowUserInteraction: false))
                    devResponses[kw] = response
                } catch (e) {
                    println("\tFailed to fetch: $url | error: $e")
                }
            }
        }
    }

    def stgThread = Thread.start {
        withPool(poolSize) {
            nextBatch.eachParallel { kw ->
                String url = stg + "&" + "query=$kw"
                try {
                    def response = jsl.parseText(url.toURL().getText(connectTimeout: 10.seconds, readTimeout: 10.seconds,
                            useCaches: true, allowUserInteraction: false))
                    stgResponses[kw] = response
                } catch (e) {
                    println("\tFailed to fetch: $url | error: $e")
                }
            }
        }
    }
    devThread.join()
    stgThread.join()
}
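For context: withPool(n) stands up a fork/join pool of n threads for the duration of its closure, and eachParallel distributes the closure invocations across that pool. A minimal, standalone sketch of the pattern used above:

@Grab('org.codehaus.gpars:gpars:1.2.1')
import static groovyx.gpars.GParsPool.withPool

// withPool(4) creates a 4-thread fork/join pool that lives only for the
// duration of the closure; eachParallel distributes the iterations over it.
withPool(4) {
    (1..8).eachParallel { n ->
        println "item $n handled on ${Thread.currentThread().name}"
    }
}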

Dockerfile

FROM groovy:4.0.0-jdk17 as builder

USER root
RUN apt-get update && apt-get install -yq bash curl wget jq

WORKDIR /app

COPY bin /app/bin

RUN chmod +x /app/bin/*

USER groovy
ENTRYPOINT ["/bin/bash"]
CMD ["bin/run-script.sh"]

The bin/run-script.sh script simply downloads the above Groovy script at runtime and executes it:

wget "$GROOVY_SCRIPT" -O "$LOCAL_FILE"

...

groovy "$LOCAL_FILE"

As soon as the execution hits the first call to withPool(poolSize), there's a giant thread dump, but execution continues.

I'm trying to figure out what could be causing this behavior. Any ideas?

Thread dump

Comments (1)

鹿港小镇 2025-01-17 00:16:37

For posterity, answering my own question here.

The issue turned out to be this log4j2 JVM hot-patch that we're currently leveraging to fix the recent log4j2 vulnerability. This agent (running as a DaemonSet) patches all running JVMs in all our k8s clusters.

This, somehow, causes my OpenJDK 17-based app to produce a thread dump. I found the same issue with an ElasticSearch 8.1.0 deployment as well (it also uses a pre-packaged OpenJDK 17). That one is a long-running service, so I could see a thread dump happening pretty much every half hour! Interestingly, there are other JVM services (and some SOLR 8 deployments) that don't have this issue.

Anyway, I worked with our devops team to temporarily exclude the node that the deployment was running on, and lo and behold, the thread dumps disappeared!
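Side note for anyone debugging a similar symptom: a full thread dump on stdout while the process keeps running is HotSpot's default reaction to SIGQUIT, and SIGQUIT is also the signal the JVM's dynamic-attach handshake uses (presumably how a hot-patch agent reaches running JVMs). So a quick manual check on a suspect pod:

# From inside the container, signal the JVM (PID 12 in the ps -ef output above).
# A full thread dump should appear on stdout while the process keeps running.
kill -3 12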

Balance in the universe has been restored.
