Jenkins agent "Unable to create live FilePath" and marked offline

Posted on 2025-01-17 20:53:05

The Jenkins controller reports "Unable to create live FilePath for i-xxxxxxxxxxxxx", and the agent is marked offline.

Searching for this error suggests a problem with the communication path between the controller and the agent, but what exactly?

Background:

Jenkins controller: v2.332.1, Java 11, 64-bit OS, running inside a Docker container.
Jenkins agents: run the Swarm client jar downloaded from the controller on startup. Swarm plugin version 3.32, Java 11, 64-bit OS, running inside a Docker container.

Agents and Controller are hosted on separate EC2 instances in AWS with Security Group permissions on the relevant ports.

When an instance starts, it runs cloud-init, downloads swarm-client.jar from the Jenkins controller, and then runs it with the parameters required to connect to the controller. I mention this to avoid the "are you using the correct version" comments :-)
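
The startup step described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual cloud-init script: the controller URL, agent name, and labels are placeholders, the download path /swarm/swarm-client.jar is assumed to be where the Swarm plugin serves the jar, and the flags shown (-url, -name, -labels, -webSocket) are a small subset of the client's options. Credentials are deliberately left out.

```python
import shlex
from urllib.parse import urljoin


def swarm_client_url(controller_url: str) -> str:
    """Build the download URL for the Swarm client jar.

    Assumption: the Swarm plugin serves the jar at
    <controller>/swarm/swarm-client.jar.
    """
    return urljoin(controller_url.rstrip("/") + "/", "swarm/swarm-client.jar")


def swarm_command(controller_url, name, labels, use_websocket=True,
                  jar_path="swarm-client.jar"):
    """Return the argv list used to launch the Swarm client.

    All values are placeholders; a real setup would also pass credentials.
    """
    cmd = ["java", "-jar", jar_path, "-url", controller_url, "-name", name]
    if labels:
        # The client takes labels as one whitespace-separated argument.
        cmd += ["-labels", " ".join(labels)]
    if use_websocket:
        cmd.append("-webSocket")
    return cmd


if __name__ == "__main__":
    base = "https://jenkins.example.com"  # hypothetical controller FQDN
    print(swarm_client_url(base))
    print(shlex.join(swarm_command(base, "i-xxxxxxxxxxxxx", ["linux", "docker"])))
```

Downloading the jar from the controller on every boot is what keeps the client version in step with the installed plugin, which is the point the poster is making.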

The agent connects, comes fully online, and gets busy servicing the pending job queue.

Then, after an indeterminate amount of time, the error appears. Some jobs last more than 24 hours and have not failed; other jobs last only minutes and sometimes fail.

Things I have tried (some):

The Swarm client jar can either use WebSockets and connect to the FQDN of the Jenkins controller, or use the JNLP protocol to connect to the IP and the dedicated agent connection port (a fixed value on the controller).
Similar behavior is seen with either protocol.

Opening all the AWS Security Groups: in case there was another port, not mentioned, that needed to be open.
Bypassing the AWS load balancer: the agent connects directly to the controller IP:PORT via JNLP.
Matching versions: Swarm client downloaded from the controller.
Updated versions: Jenkins 2.319.3, 2.332.1.
Normalized Java environments: Java 11, 64-bit OS.
Enabled logging on the agents: periodic communication happens and then stops after a while, for no obvious reason.
Increased controller instance size: m5.xlarge -> m5.2xlarge.
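
One way to narrow down the "communication stops after a while" observation is to probe the controller's agent port from the agent host at intervals and record when it stops answering. The sketch below is only an assumption about how such a check might look: the host and port are placeholders, and it tests nothing more than whether a fresh TCP connection can be opened. It does not exercise the Remoting protocol, and it would not notice an existing idle connection being dropped by an intermediary.

```python
import socket
import time


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


def probe(host: str, port: int, attempts: int, interval: float,
          timeout: float = 5.0):
    """Probe host:port repeatedly; return a list of (timestamp, reachable)."""
    results = []
    for i in range(attempts):
        results.append((time.time(), port_reachable(host, port, timeout)))
        if i < attempts - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Placeholder address: substitute the controller IP and its fixed agent port.
    for stamp, ok in probe("127.0.0.1", 50000, attempts=3, interval=1.0, timeout=1.0):
        print(time.strftime("%H:%M:%S", time.localtime(stamp)), "up" if ok else "down")
```

If the probe keeps succeeding while the agent is marked offline, the network path itself is less likely to be the cause, which would be consistent with the answers below pointing at the Jenkins version.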

Comments (3)

倾城泪 2025-01-24 20:53:06

Fixed by upgrading to Jenkins 2.344

浅沫记忆 2025-01-24 20:53:06

Bumping Jenkins up to a non-LTS version allowed the connections to become more stable.
Jenkins 2.341 and Swarm client version 3.32 both use Remoting version 4.13.

While I am not particularly happy about running a non-LTS version of Jenkins, I am pleased to have found a workaround.

Response times of the instances are better.

嘴硬脾气大 2025-01-24 20:53:06

I have also struggled with this issue, so I am adding details here so that others don't have to.

This is everything I tried:
Everything was running fine when we had JDK 8 on both the controller and the agents.
We then added code to use JDK 11 on both, and I replaced the Jenkins EC2 instance with a new one with the help of the ASG.
That is when the issue appeared. We reverted, but the issue stayed the same.
Jenkins was showing a warning recommending a move to JDK 11 (something about deprecation), so I assumed that might be related and also tried the newer Jenkins version it mentioned. Going to Jenkins 2.344 with JDK 8 gave the same issue, other Jenkins versions didn't help either, and I lost hope.
I tried the biggest EC2 instance type for the agent; it didn't help.
I checked htop on the agent; it didn't help.
I tried restarting the Jenkins controller; it didn't help.
I tried changing the remote directory for the agent, as mentioned on Stack Overflow; it didn't help.
My thinking was that, since the Jenkins EC2 instance had been terminated and a new one had come up, things might have been updated in Jenkins as a result. The warning about a new Jenkins version and JDK 11 also looked like a possible lead.
I tried increasing the timeout to 20 minutes in the agent setup; it didn't help.
I tried adding the command sudo yum -y update --security to the node init script in the Jenkins EC2 plugin; it didn't help.
We tried a JDK 11 image, a JDK 8 image, and an image of a newer Jenkins version with JDK 8; the issue was the same in all of them.

What finally solved the issue was moving to an older version of Jenkins:
https://hub.docker.com/layers/jenkins/jenkins/jenkins/2.330-jdk8/images/sha256-97fcb[…]17da34f0d07c021ab57083ee8c77dc4b21281d3498137?context=explore
