Tomcat stops responding to JK requests

I have a nasty issue with load-balanced Tomcat servers that are hanging up. Any help would be greatly appreciated.

The system

I'm running Tomcat 6.0.26 on HotSpot Server 14.3-b01 (Java 1.6.0_17-b04) on three servers sitting behind another server that acts as load balancer. The load balancer runs Apache (2.2.8-1) + MOD_JK (1.2.25). All of the servers are running Ubuntu 8.04.

Each Tomcat has two connectors configured: an AJP one and an HTTP one. The AJP connector is used by the load balancer, while the HTTP connector lets the dev team connect directly to a chosen server (if we have a reason to do so).
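For reference, a minimal sketch of this connector pair in server.xml; the ports and attribute values are illustrative assumptions, not the exact production settings:

<!-- HTTP connector, used by the dev team for direct access -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

<!-- AJP connector, used by Apache + mod_jk on the load balancer -->
<Connector port="8009" protocol="AJP/1.3"
           redirectPort="8443" />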

I have Lambda Probe 1.7b installed on the Tomcat servers to help me diagnose and fix the problem described below.

The problem

Here's the problem: after the application servers have been up for about a day, JK Status Manager starts reporting status ERR for, say, Tomcat2. It simply gets stuck in this state, and the only fix I've found so far is to SSH into the box and restart Tomcat.

I must also mention that JK Status Manager takes a lot longer to refresh when there's a Tomcat server in this state.

Finally, the "Busy" count of the stuck Tomcat on JK Status Manager is always high, and won't go down per se -- I must restart the Tomcat server, wait, then reset the worker on JK.

Analysis

Since I have two connectors on each Tomcat (AJP and HTTP), I can still connect to the application through the HTTP one. The application works just fine like this, very, very fast. That is perfectly normal, since I'm the only one using this server (JK has stopped delegating requests to this Tomcat).

To better understand the problem, I've taken a thread dump from a Tomcat that is no longer responding, and from another one that was restarted recently (say, an hour earlier).
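The dumps were taken roughly as follows (the pgrep pattern assumes a standard Tomcat startup via catalina.sh; the output path is arbitrary):

# find the Tomcat JVM (assumes a single Tomcat process on the box)
PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)

# option 1: SIGQUIT -- the dump is appended to catalina.out
kill -3 "$PID"

# option 2: jstack from the JDK, written to its own file
jstack "$PID" > /tmp/tomcat-threads.txt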

The instance that is responding normally to JK shows most of the TP-ProcessorXXX threads in "Runnable" state, with the following stack trace:

java.net.SocketInputStream.socketRead0 ( native code )
java.net.SocketInputStream.read ( SocketInputStream.java:129 )
java.io.BufferedInputStream.fill ( BufferedInputStream.java:218 )
java.io.BufferedInputStream.read1 ( BufferedInputStream.java:258 )
java.io.BufferedInputStream.read ( BufferedInputStream.java:317 )
org.apache.jk.common.ChannelSocket.read ( ChannelSocket.java:621 )
org.apache.jk.common.ChannelSocket.receive ( ChannelSocket.java:559 )
org.apache.jk.common.ChannelSocket.processConnection ( ChannelSocket.java:686 )
org.apache.jk.common.ChannelSocket$SocketConnection.runIt ( ChannelSocket.java:891 )
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run ( ThreadPool.java:690 )
java.lang.Thread.run ( Thread.java:619 )

The instance that is stuck shows most (all?) of the TP-ProcessorXXX threads in "Waiting" state. These have the following stack trace:

java.lang.Object.wait ( native code )
java.lang.Object.wait ( Object.java:485 )
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run ( ThreadPool.java:662 )
java.lang.Thread.run ( Thread.java:619 ) 

I don't know the internals of Tomcat, but I would infer that the "Waiting" threads are simply idle threads sitting in the thread pool. But if they are threads waiting inside a thread pool, why isn't Tomcat putting them to work processing requests from JK?

EDIT: I don't know if this is normal, but Lambda Probe shows me, in the Status section, that there are lots of threads in KeepAlive state. Is this somehow related to the problem I'm experiencing?

Solution?

So, as I've stated before, the only fix I've found is to stop the Tomcat instance, stop the JK worker, wait for the latter's busy count to slowly go down, start Tomcat again, and then re-enable the JK worker.
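Concretely, the workaround looks roughly like this (the install path is an assumption; the worker itself is stopped and re-enabled through the JK status manager's web interface):

# on the stuck Tomcat box (assumed install location)
/opt/tomcat/bin/shutdown.sh
# ... in the JK status manager: set the worker to "Stopped" and
#     watch its busy count drain ...
/opt/tomcat/bin/startup.sh
# ... back in the JK status manager: set the worker to "Active" again ...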

What is causing this problem? How should I further investigate it? What can I do to solve it?

Thanks in advance.

Comments (4)

甜点 2024-09-08 20:16:37

Do you have JVM memory settings and garbage collection configured? You would do this where you set your CATALINA_OPTS.

examples:

CATALINA_OPTS="$CATALINA_OPTS -server -Xnoclassgc -Djava.awt.headless=true"
CATALINA_OPTS="$CATALINA_OPTS -Xms1024M -Xmx5120M -XX:MaxPermSize=256m"
CATALINA_OPTS="$CATALINA_OPTS -XX:-UseParallelGC"

There are multiple philosophies on which GC settings are best. It depends on the kind of code you are executing. The config above worked best for a JSP-intensive environment (taglibs rather than an MVC framework).
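If you go down this road, it can also help to turn on GC logging so you can confirm or rule out long GC pauses; the log path below is an assumption:

CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
CATALINA_OPTS="$CATALINA_OPTS -Xloggc:/var/log/tomcat/gc.log"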

暖风昔人 2024-09-08 20:16:37

Check your log file first.

I think the default log file is located at /var/log/daemon.log. (This file does not contain only the Tomcat logs.)
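For example (the /opt/tomcat path is an assumption; adjust it to your install):

# system daemon log (shared with other services)
tail -n 200 /var/log/daemon.log

# Tomcat's own log, including anything dumped by kill -3
tail -n 200 /opt/tomcat/logs/catalina.out
grep -iE "OutOfMemoryError|SEVERE" /opt/tomcat/logs/catalina.out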

灯角 2024-09-08 20:16:37

Check your keepalive time setting. It seems you are getting threads into the keepalive state and they don't time out. It appears your server is not detecting client disconnects within a reasonable time. There are several timeout and count variables involved.
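As a rough illustration of the knobs involved (the values are assumptions; the point is that the AJP connector's connectionTimeout, in milliseconds, should be kept consistent with mod_jk's connection_pool_timeout, which is in seconds):

# workers.properties (mod_jk side, seconds)
worker.tomcat2.connection_pool_timeout=600
worker.tomcat2.socket_keepalive=true

<!-- server.xml (Tomcat side, milliseconds, kept in sync with the above) -->
<Connector port="8009" protocol="AJP/1.3" connectionTimeout="600000" />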

秋千易 2024-09-08 20:16:37

I've had a similar problem with Weblogic. The cause was that too many threads were waiting for network responses and Weblogic was running out of memory. Tomcat probably behaves the same way. Things you can try are:

  • Decrease the timeout value of your connections.
  • Decrease the total number of simultaneous connections, so that Tomcat doesn't start new threads once that limit is reached (see the sketch after this list).
  • Easy fix, but it doesn't correct the root cause: Tomcat might be in an out-of-memory state even though nothing is showing up in the logs yet. Increase Tomcat's memory as described earlier.
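For illustration, the first two suggestions map onto attributes of the AJP connector in server.xml roughly like this (the values are assumptions, not recommendations):

<Connector port="8009" protocol="AJP/1.3"
           maxThreads="150"
           connectionTimeout="60000" />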