Huge variance in JNDI lookup times
We're experiencing huge variance in response times on a legacy J2EE web application running on Weblogic 10.3. The system consists of two Weblogic server instances (frontend and backend), which run on the same physical server machine, and an Oracle database running on a separate host. An external measurement tool alerts us every time a login to the system takes more than four seconds. Lately these warnings have been frequent. Looking at the log written by the servlet handling the login requests reveals that the time is spent on an EJB call from the frontend to the backend.
Example of the measured times:
time ms
8:40:43 25
8:42:14 26
8:44:04 26
8:44:25 26
8:44:47 26
8:46:06 26
8:46:41 7744
8:47:00 27
8:47:37 27
8:49:00 26
8:49:37 26
8:50:03 8213
8:50:57 27
8:51:04 26
8:51:06 25
8:57:26 2545
8:58:13 26
9:00:06 5195
As can be seen, most of the requests (70%, taken from a larger sample) complete in a timely fashion, but a significant portion of them take a very long time to complete.
The steps performed during the measured time are the following (a rough sketch of the frontend side is shown after the list):
- JNDI lookup of the session bean offering the authentication interface (frontend)
- Calling the authentication method of the session bean (frontend->backend)
- Reserving a JDBC connection from the connection pool (backend)
- Making a query to the user database (table size is very moderate and the table should be properly indexed) (backend)
- Reading the result set, creating the POJO user object (backend)
- Returning the POJO user object (backend->frontend)
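For concreteness, the frontend side of the first two steps looks roughly like the sketch below. It is only an illustration: BarHome, Bar, User and authenticate(...) are placeholder names (foo.bar.Bar is the name that shows up in the performance log further down), not the actual interfaces.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;

// Illustrative sketch only: BarHome, Bar, User and authenticate(...) are placeholder names.
class LoginSketch {
    User login(Hashtable env, String username, String password) throws Exception {
        Context ctx = new InitialContext(env);
        try {
            // JNDI lookup of the session bean's home interface on the backend
            Object ref = ctx.lookup("foo.bar.Bar");
            BarHome home = (BarHome) PortableRemoteObject.narrow(ref, BarHome.class);

            // Remote call from frontend to backend; the backend does the JDBC work
            // and returns the POJO user object
            Bar authBean = home.create();
            return authBean.authenticate(username, password);
        } finally {
            ctx.close();
        }
    }
}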
The load on the server machine is very small (99% idle) and the number of users is very moderate. The amount of free memory reported by Weblogic varies between 60% and 90% on both servers. Garbage collection is logged. Major collections are rare and complete in 2-3 seconds when they do occur. Further, the major GC occurrences do not seem to coincide with the long response times. Long response times occur during both busy and non-busy hours. The JDBC connection pool max size is currently set to 80, which is more than the number of concurrent users.
Update:
Got permission to restart the system with some more performance logging added. The log clearly shows that the JNDI lookup is the part where the time is spent:
03:01:23.977 PERFORMANCE: looking up foo.bar.Bar from JNDI took 6 ms
03:14:47.179 PERFORMANCE: looking up foo.bar.Bar from JNDI took 2332 ms
03:15:55.040 PERFORMANCE: looking up foo.bar.Bar from JNDI took 1585 ms
03:29:25.548 PERFORMANCE: looking up foo.bar.Bar from JNDI took 7 ms
03:31:09.010 PERFORMANCE: looking up foo.bar.Bar from JNDI took 6 ms
03:44:25.587 PERFORMANCE: looking up foo.bar.Bar from JNDI took 6 ms
03:46:00.289 PERFORMANCE: looking up foo.bar.Bar from JNDI took 7 ms
03:59:28.028 PERFORMANCE: looking up foo.bar.Bar from JNDI took 2052 ms
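For reference, PERFORMANCE lines like the above can be produced by simply bracketing the lookup with a timer. This is only a minimal sketch, not the actual instrumentation; the logger and message format are illustrative.

import java.util.logging.Logger;
import javax.naming.Context;
import javax.naming.NamingException;

// Minimal timing wrapper around a JNDI lookup (illustrative only).
class TimedLookup {
    private static final Logger LOG = Logger.getLogger("performance");

    static Object lookup(Context ctx, String jndiName) throws NamingException {
        long start = System.currentTimeMillis();
        try {
            return ctx.lookup(jndiName);
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            LOG.info("PERFORMANCE: looking up " + jndiName + " from JNDI took " + elapsed + " ms");
        }
    }
}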
Looking at the GC logs of the frontend and backend shows that no GC is taking place when the slow JNDI lookups occur.
The context is obtained the following way when a session is created:
Hashtable ht = new Hashtable();
ht.put(Context.PROVIDER_URL, url);
ht.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
jndiContext = new InitialContext(ht);
where url is a t3 URL pointing to the DNS name and port of the backend server. This should be ok, right?
The first thing that pops to mind is to cache the references obtained from JNDI; at least that was the preferred way to go 10 years ago... But shouldn't Weblogic's InitialContext implementation already do this caching, or does it really fetch the reference from the backend server on each call?
What could be causing the frequent slow JNDI lookups? Is there a workaround for this (e.g. would caching the references help)?
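If caching does turn out to be the workaround, I imagine something like the sketch below: a single static cache keyed by JNDI name, with the cached reference dropped whenever a remote call on it fails. This is only a sketch under those assumptions; the names are made up and clustering/failover concerns are ignored.

import java.util.Hashtable;
import java.util.concurrent.ConcurrentHashMap;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Sketch of caching looked-up references per JNDI name (illustrative only).
class JndiCache {
    private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<String, Object>();

    static Object lookup(String providerUrl, String jndiName) throws NamingException {
        Object ref = CACHE.get(jndiName);
        if (ref == null) {
            Hashtable<String, String> ht = new Hashtable<String, String>();
            ht.put(Context.PROVIDER_URL, providerUrl);
            ht.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
            Context ctx = new InitialContext(ht);
            try {
                ref = ctx.lookup(jndiName);
                CACHE.putIfAbsent(jndiName, ref);
            } finally {
                ctx.close();
            }
        }
        return ref;
    }

    // Call this when a remote call on the cached reference fails, so the next lookup is fresh.
    static void invalidate(String jndiName) {
        CACHE.remove(jndiName);
    }
}

Whether the cached stub would stay valid across backend restarts is something that would have to be verified for this setup.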
Comments (3)
Anything we say is likely to be a guess. Rather than playing that game, here are some suggestions for investigating the problem:
Thread.sleep(...) calls. (Ooops ... that's a guess.)
As a first step, I would try to isolate which part of those steps caused the problem by logging the amount of time each of them takes. That way you can eliminate things that are irrelevant and focus on the right area, and maybe post here again when you figure that out so that people can give specific advice.
As StephenC says, some of this is guesswork without adequate logging statements in between. You have clearly listed each element in the transaction, but I assume you don't have a logger.debug with timestamps that you can switch on for each of them.
Some questions to look at:
How many beans are in the pool for each of the frontend and backend beans - it should be the max-beans-in-free-pool element of the weblogic-ejb-jar.xml.
If you have more requests to the backend EJB than beans available, then there will be a waiting pileup.
Similarly on the JDBC front, you can use the Weblogic console to monitor any contention with getting connections - are you hitting High Counts and Waits in the JDBC Monitoring tab? That should be the next thing to check.