What causes spikes in JDBC calls to Oracle from within WebSphere?
I was wondering whether someone can shed some light on the following issue:
We've been seeing spikes in JDBC call times from within a Spring 2.5.6-based web service running on WebSphere 6.1 on AIX, for calls into 64-bit Oracle 10.2.0.5.0. The JDBC driver version is 10.2.0.3.0.
We're hitting the database with a single thread. The average response time for the web service is 16 ms, but we're seeing 11 spikes of about 1 second or longer (out of about 11,000 calls in 5 minutes). Introscope is telling us that about half of these spikes are caused by "select 1 from dual" (which the WebSphere connection pool uses to validate the connection).
On the database side, we've traced the sessions created by the WebSphere connection pool, and none of them indicate any spikes inside the database.
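To illustrate why a 16 ms average can hide spikes like these, here is a small sketch that summarizes a list of per-call latencies. The input used in the usage note is synthetic (10,989 calls at 16 ms plus 11 calls at 1 s), chosen only to match the counts in the question; it is not measured data, and the function name is hypothetical.

```python
from statistics import mean


def summarize(latencies_s, spike_threshold_s=1.0):
    """Summarize per-call latencies (seconds): mean, spike count, spike fraction."""
    spikes = [x for x in latencies_s if x >= spike_threshold_s]
    return {
        "mean_ms": mean(latencies_s) * 1000,          # average in milliseconds
        "spikes": len(spikes),                        # calls at or above the threshold
        "spike_fraction": len(spikes) / len(latencies_s),
    }
```

With the synthetic input, the mean only rises from 16 ms to about 17 ms while 0.1% of calls take a full second, so averages alone will not reveal the problem; per-call timings or high percentiles are needed.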
Any ideas/suggestions on what could be causing these spikes?
EDIT:
Our connection pool is set up with 20 connections, and monitoring is showing that only one connection is used.
EDIT2:
We've upgraded our Oracle JDBC driver to 10.2.0.5 with no difference.
Perhaps the pool isn't sized properly.
11,000 calls in 5 minutes (300 seconds) means about 37 calls per second. At an average of 0.016 seconds per call, a single connection can handle about 62 calls per second (1 / 0.016 = 62.5), so a pool size of 4-5 should easily be able to handle the traffic. What I don't know is whether a request ends up waiting for a connection to become available when one of those queries runs a little long.
The 'SELECT 1 FROM DUAL' query is what the pool executes to check whether the connection is live and usable.
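The sizing argument can be checked with a little arithmetic. This is a minimal sketch, assuming every call holds a connection for exactly the 16 ms average and that calls arrive evenly; it applies Little's law (average busy connections = arrival rate × time per call), and the function name is hypothetical.

```python
import math


def pool_load(calls, window_s, avg_call_s):
    """Rough pool load estimate under a steady-arrival assumption."""
    arrival_rate = calls / window_s            # calls per second arriving
    per_connection_rate = 1.0 / avg_call_s     # calls per second one connection can serve
    busy = arrival_rate * avg_call_s           # Little's law: mean connections in use
    return {
        "arrival_rate": arrival_rate,
        "per_connection_rate": per_connection_rate,
        "busy_connections": busy,
        "min_connections": math.ceil(busy),    # lower bound, no headroom for bursts
    }
```

For pool_load(11000, 300, 0.016) this gives about 36.7 calls/s arriving, 62.5 calls/s per connection and about 0.59 busy connections on average, which fits the monitoring in the question showing a single connection in use. Real traffic is bursty, so a pool is normally sized well above this lower bound.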
You could try increasing the size of the pool or looking at some of the other parameters that govern what the pool does with a connection to ensure that it's live.
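To make the validation step concrete, here is a hypothetical minimal pool that pretests each connection with a validation query on checkout and replaces connections that fail. It is not WebSphere's implementation: sqlite3 stands in for Oracle so the sketch runs anywhere, and against Oracle the query would be "SELECT 1 FROM DUAL". The slow-validation counter is an illustrative addition for spotting stalls like the ones Introscope reported.

```python
import queue
import sqlite3
import time


class ValidatingPool:
    """Hypothetical pool that pretests connections on checkout (sqlite3 stands in for Oracle)."""

    def __init__(self, factory, size, validation_sql="SELECT 1", slow_threshold_s=1.0):
        self._factory = factory                  # callable returning a new connection
        self._validation_sql = validation_sql    # Oracle would use "SELECT 1 FROM DUAL"
        self._slow_threshold_s = slow_threshold_s
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())
        self.slow_validations = 0                # pretests that took >= threshold

    def _is_valid(self, conn):
        start = time.monotonic()
        try:
            conn.execute(self._validation_sql).fetchone()
            return True
        except sqlite3.Error:
            return False
        finally:
            # Count pretests that stalled, whatever the outcome.
            if time.monotonic() - start >= self._slow_threshold_s:
                self.slow_validations += 1

    def acquire(self, timeout=None):
        # Blocks (up to timeout) if every connection is checked out.
        conn = self._idle.get(timeout=timeout)
        if not self._is_valid(conn):
            try:
                conn.close()
            except sqlite3.Error:
                pass
            conn = self._factory()  # replace the dead connection
        return conn

    def release(self, conn):
        self._idle.put(conn)
```

Pretesting on every checkout costs one extra round trip per request, so any network stall on that round trip shows up as a slow validation query even though the database itself is idle.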
The answer to this problem ended up not being related to WebSphere or Oracle; it was a good old-fashioned network configuration problem that resulted in TCP retransmission timeouts between the WebSphere server and the Oracle RAC cluster.
To arrive at that diagnosis, I looked at the output of
netstat -p tcp
before and after a test run and found that the retransmission timeout statistic was increasing. The Retransmission Timeout Algorithm configuration can then be viewed using:
This indicates that retransmission timeouts will range from 1 to 64 seconds and will back off increasingly, which explains why we've been seeing spikes of 1 second, 2 seconds, 4 seconds, 10 seconds and 22 seconds, but nothing between these peaks (i.e. no 6-second spike).
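To see why backoff produces a handful of discrete delays rather than a smooth spread, here is a simplified model. It assumes the timeout starts at 1 s, doubles after each retransmission and is capped at 64 s; the real AIX algorithm is tuned by its own network options, so this model does not reproduce the observed 10 s and 22 s values exactly. The function name is hypothetical.

```python
from itertools import accumulate


def rto_schedule(rto_low, rto_high, retries):
    """Per-retransmission timeouts under plain doubling, capped at rto_high."""
    timeouts = []
    rto = rto_low
    for _ in range(retries):
        timeouts.append(min(rto, rto_high))
        rto *= 2
    return timeouts


# Total stall if the first N retransmissions are all lost.
stalls = list(accumulate(rto_schedule(1, 64, 8)))
```

Under this model a stalled call can only take one of a few fixed durations (1, 3, 7, 15, 31 ... seconds of accumulated waiting), which is the same quantized pattern as the observed spikes even if the exact values differ.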
Once the network config was fixed, the problem went away.
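The before/after netstat comparison described in this answer can be scripted. This is a rough sketch, assuming BSD/AIX-style "netstat -p tcp" output in which each counter line starts with a number (on Linux, -p means something else); counter names vary by OS version, and the helper names are hypothetical.

```python
import re
import subprocess

# "<count> <description>" lines; parenthesized numbers such as "(1200 bytes)"
# are dropped from the key so the same counter matches across snapshots.
_COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")
_NUMERIC_PARENS = re.compile(r"\s*\([^)]*\d[^)]*\)")


def snapshot():
    """Capture current TCP statistics (BSD/AIX netstat syntax)."""
    return subprocess.run(
        ["netstat", "-p", "tcp"], capture_output=True, text=True, check=True
    ).stdout


def parse_counters(text):
    counters = {}
    for line in text.splitlines():
        m = _COUNTER_LINE.match(line)
        if m:
            key = _NUMERIC_PARENS.sub("", m.group(2)).strip()
            counters[key] = int(m.group(1))
    return counters


def increased(before, after):
    """Counters that grew between two snapshots, mapped to their deltas."""
    b, a = parse_counters(before), parse_counters(after)
    return {k: v - b.get(k, 0) for k, v in a.items() if v > b.get(k, 0)}
```

Usage: take before = snapshot(), run the load test, take after = snapshot(), then inspect increased(before, after) for retransmission-related counters. Running it again after the network fix should show those counters staying flat.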
Does switching off "Pretest new connections" help?