How to determine why a Java application is slow

Posted on 2024-09-15 10:44:45, 376 words, 8 views, 0 comments

We have a Java ERP type of application. Communication between server and client is via RMI. In peak hours there can be up to 250 users logged in, and about 20 of them are working at the same time. This means that about 20 threads are live at any given time in peak hours.
The server can run for hours without any problems, but then all of a sudden response times get higher and higher. Response times can reach minutes.

We are running on Windows 2008 R2 with Sun's JDK 1.6.0_16. We have been using perfmon and Process Explorer to see what is going on. The only thing we find odd is that when the server starts to slow down, the java.exe process has around 3500 handles open. I'm not saying that this is the actual problem.

I'm just curious if there are some guidelines I should follow to be able to pinpoint the problem. What tools should I use? ....

Comments (6)

彼岸花ソ最美的依靠 2024-09-22 10:44:45

Can you access the logging configuration of this application?

If you can, you should change the log level to "DEBUG". Tracing the DEBUG logs of a request could give you useful information about the contention point.

If you can't, profiler tools can help you:

  • VisualVM (Free, and good product)
  • Eclipse TPTP (Free, but more complicated than VisualVM)
  • JProbe (not Free but very powerful. It is my favorite Java profiler, but it is expensive)

If the application has been developed with JMX control points, you can plug in a JMX viewer to get information...

If you want to stress the application to trigger the problem (that is, to verify whether it is a load problem), you can use stress tools like JMeter.

灵芸 2024-09-22 10:44:45

Sounds like the garbage collector cannot keep up and starts doing "stop-the-world" collections for some reason.

Attach jvisualvm (shipped with the JDK) when the server starts, and have a look at the collected data when the performance drops.
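
If keeping jvisualvm attached to the production server is not practical, the same GC counters can also be sampled from inside the JVM through the standard java.lang.management API. The following is only a rough sketch of that idea, not something taken from this answer: the class name GcSampler and both helper methods are made up for illustration, and the reported collection time is an approximation that, depending on the collector, can include concurrent phases as well as pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that samples cumulative GC time from inside the JVM. */
class GcSampler {

    /** Approximate total time in ms that all collectors have spent collecting since JVM start. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Share of a wall-clock interval that was spent in GC (0.25 means 25%). */
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            return 0.0;
        }
        return (double) (gcMillisAfter - gcMillisBefore) / intervalMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcTimeMillis();
        long start = System.nanoTime();
        Thread.sleep(1000); // in a server this would be a periodic background task
        long elapsed = (System.nanoTime() - start) / 1000000L;
        long after = totalGcTimeMillis();
        System.out.printf("GC share of the last %d ms: %.1f%%%n",
                elapsed, 100 * gcFraction(before, after, elapsed));
    }
}
```

If the logged share climbs steadily in the minutes before response times degrade, that would support the GC theory; if it stays flat, the cause is more likely elsewhere.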

倾听心声的旋律 2024-09-22 10:44:45

The problem you're describing is quite typical, but also quite general. Causes can range from memory leaks and resource contention to bad GC policies and heap/PermGen space allocation. To pinpoint the exact problem in your application, you need to profile it (I am aware of tools like YourKit and JProfiler). Profile wisely: only some application cycles will reveal the problem, and profiling is not very easy in itself.

归属感 2024-09-22 10:44:45

In a similar situation, I wrote some simple profiling code myself. Basically, I used a ThreadLocal that holds a "StopWatch" (based on a LinkedHashMap), and I then inserted code like this at various points of the application: watch.time("OperationX");

Then, after the thread finished a task, I'd call watch.logTime(); and the class would write a log line that looks like this: [DEBUG] StopWatch time:Stuff=0, AnotherEvent=102, OperationX=150
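
The answer does not show the class itself, so the following is only a guess at what such a StopWatch could look like. Only time(), logTime() and the log format come from the description above; current(), record() and format() are hypothetical names, and it is an assumption that time() records the milliseconds elapsed since the previous checkpoint.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-thread stopwatch; checkpoints are kept in insertion order. */
class StopWatch {

    // One StopWatch per thread, created lazily on first use.
    private static final ThreadLocal<StopWatch> CURRENT = new ThreadLocal<StopWatch>() {
        @Override
        protected StopWatch initialValue() {
            return new StopWatch();
        }
    };

    private final Map<String, Long> timings = new LinkedHashMap<String, Long>();
    private long lastMark = System.nanoTime();

    /** Returns the stopwatch bound to the calling thread. */
    static StopWatch current() {
        return CURRENT.get();
    }

    /** Records the milliseconds elapsed since the previous checkpoint (assumed semantics). */
    void time(String label) {
        long now = System.nanoTime();
        record(label, (now - lastMark) / 1000000L);
        lastMark = now;
    }

    /** Adds a duration to a label; repeated labels accumulate. */
    void record(String label, long millis) {
        Long previous = timings.get(label);
        timings.put(label, previous == null ? millis : previous + millis);
    }

    /** Formats the collected timings as a single log line. */
    String format() {
        StringBuilder sb = new StringBuilder("[DEBUG] StopWatch time:");
        String separator = "";
        for (Map.Entry<String, Long> entry : timings.entrySet()) {
            sb.append(separator).append(entry.getKey()).append('=').append(entry.getValue());
            separator = ", ";
        }
        return sb.toString();
    }

    /** Writes the log line and resets the watch, since pooled threads are reused. */
    void logTime() {
        System.out.println(format()); // a real application would use its logging framework
        timings.clear();
        lastMark = System.nanoTime();
    }
}
```

With this sketch, application code would call StopWatch.current().time("OperationX") at each checkpoint and StopWatch.current().logTime() when the request is done. The reset in logTime() matters in an RMI server, where the same worker thread handles many calls.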

After this I wrote a simple parser that generates CSV from this log (per code path). The best thing you can do is to create a histogram (which can easily be done in Excel). Averages, medians and even modes can fool you. I highly recommend creating a histogram.
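
To illustrate why a histogram is more trustworthy than an average, here is a small sketch that buckets response times without Excel. The class LatencyHistogram, its methods and the sample numbers are invented for the example and are not measurements from any real system; samples are assumed to be non-negative.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that groups latency samples into fixed-width buckets. */
class LatencyHistogram {

    /** Counts samples per bucket; each key is the lower bound of a bucket in ms. */
    static Map<Long, Integer> bucketize(long[] samplesMillis, long bucketWidthMillis) {
        Map<Long, Integer> counts = new TreeMap<Long, Integer>();
        for (long sample : samplesMillis) {
            long bucket = (sample / bucketWidthMillis) * bucketWidthMillis;
            Integer count = counts.get(bucket);
            counts.put(bucket, count == null ? 1 : count + 1);
        }
        return counts;
    }

    /** Arithmetic mean of the samples, or 0 for an empty array. */
    static double mean(long[] samplesMillis) {
        if (samplesMillis.length == 0) {
            return 0.0;
        }
        long sum = 0;
        for (long sample : samplesMillis) {
            sum += sample;
        }
        return (double) sum / samplesMillis.length;
    }

    public static void main(String[] args) {
        // Made-up data: six fast calls and two very slow ones.
        long[] samples = {100, 110, 120, 90, 105, 95, 5000, 5200};
        System.out.println("mean = " + mean(samples));              // 1352.5
        System.out.println("histogram = " + bucketize(samples, 1000)); // {0=6, 5000=2}
    }
}
```

The mean of 1352.5 ms matches none of the actual calls, while the histogram shows two separate populations straight away.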

Together with this histogram, you can create line graphs using the average, median or mode (whichever represents the data best; you can determine this from the histogram).

This way, you can be sure exactly which operation is taking the time. If you can't determine the culprit, binary search is your friend (make the events more fine-grained).

It might sound really primitive, but it works. Also, if you make a library out of it, you can use it in any project. It's also nice because you can easily turn it on in production as well.

酒中人 2024-09-22 10:44:45

Aside from the GC that others have mentioned, try taking thread dumps every 5-10 seconds for about 30 seconds during the slowdown. There could be a case where DB calls, a web service, or some other dependency becomes slow. If you look at the thread dumps, you will be able to see threads that don't appear to move, and you can narrow down the culprit that way.

From the GC standpoint, do you monitor your CPU usage during these times? If the GC is running frequently, you will see a jump in your overall CPU usage.
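
Comparing successive dumps by eye is tedious, so the comparison can be roughly automated from inside the JVM with the standard ThreadMXBean. This is a sketch under assumptions rather than the procedure the answer describes: the class StuckThreadFinder and its methods are hypothetical, and a thread counts as "not moving" here only when its state and top stack frame are identical in every snapshot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that looks for threads whose top stack frame never changes. */
class StuckThreadFinder {

    /** Maps thread id to "name [state] @ top frame" for every live thread with a stack. */
    static Map<Long, String> snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Long, String> result = new HashMap<Long, String>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // the thread ended while the dump was being taken
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length == 0) {
                continue;
            }
            result.put(info.getThreadId(),
                    info.getThreadName() + " [" + info.getThreadState() + "] @ " + stack[0]);
        }
        return result;
    }

    /** Keeps only the threads that look identical in both snapshots. */
    static Map<Long, String> unchanged(Map<Long, String> earlier, Map<Long, String> later) {
        Map<Long, String> same = new HashMap<Long, String>();
        for (Map.Entry<Long, String> entry : earlier.entrySet()) {
            if (entry.getValue().equals(later.get(entry.getKey()))) {
                same.put(entry.getKey(), entry.getValue());
            }
        }
        return same;
    }

    /** Takes rounds + 1 snapshots, pauseMillis apart, and returns the threads that never moved. */
    static Map<Long, String> watch(int rounds, long pauseMillis) throws InterruptedException {
        Map<Long, String> candidates = snapshot();
        for (int i = 0; i < rounds; i++) {
            Thread.sleep(pauseMillis);
            candidates = unchanged(candidates, snapshot());
        }
        return candidates;
    }
}
```

Calling StuckThreadFinder.watch(6, 5000) during a slowdown covers roughly the 30 seconds suggested above. Idle pool threads that are simply waiting for work will also show up, so the result is a list of candidates to inspect in a full dump, not a verdict.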

If only this was a Solaris box, prstat would be your friend.

淡水深流 2024-09-22 10:44:45

For acute issues like this a quick jstack <pid> should quickly point out the problem area. Probably no need to get all fancy on it.

If I had to guess, I'd say HotSpot jumped in and tightly optimised some badly written code. NetBeans grinds to a halt where it uses a WeakHashMap with newly created objects as keys to cache file data. Once the code is optimised, the entries can be removed from the map straight after being added. Obviously, if the cache is being relied upon, a lot of file activity follows. You probably won't see the drive light up, because it will all be cached by the OS.
