What could cause GC to take three hours to reduce 1.2GB of heap?
In one of our servers, garbage collection took nearly three hours to (successfully) bring down 1.2GB of heap memory, from 1.4GB to 200MB.
During this time CPU usage was high, almost 80-100%. What could be the reason? We have 4 such servers with the same configuration (JVM settings, server configuration, hardware, network). Assuming nobody has made any changes to them, what could cause this particular server to run a 3-hour GC?
All the other servers were taking only 5 to 10 minutes for each GC activity.
I have attached a graph from HP BAC for easy reference. It shows the time when I believe GC kicked in, and when it stopped.
(As Stephen points out, for more conclusive findings.) I will provide this information when the server administrator gets back to me:
- The exact version of the JVM you are using. (Standard Java SE 1.4.2)
- The JVM options. (Coming)
- Details of the web container / server base. (Coming)
- Information about what the service does, and any relevant clues from the server / service log files. (Coming)
- Any relevant patterns in the request logs. (Coming)
- The GC logs for the time of the event. (If you don't currently have GC logging enabled, you may need to enable it and wait until the problem recurs; see the sketch below.) (Coming)
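For completeness, a minimal sketch of how GC logging could be turned on, assuming a Sun/HotSpot 1.4.2 JVM; the log file name and the placeholders for the existing options and main class are illustrative, and the exact flags accepted can vary between vendors and builds:

```
# Assumed Sun/HotSpot 1.4.2 startup; gc.log, <existing JVM options> and <main class>
# are placeholders for whatever the server currently uses.
java -verbose:gc -Xloggc:gc.log \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     <existing JVM options> <main class>
```

With timestamps in the log, the three-hour episode can be matched against the HP BAC graph once the problem recurs.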
2 Answers
There's not much data to work from here, but my hunch: you're swapping. The only time we ever see GC times go that high is when you've overcommitted the box and it's paging to disk. That can turn into an order of magnitude (or more) of performance degradation.
You need to gather OS (and potentially hypervisor, if it applies) swapping statistics to prove or disprove this theory.
(I know the CPU time is higher than I'd expect for swapping, but you never know.)
It would also help if you posted the hardware configuration, "java -version" information, and the JVM command line arguments (e.g. -Xmx and -Xms) to help narrow down what you're really running.
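A quick sketch of how those swap statistics and JVM details could be collected, assuming a Linux host with the standard procps tools; equivalents exist on other operating systems:

```
# Watch the si/so (swap-in/swap-out) columns while the long GC is running;
# sustained non-zero values suggest the box is paging.
vmstat 5

# Snapshot of current memory and swap usage.
free -m

# JVM version and the full command line, including -Xms/-Xmx and any GC flags.
java -version
ps -ef | grep java
```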
You don't provide much information, but possible reasons might be:
Bugs in your application; e.g. a memory leak with some rather peculiar characteristics, or a task that kept on running out of memory and then restarting.
An accidental or deliberate denial of service attack; e.g. some client that keeps retrying an over-sized request with parameters that reduce the "problem size" each time.
A single extremely long-running request with certain characteristics.
Thrashing - see @Trent Gray-Donald's answer. (If you have overallocated memory, then the GC algorithms, which involve looking at lots of objects scattered randomly over lots of pages, are highly likely to provoke thrashing. I'm just not sure that this would result in a gradually falling heap usage like you are seeing.)
A pathological combination of JVM settings.
A bug in the garbage collector in the particular JVM you are using.
Some combination of the above.
This is the kind of problem that would warrant getting an Oracle / Java support contract.
The following information might help diagnose this:
- The exact version of the JVM you are using.
- The JVM options.
- Details of the web container / server base.
- Information about what the service does, and any relevant clues from the server / service log files.
- Any relevant patterns in the request logs.
- The GC logs for the time of the event. (If you don't currently have GC logging enabled, you may need to enable it and wait until the problem recurs.)