Performance cost of profiling a web application in production

Posted 2024-07-29 03:22:15

I am attempting to solve performance issues with a large and complex Java web application running on Tomcat. The biggest issue at the moment is that, from time to time, memory usage spikes and the application becomes unresponsive. I've fixed everything I can fix with log profilers and Bayesian analysis of the log files. I'm considering running a profiler on the production Tomcat server.

A Note to the Reader with Gentle Sensitivities:

I understand that some may find the very notion of profiling a production app offensive. Please be assured that I have exhausted most of the other options. The reason I am considering this is that I do not have the resources to completely duplicate our production setup on my test server, and I have been unable to cause the failures of interest on my test server.

Questions:

I am looking for answers that either work for a Java web application running on Tomcat or address the question in a language-agnostic way.

  • What are the performance costs of profiling?
  • Any other reasons why it is a bad idea to remotely connect and profile a web application in production (strange failure modes, security issues, etc)?
  • How much does profiling affect the memory footprint?
  • Specifically, are there Java profiling tools that have very low performance costs?
  • Are there any Java profiling tools designed for profiling web applications?
  • Does anyone have benchmarks on the performance costs of profiling with VisualVM?
  • What size applications and datasets can VisualVM scale to?

Comments (5)

肤浅与狂妄 2024-08-05 03:22:15

OProfile and its ancestor DCPI were developed for profiling production systems. Their overhead is very low, and they profile your full system, including the kernel, so you can find performance problems in the VM as well as in the kernel and libraries.

To answer your questions:

  1. Overhead: These are sampling profilers; that is, they generate timer or performance-counter interrupts at a regular interval and look at what code is executing at that moment. They use the samples to build a histogram of where you spend your time, and the overhead is very low (1-8% is what they claim) for reasonable sampling intervals.

    Take a look at this graph of sampling frequency vs. overhead for OProfile. You can tune the sampling frequency for lower overhead if the defaults are not to your liking.

  2. Usage in production: The only caveat to using OProfile is that you'll need to install it on your production machine. I believe there's kernel support in Red Hat since RHEL3, and I'm pretty sure other distributions support it.

  3. Memory: I'm not sure what the exact memory footprint of OProfile is, but I believe it keeps relatively small buffers around and dumps them to log files occasionally.

  4. Java: OProfile includes profiling agents that support Java and that are aware of code running in JITs. So you'll be able to see Java calls, not just the C calls in the interpreter and JIT.

  5. Web Apps: OProfile is a system-level profiler, so it's not aware of things like sessions, transactions, etc. that a web app would have.

    That said, it is a full-system profiler, so if your performance problem is caused by bad interactions between the OS and the JIT, or if it's in some third-party library, you'll be able to see that, because OProfile profiles the kernel and libraries. This is an advantage for production systems, as you can catch problems that are due to misconfigurations or particulars of the production environment that might not exist in your test environment.

  6. VisualVM: Not sure about this one, as I have no experience with VisualVM.

Here's a tutorial on using OProfile to find performance bottlenecks.
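
To make the sampling idea in point 1 concrete, here is a minimal in-process sketch in Java. It is not how OProfile works internally: OProfile samples from hardware or timer interrupts at the system level, whereas this toy version polls `Thread.getAllStackTraces()`, which only sees Java frames and is biased towards JVM safepoints. The class name `StackSampler` and the `busy-worker` thread are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-process sampler: counts which method is on top of each thread's stack. */
public class StackSampler {
    private final Map<String, Integer> histogram = new HashMap<>();

    /** Takes one sample of every live thread except the sampling thread itself. */
    public void sampleOnce() {
        Thread self = Thread.currentThread();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (e.getKey() == self || stack.length == 0) {
                continue; // skip ourselves and threads with no Java frames
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            histogram.merge(top, 1, Integer::sum);
        }
    }

    /** Takes `samples` samples, pausing `intervalMillis` between them. */
    public Map<String, Integer> run(int samples, long intervalMillis) {
        for (int i = 0; i < samples; i++) {
            sampleOnce();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Hypothetical CPU-bound thread so the histogram has something to show.
        Thread busy = new Thread(() -> {
            double x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += Math.sqrt(x + 1);
            }
        }, "busy-worker");
        busy.setDaemon(true);
        busy.start();

        Map<String, Integer> result = new StackSampler().run(50, 10);
        result.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
              .limit(5)
              .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

The trade-off from point 1 shows up in the `intervalMillis` parameter: a longer interval means fewer stack walks and less overhead, at the price of a coarser histogram.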

熊抱啵儿 2024-08-05 03:22:15

I've used YourKit to profile apps in a high-load production environment, and while there was certainly an impact, it was an acceptable one. YourKit makes a big deal of being able to do this in a non-invasive manner, for example by letting you selectively turn off the more expensive profiling features (it's a sliding scale, really).

My favourite aspect of it is that you can run the VM with the YourKit agent loaded, and it has zero performance impact. It's only when you connect the GUI and start profiling that it has an effect.

十年不长 2024-08-05 03:22:15

There is nothing wrong with profiling production apps. If you work on distributed applications, there are times when an OutOfMemoryError occurs only in a very unlikely scenario that is difficult to reproduce in a dev/stage/UAT environment.

You can try using custom profilers, but if you are in a hurry and plugging in or setting up a profiler on a production box would take too long, you can also have the JVM take a memory dump (a JVM memory dump also gives you a thread dump).

  1. You can have the JVM generate a heap dump automatically when it runs out of memory by passing the following option on the command line:
    -XX:+HeapDumpOnOutOfMemoryError

  2. The Eclipse Memory Analyzer project has a very powerful feature called “group by value”, which makes it possible to build an object query and regroup the instances by a field value. This is useful when you have a lot of instances that contain a small set of possible values and you want to see which values are used the most. This has really helped me understand some complex memory dumps, so I recommend you try it out.
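
If waiting for an OutOfMemoryError is not an option, a dump can also be requested on demand from inside the JVM. The following is a minimal sketch, assuming a HotSpot-based JVM where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `HeapDumper` and the output file name are illustrative. Bear in mind that writing a dump pauses the application and that the file can be as large as the heap.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

/** Sketch: trigger a heap dump programmatically on a HotSpot JVM. */
public class HeapDumper {

    /**
     * Writes a heap dump to `path`. With liveOnly = true only reachable objects are dumped.
     * Recent JDKs require the file name to end in ".hprof", and the file must not exist yet.
     */
    public static void dump(String path, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not write heap dump to " + path, e);
        }
    }

    public static void main(String[] args) {
        // Illustrative default file name; pass a path as the first argument to override it.
        String path = args.length > 0 ? args[0] : "heap-" + System.currentTimeMillis() + ".hprof";
        dump(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The resulting .hprof file can then be opened in Eclipse Memory Analyzer in the same way as a dump produced by -XX:+HeapDumpOnOutOfMemoryError.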

罪歌 2024-08-05 03:22:15

You may also consider the tooling that ships with modern HotSpot JVMs: Java Flight Recorder (JFR) and Java Mission Control (JMC). This is a set of tools that lets you collect low-level runtime information with a CPU overhead of about 5% (I cannot verify that figure myself; it is what the Oracle engineer who presented the feature and gave a live demo stated).

You can use these tools as long as your application is running on Java 7u40 (1.7.0_40) or later. To enable the collection of runtime information, you need to start the JVM with particular flags:

By default, JFR is disabled in the JVM. To enable JFR, you must launch your Java application with the -XX:+FlightRecorder option. Because JFR is a commercial feature, available only in the commercial packages based on Java Platform, Standard Edition (Oracle Java SE Advanced and Oracle Java SE Suite), you also have to enable commercial features using the -XX:+UnlockCommercialFeatures options.

(Quoted from http://docs.oracle.com/javase/8/docs/technotes/guides/jfr/about.html#sthref7)

I added this answer because, in my opinion, this is a viable option for profiling in production.

There is also an Eclipse plugin that supports JFR and JMC and displays the information in a user-friendly way.
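
For readers on newer JVMs, a recording can also be started from code. The sketch below assumes JDK 11 or later, where JFR is part of OpenJDK and the `jdk.jfr` API can be used without the commercial flags quoted above; it does not apply to the 7u40-era setup this answer describes. The class name `JfrSketch` and the output file name are illustrative.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

/** Sketch: record JFR data for a fixed period and write it to a file (assumes JDK 11+). */
public class JfrSketch {

    public static Path record(Duration length, Path out) {
        try {
            // "default" is the low-overhead predefined configuration; "profile" collects more.
            Configuration config = Configuration.getConfiguration("default");
            try (Recording recording = new Recording(config)) {
                recording.start();
                Thread.sleep(length.toMillis()); // the application keeps running meanwhile
                recording.dump(out);             // write what has been recorded so far
            }                                    // close() stops the recording and frees its data
            return out;
        } catch (IOException | ParseException e) {
            throw new IllegalStateException("JFR recording failed", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while recording", e);
        }
    }

    public static void main(String[] args) {
        Path out = record(Duration.ofSeconds(10), Path.of("sketch.jfr"));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

The resulting .jfr file can be opened in Java Mission Control for analysis.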

情栀口红 2024-08-05 03:22:15

The tools have improved vastly over the years. These days, most people who have needs like these use a tool that hooks into Java's instrumentation API instead of the profiling API. There are surely more examples, but NewRelic and AppDynamics come to mind. Instrumentation-based solutions usually run as an agent in the JVM and collect data constantly. They report the data at a higher level (business transaction, web transaction, database transaction) than the old profiling approach and allow you to dig deeper (down to the method or line) if necessary. You can even set up monitoring and alerts, so you can track and alert on metrics like page load times and performance against SLAs. With these great tools, you really should have no reason to run a profiler in production any longer. The cost of running them is negligible.
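
To show what "hooking into the instrumentation API" looks like at its simplest, here is a minimal agent sketch built on `java.lang.instrument`. It is not how NewRelic or AppDynamics are implemented: commercial agents rewrite bytecode to time methods and transactions, while this one only counts the classes it sees being loaded. The class name `LoadCountingAgent` and the prefix argument are made up for illustration.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a Java agent: counts loaded classes whose internal name starts with a prefix. */
public class LoadCountingAgent {

    public static class CountingTransformer implements ClassFileTransformer {
        private final String prefix;
        private final AtomicLong count = new AtomicLong();

        public CountingTransformer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // className uses the internal form, e.g. "com/example/Foo"; it can be null.
            if (className != null && className.startsWith(prefix)) {
                count.incrementAndGet();
            }
            return null; // returning null leaves the class bytes unchanged
        }

        public long count() {
            return count.get();
        }
    }

    /** Called by the JVM before main() when started with -javaagent:agent.jar=<prefix>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        String prefix = (agentArgs == null) ? "" : agentArgs;
        CountingTransformer transformer = new CountingTransformer(prefix);
        inst.addTransformer(transformer);
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.err.println("Classes loaded under '" + prefix + "': " + transformer.count())));
    }
}
```

To try it, the class would be packaged in a jar whose manifest contains `Premain-Class: LoadCountingAgent`, and the JVM started with `-javaagent:` pointing at that jar. A real monitoring agent would return modified bytecode from `transform` instead of null.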
