How to do load testing using JMeter and VisualVM?
I want to load test my site for 10 million users. The site is a Java-based web app. My approach is to create a JMeter test plan covering all the links and then generate a report for the 10 million users. I would then use jvisualvm to profile the application and check for bottlenecks.
Is there a better way to do this? Is there an existing demo of this approach? I am doing this for the first time, so any assistance would be very helpful.
Comments (5)
You are on the right path, but your load target is off by a large factor.
I say this because your site will probably need more than one machine to handle 10 million concurrent users. A single process would probably struggle to handle 32K concurrent TCP streams. Also do the math on the bandwidth it would take to actually serve 10 million users.
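As a rough sketch of that math: the request interval and response size below are made-up assumptions, not figures from the question, and the 32K limit per process is taken as 32,000 connections. Replace every input with numbers measured on your own site.

```python
import math

def capacity_estimate(concurrent_users, seconds_between_requests,
                      avg_response_bytes, max_connections_per_process=32_000):
    """Back-of-the-envelope sizing. Every input is an assumption to replace."""
    requests_per_sec = concurrent_users / seconds_between_requests
    # Outbound bandwidth in Gbit/s: bytes per second * 8 bits / 1e9.
    bandwidth_gbit = requests_per_sec * avg_response_bytes * 8 / 1e9
    # Lower bound on server processes if each one holds at most
    # max_connections_per_process concurrent TCP connections.
    processes = math.ceil(concurrent_users / max_connections_per_process)
    return requests_per_sec, bandwidth_gbit, processes

# Hypothetical workload: each user sends one request every 10 seconds
# and each response is about 50 KB.
rps, gbit, procs = capacity_estimate(10_000_000, 10, 50_000)
print(f"{rps:,.0f} req/s, {gbit:,.0f} Gbit/s, at least {procs} server processes")
```

Even with these fairly modest assumptions the result is hundreds of Gbit/s and hundreds of server processes, which is why a single-machine test cannot stand in for 10 million concurrent users.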
I do not know what kind of service you plan to provide on your site, but given that JVisualVM slows down processing by a factor of 10 (or more with method tracing), you would not be measuring the "real world" if you ran JMeter and JVisualVM at the same time.
JVisualVM is more useful when you run on lower loads.
To get good measurements, first make sure you have a good baseline.
Run a test with 10 concurrent users, connect JVisualVM, let it run for a while, and note down all the interesting values.
After you have your baseline, then you can start adding more load.
Add 10 times the load (e.g. 100 users) and look at the changes in JVisualVM. Continue until it becomes obvious that JVisualVM is slowing you down. Each time you add load, make sure you write down the numbers you are interested in. Then plot the numbers in a graph.
Now extrapolate the graph (by hand) to the number of users you want. This works for memory usage, disk access etc., but not for CPU time, because JVisualVM will eat CPU and give you invalid numbers for that (especially if you have method tracing turned on).
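If you would rather not do this by hand, a minimal sketch is a straight-line least-squares fit through the recorded points. The measurements below are invented for illustration and were chosen to be exactly linear, and the assumption that the metric grows linearly with load is mine; check it against your own plot before trusting the projection.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def extrapolate(xs, ys, target_x):
    """Project the fitted line out to target_x."""
    slope, intercept = fit_line(xs, ys)
    return slope * target_x + intercept

# Hypothetical measurements: concurrent users -> heap in use (MB).
users = [10, 100, 1000]
heap_mb = [210, 300, 1200]
print(f"projected heap at 10,000 users: {extrapolate(users, heap_mb, 10_000):.0f} MB")
```

The further the target lies beyond the last measured point, the less the projection is worth, so treat it as an order-of-magnitude estimate.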
If you really want to go as high as 10 million users, I would not trust JMeter either; I would write a little test program of my own that performs the test you want. This is fine, since setting up the site to handle 10 million users will also take time, so spending a little extra time on the test tools is not a waste.
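A minimal sketch of what such a test program could look like, using only the Python standard library. The function names and parameters are hypothetical, and one thread per simulated user will not get anywhere near millions of users from one machine; it only shows the shape of the tool.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url, timeout=10):
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status

def run_load(url, users, requests_per_user, request_fn=fetch):
    """Simulate `users` concurrent clients; return (latencies, error_count)."""
    def one_user(_user_index):
        latencies, errors = [], 0
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                request_fn(url)
                latencies.append(time.perf_counter() - start)
            except Exception:
                # Timeouts, refused connections and HTTP error statuses
                # all count as failed requests.
                errors += 1
        return latencies, errors

    # One worker thread per simulated user.
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(one_user, range(users)))
    all_latencies = [t for user_latencies, _ in results for t in user_latencies]
    total_errors = sum(user_errors for _, user_errors in results)
    return all_latencies, total_errors
```

For example, `run_load("http://localhost:8080/", users=50, requests_per_user=20)` (the URL is a placeholder) returns the successful request latencies in seconds plus the number of failed requests. To go beyond what one machine can generate, you would run copies of this on several load-generator machines and merge the results.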
Just because you have 10 million users in the database doesn't mean you need to load test with that many users. Think about it: is your site really going to have 10 million simultaneous users? For web applications, a ratio of about 1 concurrent user per 100 registered users is common, i.e. you are unlikely to have more than 100K users at any moment.
Can JMeter handle that kind of load? I doubt it. Please try Faban instead. It is very lightweight and can support thousands of users on a single VM. You also have much better flexibility in creating your workload and can automate monitoring of your entire test infrastructure.
Now to the analysis part. You didn't say what server you were using. Any Java appserver will provide sufficient monitoring support. Commercial servers provide nice GUI tools while Tomcat provides extensive monitoring via JMX. You may want to start here before getting down to the JVM level.
For the JVM, you really don't want to use VisualVM while running such a large performance test. Besides, to support such a load, I assume you are using multiple appserver/JVM instances. The major performance issue is usually GC, so use the JVM options to collect and log GC information. You will have to post-process the data.
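As an illustration of the post-processing step, here is a small sketch that summarizes a GC log. It assumes the simple one-line format that older HotSpot JVMs print with `-verbose:gc` (optionally written to a file with `-Xloggc:<file>`); logs produced with `-XX:+PrintGCDetails` or by newer JVMs using `-Xlog:gc*` have different layouts and would need a different pattern. The sample lines are illustrative, not taken from a real run.

```python
import re

# Matches lines such as "[GC 65536K->10728K(251392K), 0.0123456 secs]"
# and "[Full GC 180000K->150000K(251392K), 0.5000000 secs]".
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC)[^\]]*?"
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<secs>\d+\.\d+) secs\]"
)

def summarize_gc(lines):
    """Count collections and add up pause times from simple GC log lines."""
    pauses = {"GC": [], "Full GC": []}
    heap_after_full_gc_kb = []
    for line in lines:
        match = GC_LINE.search(line)
        if not match:
            continue  # Skip anything that is not a recognised GC line.
        pauses[match["kind"]].append(float(match["secs"]))
        if match["kind"] == "Full GC":
            heap_after_full_gc_kb.append(int(match["after"]))
    all_pauses = pauses["GC"] + pauses["Full GC"]
    return {
        "minor_count": len(pauses["GC"]),
        "full_count": len(pauses["Full GC"]),
        "total_pause_secs": sum(all_pauses),
        "max_pause_secs": max(all_pauses, default=0.0),
        "heap_after_full_gc_kb": heap_after_full_gc_kb,
    }

sample = [
    "[GC 65536K->10728K(251392K), 0.0123456 secs]",
    "[GC 76264K->21500K(251392K), 0.0200000 secs]",
    "[Full GC 180000K->150000K(251392K), 0.5000000 secs]",
]
print(summarize_gc(sample))
```

Comparing the total pause time with the length of the test run gives the GC overhead, and the heap size left after each full collection shows whether the live set keeps growing.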
This is a non-trivial exercise - good luck!
There are two types of load testing: bottleneck identification and throughput. The question leads me to believe this is about bottlenecks, so the number of users is something of a red herring. The goal instead is, for a given configuration, to find areas that can be improved to increase concurrency.
Application bottlenecks usually fall into three categories: database, memory leak, or slow algorithm. Finding them involves putting the application in question under stress (i.e. load) for an extended period of time - at least an hour, perhaps up to several days. JMeter is a good tool for this purpose. One thing to consider is running the same test with cookie handling enabled (i.e. JMeter retains cookies and sends them with each subsequent request) and disabled. Sometimes you get very different results, and this matters because the latter is effectively a simulation of what some crawlers do to your site. Details for bottleneck detection follow:
Database
Tables without indices or SQL statements involving multiple joins are frequent app bottlenecks. Every database server I've dealt with (MySQL, SQL Server, and Oracle) has some way of logging or identifying slow-running SQL statements. MySQL has the slow query log, whereas SQL Server has dynamic management views that track the slowest-running SQL. Once you've got your hands on the slow statements, use explain plan to see what the database engine is trying to do, use any features that suggest indices, and consider other strategies - such as denormalization - if those two options do not solve the bottleneck.
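To show what reading a plan before and after adding an index looks like, here is a small self-contained sketch. It uses SQLite through Python's `sqlite3` module purely as a stand-in, since it needs no server; MySQL, SQL Server, and Oracle each have their own explain syntax and output. The `users` table and the index name are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user42@example.com",)

plan_before = query_plan(conn, lookup, args)  # no index on email yet
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)   # same query, index available

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search that uses `idx_users_email`. That difference is what you look for when checking a slow statement against your real database.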
Memory Leak
Turn on verbose garbage collection logging and a JMX monitoring port. Then use jConsole, which provides much better graphs, to observe trends. In particular, leaks usually show up as filling the Old Gen or Perm Gen spaces. Leaks are a bottleneck because the JVM spends increasing amounts of time attempting garbage collection unsuccessfully until an OutOfMemoryError is thrown.
A full Perm Gen implies the need to increase that space via a command line parameter to the JVM. A full Old Gen implies a leak, in which case you should stop the load test, generate a heap dump, and then use the Eclipse Memory Analyzer Tool (MAT) to identify the leak.
Slow Algorithm
This is more difficult to track down. The most frequent offenders are synchronization, inter-process communication (e.g. RMI, web services), and disk I/O. Another common issue is code using nested loops (look, Mom: O(n^2) performance!).
The best way I've found to find these issues, absent some deeper knowledge, is to generate stack traces. These tell you what all threads are doing at a given point in time. What you're looking for are BLOCKED threads or several threads all executing the same code. This usually points at some slowness within the codebase.
I blogged about how I went about the performance test.
For a detailed explanation: http://www.daemonthread.com/2011/06/site-performance-tuning-using-jmeter.html
I started using JMeter plugins.
This allows me to gather application metrics available over JMX to use in my Load Test.