关于（类似工具）LoadRunner 的概念问题

发布于 2024-10-14 09:26:06 字数 1336 浏览 7 评论 0原文

我正在使用 LoadRunner 对 J2EE 应用程序进行压力测试。

我有：1 个 MySQL DB 服务器和 1 个 JBoss 应用服务器。每个都是 16 核 (1.8GHz) / 8GB RAM 盒。

连接池：数据库服务器在 my.cnf 中使用 max_connections = 100。应用服务器也在 mysql-ds.xml 和 中使用 min-pool-size 和 max-pool-size = 100 mysql-ro-ds.xml。

我正在从“常规”单核 PC 模拟 100 个虚拟用户的负载。这是一个 1.8GHz / 1GB RAM 盒子。

该应用程序在 100 Mbps 以太网 LAN 上部署和使用。

我在压力测试脚本的各个部分中使用集合点来模拟现实世界的并行（和非并发）使用。

问题：

这台负载生成 PC 上的 CPU 利用率永远不会达到 100%，而且我相信内存也是可用的。因此，我可以尝试在这台电脑上添加更多虚拟用户。但在此之前，我想了解有关并发/并行性和硬件的 1 或 2 个基础知识：

仅使用像这样的单核负载生成器，我可以真正模拟并行吗？ 100 个用户的负载（每个用户在现实生活中使用专用 PC 进行操作）？我可能不正确的理解是，单核 PC 上的 100 个线程将并发运行（即交错运行），但不是并行运行......这意味着，我无法真正模拟真实世界的负载只需一台单核 PC 即可实现 100 个并行用户（在 100 台 PC 上）！这是正确的吗？
网络带宽对用户并行性的限制：即使假设我有一台 100 核负载生成 PC（或者，假设我的 LAN 上有 100 台单核 PC），以太网的工作方式也不允许仅将负载生成 PC 连接到服务器的以太网线上的用户并发性而非并行性。事实上，即使在实际应用程序使用中（每个用户 1 台 PC），这个问题（缺乏用户并行性）也会持续存在，因为到达多核机器上的应用程序服务器的用户请求只能交错到达。也就是说，多核服务器唯一可以并行处理用户请求的情况是每个用户与服务器之间都有自己的专用物理层连接！！
假设并行性无法实现（由于上述“问题”），并且只有称为并发的下一个最好的事情是可能的，我将如何选择硬件和网络规范来使用我的模拟。例如，(a) 我的负载生成 PC 应该有多强大？ (b) 每台 PC 创建多少个虚拟用户？ (c) LAN 上的每台 PC 是否必须通过交换机连接到服务器（以避免）如果使用集线器而不是交换机会出现广播流量？

提前致谢，

/HS

原文

I'm using LoadRunner to stress-test a J2EE application.

I have got: 1 MySQL DB server, and 1 JBoss App server. Each is a 16-core (1.8GHz) / 8GB RAM box.

Connection Pooling: The DB server is using max_connections = 100 in my.cnf. The App Server too is using min-pool-size and max-pool-size = 100 in mysql-ds.xml and mysql-ro-ds.xml.

I'm simulating a load of 100 virtual users from a 'regular', single-core PC. This is a 1.8GHz / 1GB RAM box.

The application is deployed and being used on a 100 Mbps ethernet LAN.

I'm using rendezvous points in sections of my stress-testing script to simulate real-world parallel (and not concurrent) use.

Question:

The CPU utilization on this load-generating PC never reaches 100% and memory too, I believe, is available. So, I could try adding more virtual users on this PC. But before I do that, I would like to know 1 or 2 fundamentals about concurrency/parallelism and hardware:

With only a single-core load generator as this one, can I really simulate a parallel load of 100 users (with each user using operating from a dedicated PC in real-life)? My possibly incorrect understanding is that, 100 threads on a single-core PC will run concurrently (interleaved, that is) but not parallely... Which means, I cannot really simulate a real-world load of 100 parallel users (on 100 PCs) from just one, single-core PC! Is that correct?
Network bandwidth limitations on user parallelism: Even assuming I had a 100-core load-generating PC (or alternatively, let's say I had 100, single-core PCs sitting on my LAN), won't the way ethernet works permit only concurrency and not parallelism of users on the ethernet wire connecting the load-generating PC to the server. In fact, it seems, this issue (of absence of user parallelism) will persist even in a real-world application usage (with 1 PC per user) since the user requests reaching the app server on a multi-core box can only arrive interleaved. That is, the only time the multi-core server could process user requests in parallel would be if each user had her own, dedicated physical layer connection between it and the server!!
Assuming parallelism is not achievable (due to the above 'issues') and only the next best thing called concurrency is possible, how would I go about selecting the hardware and network specification to use my simulation. For example, (a) How powerful my load-generating PCs should be? (b) How many virtual users to create per each of these PCs? (c) Does each PC on the LAN have to be connected via a switch to the server (to avoid) broadcast traffic which would occur if a hub were to be used in instead of a switch?

Thanks in advance,

/HS

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

溺ぐ爱和你が 2024-10-21 09:26:06

您不仅使用以太网，假设您正在编写 Web 服务，您还通过 HTTP(S) 进行通信，HTTP(S) 位于 TCP 套接字之上，这是一种可靠、有序的协议，具有可靠协议固有的内置往返行程。套接字位于 IP 之上，如果您的 IP 数据包与以太网帧不相符，您将永远无法充分利用您的网络。即使您使用 UDP，调整数据报以适应以太网帧，服务器上有 100 个负载生成器和 100 个 1Gbit 以太网卡，它们仍然会在中断上运行，并且您还有时间多路复用堆栈。

这里的每个级别都可以从交易的角度来考虑，但同时考虑每个级别是没有意义的。如果您正在编写一个在 OSI 模型的第 7 层运行的 SOAP 应用程序a>，那么这就是您的域。就您而言，您的事务是 SOAP HTTP(S) 请求，它们是并行的，并且需要不同的时间才能完成。

现在，真正开始回答您的问题：这取决于您的测试脚本、它们使用的内存量，甚至您的应用程序的响应速度。 200 个或更多虚拟用户应该没问题，但找到瓶颈需要科学探究。做实验，找到它们，扩大它们，重复直到你满意为止。从负载生成器和被测系统收集系统指标，并与操作系统提供商的建议进行比较，查看垂死系统和工作系统之间的差异，寻找达到稳定状态的图表等。

回复收藏 0 原文

最单纯的乌龟 2024-10-21 09:26:06

我觉得你好像有点想多了。您的服务器速度快且新，非常适合处理大量客户端。您的瓶颈（如果有的话）要么是您的应用程序本身，要么是您的 100m 网络。

1./2.您正在测试服务器，而不是客户端。在这种情况下，客户端所做的只是发送和接收数据——客户端处理没有任何开销（渲染 HTML、解码图像、执行 javascript 以及其他任何可能的事情）。最近的单核机器很容易使千兆位链路饱和； 100 Mbit 的管道应该是小菜一碟。

另外 - 较新/更高级的以太网卡中的处理器从 CPU 上卸载了大量工作，因此您不必期望 CPU 命中。

3.不要使用集线器。您可以在 craigslist 上花 5 美元购买 100m 集线器，这是有原因的。

回复收藏 0 原文

怀念你的温柔 2024-10-21 09:26:06

如果没有更好地了解您的应用程序，很难回答其中的一些问题，但一般来说，您是正确的，要实现服务器的“真正”压力测试，理想的情况是拥有 100 个核心（使用 100 个并发的目标）用户），即 100 台 PC。然而，各种问题可能表明这是理所当然的。

我有一个几年前构建的通信引擎（.NET / C#），它使用异步套接字 - 需要尽可能最快的速度，因此我们不得不忘记在套接字之上添加任何附加层，例如 HTTP 或任何其他更高的抽象。该服务器在具有 4GB RAM 的四核 3.0GHz 计算机上运行，可轻松处理约 2,200 个并发连接的流量。有一个 Gb 交换机，所有 PC 都有 Gb NIC。即使所有 PC 同时通信，也很少会看到处理器负载 > 30% 在该服务器上。我认为这是因为“整个系统”固有的所有延迟。

我们目前正在实施一个支持 50,000 个并发用户的新要求。该服务器具有双四核 2.8GHz 处理器、64 位操作系统和 12GB RAM。我们的模型显示这台计算机足以处理 5 万用户。

我提到的网络延迟等问题（不要忘记 CAT 3 与 CAT 5 与 CAT 6 问题）、数据库连接、存储的数据类型和平均记录大小、参考问题、背板和总线速度、硬盘驱动器速度和大小等等在“总体”减慢平台速度方面发挥着同样重要的作用。我的猜测是，您的系统可能有 500、750、1,000 甚至更多用户。

过去的目标是永远不要让线程阻塞太久……新的目标是让所有核心保持忙碌。

我有另一个应用程序，每天下载并分析约 7,800 个 URL 的内容。在双四核 3.0GHz（Windows Ultimate 7 64 位版本）和 24GB RAM 上运行，该过程过去大约需要 28 分钟才能完成。通过简单地将循环切换到 Parallel.ForEach() ，整个过程现在需要 < 5分钟。我们看到的处理器负载始终低于 20%，最大网络负载仅为 14%（通过标准 Gb 哑集线器和 T-1 线路在 Gb NIC 上实现 CAT 5）。

保持所有内核忙碌会产生巨大的差异，对于花费大量时间等待 IO 的应用程序尤其如此。

Without having a better understanding of your application it's tough to answer some of this, but generally speaking you are correct that to achieve a "true" stress test of your server it would be ideal to have 100 cores (using a target of a 100 concurrent users), i.e. 100 PC's. Various issues, though, will probably show this as a no-brainer.

I have a communication engine I built a couple of years back (.NET / C#) that uses asyncrhonous sockets - needed the fastest speeds possible so we had to forget adding any additional layers on top of the socket like HTTP or any other higher abstractions. Running on a quad core 3.0GHz computer with 4GB of RAM that server easily handles the traffic of ~2,200 concurrent connections. There's a Gb switch and all the PC's have Gb NIC's. Even with all PC's communicating at the same time it's rare to see processor loads > 30% on that server. I assume this is because of all the latency that is inherent in the "total system."

We have a new requirement to support 50,000 concurrent users that I'm currently implementing. The server has dual quad core 2.8GHz processors, a 64-bit OS, and 12GB of RAM. Our modeling shows this computer is more than enough to handle the 50K users.

Issues like the network latency I mentioned (don't forget CAT 3 vs. CAT 5 vs. CAT 6 issue), database connections, types of data being stored and mean record sizes, referential issues, backplane and bus speeds, hard drive speeds and size, etc., etc., etc. play as much a role as anything in slowing down a platform "in total." My guess would be that you could have 500, 750, a 1,000, or even more users to your system.

The goal in the past was to never leave a thread blocked for too long ... the new goal is to keep all the cores busy.

I have another application that downloads and analyzes the content of ~7,800 URL's daily. Running on a dual quad core 3.0GHz (Windows Ultimate 7 64-bit edition) with 24GB of RAM that process used to take ~28 minutes to complete. By simply swiching the loop to a Parallel.ForEach() the entire process now take < 5 minutes. My processor load that we've seen is always less than 20% and maximum network loading of only 14% (CAT 5 on a Gb NIC through a standard Gb dumb hub and a T-1 line).

Keeping all the cores busy makes a huge difference, especially true on applications that spend allot of time waiting on IO.

回复收藏 0 原文