How to do performance and scalability testing without clear requirements?
Any idea how to do performance and scalability testing if no clear performance requirements have been defined?
More information about my application.
The application has 3 components. One component can only run on Linux; the other two are Java programs, so they can run on Linux/Windows/Mac. The 3 components can all be deployed to one box, or each component can be deployed to its own box, so deployment is very flexible. The Linux-only component captures raw TCP/IP packets over the network. One Java component then gets that raw data from it, assembles it into the data end users need, and writes it to hard disk as data files. The last Java component uploads the data from those files to my database in batches.
In the absence of 'must be able to perform X iterations within Y seconds...' type requirements, how about these kinds of things:
Surprisingly, this is how most perf and scalability tests start.
You can clearly do the testing without criteria: you just define the tests and measure the results. I think your question is more along the lines of 'how can I establish test passing criteria without performance requirements'. Actually this is not at all uncommon. Many new projects have no clear criteria established. Informally it would be something like 'if it cannot do X per second we failed'. But once you pass X per second (and you had better!), is X the 'pass' criterion? Usually not. What happens is that you establish a new baseline and your performance tests guard against regression: you compare your current numbers with the best you got, and decide whether the new build is 'acceptable' as a build validation pass (usually orgs will settle here at something like 70-80% as acceptable, open perf bugs, and make sure that by ship time they get back to 90-95% or 100%+). So basically the performance tests themselves become their own requirement.
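The baseline comparison described above could be sketched roughly like this. It is only an illustration: the function names are made up, and the 0.75 and 0.95 thresholds are assumptions picked from inside the 70-80% and 90-95% ranges mentioned in the answer, not fixed rules.

```python
def classify_build(current: float, best: float,
                   accept_ratio: float = 0.75,
                   ship_ratio: float = 0.95) -> str:
    """Compare a build's throughput with the best number recorded so far.

    accept_ratio and ship_ratio are assumed values, not standards.
    """
    if best <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / best
    if ratio >= ship_ratio:
        return "ship-ready"
    if ratio >= accept_ratio:
        # Good enough for build validation, but a regression to track.
        return "acceptable, open perf bug"
    return "reject"


def update_baseline(best: float, current: float) -> float:
    """The best result seen so far becomes the new baseline."""
    return max(best, current)
```

For example, a build that reaches 800 operations per second against a best of 1000 would pass build validation here but still get a perf bug opened against it.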
Scalability is a bit more complicated, because there is no limit. The scope of your test should be to find out where the product breaks. Throw enough load at anything and eventually it will break. You need to know where that limit is and, very importantly, find out how your product breaks. Does it give a nice error message and revert, or does it spill its guts on the floor?
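A minimal sketch of that "find where and how it breaks" loop, assuming you can wrap your system in a function that runs it at a given load level. `Overloaded`, `find_breaking_point` and `fake_system` are hypothetical names for illustration; the doubling step is an arbitrary choice.

```python
from typing import Callable


class Overloaded(Exception):
    """Hypothetical error a system raises when it refuses load cleanly."""


def find_breaking_point(run_at_load: Callable[[int], object],
                        start: int = 1, factor: int = 2,
                        max_load: int = 1_000_000):
    """Increase load geometrically until the system under test fails.

    Returns (last_good_load, failing_load, failure_kind). failure_kind is
    "clean" for an Overloaded error and "crash" for any other exception.
    If nothing fails up to max_load, returns (last_good_load, None, None).
    """
    last_good = 0
    load = start
    while load <= max_load:
        try:
            run_at_load(load)
        except Overloaded:
            return last_good, load, "clean"
        except Exception:
            return last_good, load, "crash"
        last_good = load
        load *= factor
    return last_good, None, None


def fake_system(load: int) -> None:
    # Stand-in for a real system: handles up to 100 units, then rejects.
    if load > 100:
        raise Overloaded(f"cannot handle {load} units")
```

Once the failing level is known, you would normally narrow the gap between the last good load and the failing load with smaller steps.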
Define your own. Take the initiative and describe the performance goals yourself.
To answer any better, we'd have to know more about your project.
If there has been 'no performance requirement defined', then why are you even testing this?
If there is a performance requirement defined, but it is 'vague', can you indicate in what way it is vague, so that we can better help you?
Short of that, start from the 'vague' requirement, and pick a reasonable target that at least in your opinion meets or exceeds the vague requirement, then go back to the customer and get them to confirm that your clarification meets their requirements and ideally get formal sign-off on that.
Some definitions / assumptions:
Performance = how quickly the application responds to user input, e.g. web page load times
Scalability = how many peak concurrent users the application can handle.
Firstly, performance. Performance testing can be quite simple, such as measuring and recording page load times in a development environment and using techniques like application profiling to identify and fix bottlenecks.
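As a rough illustration of "measuring and recording" response times, something like the following could be run on a development machine. `measure_latency` is a made-up helper, and the operation passed in is whatever you want to time; the percentile calculation is deliberately simple.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(operation: Callable[[], object],
                    runs: int = 50) -> Dict[str, float]:
    """Time repeated calls to operation and summarise the samples (seconds)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        started = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - started)
    samples.sort()
    # Simple nearest-rank style index for the 95th percentile.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }
```

Recording these numbers per build gives you something to compare against later, even without formal requirements.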
Load. To execute a load test there are four key factors, and you will need to get all of them in place to be successful.
1. Good usage models of how users will use your site and/or application. This can be easy if the application is already in use, but it can be extremely difficult if you are launching something new, e.g. a Facebook application.
If you can't get targets as requirements, do some research and make some educated assumptions, document and circulate them for feedback.
2. Tools. You need performance testing scripts and tools that can execute the scenarios defined in step 1, with the number of expected users from step 1. (This can be quite expensive.)
3. Environment. You will need a production-like environment that is isolated so your tests can produce reproducible results. (This can also be very expensive.)
4. Technical experts. Once the application and environment start breaking, you will need to be able to identify the faults and reconfigure the environment and/or re-code the application once faults are found.
Generally, most projects have a "performance testing" box that they need to tick because of some past failure; however, they never plan or budget to do it properly. I normally recommend that you either budget for scalability testing and do it properly, or save your money and don't do it at all. Trying to half do it on the cheap is a waste of time.
However, any good developer should be able to do performance testing on their local machine and get some good benefits.
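For the local-machine case, a very small sketch of simulating concurrent users with threads might look like this. It is not a replacement for a dedicated load testing tool; `run_concurrent_users` and its parameters are assumptions for illustration, and `scenario` stands for one step of whatever usage model you settled on.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_concurrent_users(scenario: Callable[[], object], users: int,
                         requests_per_user: int) -> List[float]:
    """Run scenario from `users` threads; return all response times in seconds."""

    def one_user() -> List[float]:
        timings = []
        for _ in range(requests_per_user):
            started = time.perf_counter()
            scenario()
            timings.append(time.perf_counter() - started)
        return timings

    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_user) for _ in range(users)]
        per_user = [future.result() for future in futures]
    # Flatten the per-user lists into one list of samples.
    return [t for timings in per_user for t in timings]
```

Running it with increasing values of `users` and watching how the response times change gives a first, cheap impression of how the application behaves under load.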
Rely on tools (FxCop comes to mind)
Rely on common sense
If you want to test performance and scalability with no requirements, then you should create your own requirements/specs that can be met within the timeline/deadline given to you. After defining those requirements, present them to your supervisor and check whether he/she agrees.
To test scalability (assuming you're testing a program/website):
Create lots of users and data and check whether your system and database can handle them. The MyISAM table type in MySQL can get the job done.
To test performance:
Optimize the code, check it over a slow internet connection, etc.
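To make "create lots of users and data" concrete, here is a small sketch that times batch inserts at growing data sizes. It uses an in-memory SQLite database purely as a stand-in so the example is self-contained; that is an assumption for illustration, not the MySQL setup mentioned above, and the table layout and function names are invented.

```python
import sqlite3
import time
from typing import Iterable, List, Tuple


def time_bulk_insert(row_count: int) -> float:
    """Insert row_count synthetic users in one batch; return elapsed seconds."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        rows = [(i, f"user{i}") for i in range(row_count)]
        started = time.perf_counter()
        with conn:  # commits the batch as a single transaction
            conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
        elapsed = time.perf_counter() - started
        stored = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        if stored != row_count:
            raise RuntimeError(f"expected {row_count} rows, found {stored}")
        return elapsed
    finally:
        conn.close()


def scaling_report(sizes: Iterable[int]) -> List[Tuple[int, float]]:
    """Return (row_count, seconds) pairs for each data size."""
    return [(n, time_bulk_insert(n)) for n in sizes]
```

Comparing the timings for, say, 1,000, 10,000 and 100,000 rows shows whether the cost grows roughly in proportion to the data or much faster.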
Short answer: Don't do it!
To get a (better) definition, write a performance test concept that you can discuss with the experts who should define the requirements.
Make assumptions for everything you don't know and document these assumptions explicitly. Assumptions comprise everything that may be relevant to your system's behaviour under load. Correct assumptions will be approved by the experts, incorrect ones will provoke reactions.
For all of those who have read Tom DeMarco's latest book (Adrenaline Junkies ...): this is the strawman pattern. Most people who are not willing to write a specification from scratch will not hesitate to give feedback on your document. Because you will have to guess several times when writing your version, be prepared to be laughed at during review. But at least you will have better information.
The way I usually approach problems like this is just to get a real or simulated realistic workload and make the program go as fast as possible, within reason. Then if it can't handle the load I need to think about faster hardware, doing parts of the job in parallel, etc.
The performance tuning is in two parts.
Part 1 is the synchronous part, where I tune each "thread", under realistic workload, until it really has little room for improvement.
Part 2 is the asynchronous part, and it is hard work, but it needs to be done. For each "thread" I extract a time-stamped log file of when each message was sent, when each message was received, and when each received message was acted upon. I merge these logs into a common timeline of events. Then I go through all of it, or randomly selected parts, and trace the flow of messages between processes. For each message sequence, I want to identify what its purpose is (i.e. is it truly necessary) and whether there are delays between the time of receipt and the time of processing, and if so, why.
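The log merging step could be sketched as follows, assuming each process writes events as (timestamp, process, kind, message id) and each log is already sorted by time. The event layout, the kind names and the sample logs are all hypothetical; a real system's logs would need parsing first.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# (timestamp_seconds, process_name, kind, message_id); an assumed layout.
Event = Tuple[float, str, str, str]


def merge_logs(*logs: Iterable[Event]) -> List[Event]:
    """Merge per-process logs (each sorted by time) into one timeline."""
    return list(heapq.merge(*logs, key=lambda event: event[0]))


def processing_delays(timeline: Iterable[Event]) -> Dict[str, float]:
    """Delay between 'received' and 'processed' for each message id."""
    received: Dict[str, float] = {}
    delays: Dict[str, float] = {}
    for timestamp, _process, kind, message_id in timeline:
        if kind == "received":
            received[message_id] = timestamp
        elif kind == "processed" and message_id in received:
            delays[message_id] = timestamp - received[message_id]
    return delays


# Made-up sample logs from two processes.
capture_log: List[Event] = [
    (0.0, "capture", "sent", "m1"),
    (1.0, "capture", "sent", "m2"),
]
assembler_log: List[Event] = [
    (0.5, "assembler", "received", "m1"),
    (0.75, "assembler", "processed", "m1"),
    (1.25, "assembler", "received", "m2"),
    (3.25, "assembler", "processed", "m2"),
]
```

With these sample logs, message m2 waits much longer than m1 between receipt and processing, which is the kind of gap worth investigating.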
I've found in this way I can "cut out the fat", and asynchronous processes can run very quickly.
Then if they don't meet requirements, whatever they are, it's not like the software can do any better. It will either take hardware or a fundamental redesign.
Although no clear performance and scalability goals are defined, we can use the high-level description of the three components you mention to derive general performance/scalability goals.
Component 1: It seems to be a network I/O bound component, so you can use any available network load simulator to generate various workloads that saturate the link. Scalability can be measured by varying the workload (10MB, 100MB, 1000MB link) and measuring the response time or, more precisely, the delay associated with receiving the raw data. You can also measure the working set of the Linux box to get a realistic idea of your server requirements (how much extra memory is needed to receive X more packets of workload, etc.).
Component 2: This component has two parts, an I/O bound part (receiving data from Component 1) and a CPU bound part (assembling the packets). You can look at the problem as a whole. Make sure to saturate your link when you want to measure the CPU bound part. If it is a multi-threaded component and you don't get 100% CPU utilization, you can look for ways to improve that. You can also measure the time required to assemble X messages, and from this calculate the average time to process one message. This can be used later to derive the general performance characteristics of your system and to provide an SLA for your users (for example, you are going to guarantee a response time within X milliseconds).
Component 3: Completely I/O bound, and it depends on both your hard disk bandwidth and the back-end database server you use. However, you can measure how close you come to saturating disk I/O in order to optimize throughput, and how many I/O operations you require to read X MB of data, and then improve around these parameters.
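For the Component 2 measurement (time to assemble X messages, and the average per message), a rough sketch could look like this. `measure_assembly` and `fake_assemble` are invented names; the real assembly step would replace the stand-in, and the numbers only mean something under a realistic, saturating input.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_assembly(assemble: Callable[[bytes], object],
                     packets: Iterable[bytes]) -> Tuple[int, float, float]:
    """Return (message_count, avg_seconds_per_message, messages_per_second)."""
    count = 0
    started = time.perf_counter()
    for packet in packets:
        assemble(packet)
        count += 1
    elapsed = time.perf_counter() - started
    if count == 0:
        raise ValueError("no packets supplied")
    average = elapsed / count
    throughput = count / elapsed if elapsed > 0 else float("inf")
    return count, average, throughput


def fake_assemble(packet: bytes) -> str:
    # Hypothetical stand-in for the real packet assembly step.
    return packet.hex()
```

The average per-message time from a run like this is the kind of figure you could later turn into a proposed response-time target.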
Hope that helps.
Thanks