Continuous performance monitoring of .NET applications in production?

Posted 2024-09-14 05:41:01

Given a relatively typical .NET 4 system in an SOA environment (e.g. Windows Server 2008 R2, RESTful web services on IIS 7, Windows services for NServiceBus messaging, SQL Server 2008 R2, etc.), what are the best practices or de facto solutions (without an enterprise price tag) for 24x7 performance monitoring in production?

I'm less interested in how much CPU, memory, or disk I/O the system consumes than in application-level metrics. Examples include how many createAccount() calls were made per minute and the average time the generateResponse() method takes. I also want to detect unusual spikes in the delta between events such as generateResponseStarted (the method was invoked; it may in turn call a third party) and generateResponseComplete (the response is ready to be returned).
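
The following is a minimal sketch of these application-level metrics. It is written in Python for brevity; a .NET version would have the same shape. The class name `MethodStats`, its methods, and the 60-second sliding window are illustrative assumptions, not an existing library. `started()` plays the role of a generateResponseStarted event and `completed()` the role of generateResponseComplete. The object reports the call count and average duration over the last minute.

```python
import time
from collections import deque


class MethodStats:
    """Sliding-window call count and average duration for one method.

    Hypothetical helper for illustration: keep one instance per monitored
    method (e.g. one for createAccount, one for generateResponse).
    """

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock        # injectable so tests can use a fake clock
        self.samples = deque()    # (completion time, duration) pairs, oldest first

    def started(self):
        """Equivalent of a 'generateResponseStarted' event: return the start time."""
        return self.clock()

    def completed(self, start_time):
        """Equivalent of 'generateResponseComplete': record and return the delta."""
        now = self.clock()
        duration = now - start_time
        self.samples.append((now, duration))
        self._evict(now)
        return duration

    def _evict(self, now):
        # Drop samples that completed more than `window` seconds ago.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def calls_in_window(self):
        self._evict(self.clock())
        return len(self.samples)

    def average_duration(self):
        self._evict(self.clock())
        if not self.samples:
            return 0.0
        return sum(d for _, d in self.samples) / len(self.samples)
```

Recording a call costs one append plus amortized-constant eviction. A spike detector can then compare each `completed()` duration against `average_duration()`.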

After some googling, the options seem to be low-level profilers (like dotTrace) or implementing Performance Counters and consuming them with PerfMon or an OpManager-type product.

What would you recommend? Would implementing performance counters in a live application significantly degrade performance on a production system? If not, are there any good libraries that streamline the implementation in .NET? If so, how do people monitor their applications' performance beyond memory, disk, and CPU?


@Ryan Hayes

Thanks. I'm looking for a way to see unusual slowdowns or spikes on production systems. For example, everything was fine during stress testing, but then something unexpected happens: a third party we rely on has problems, the DB slows down due to locking, or the SAN gives way. Low-level profiling has too much overhead, and turning counters on only once there is a problem is too late. We would also have no historical data to compare against. I need some sort of alert system for when a delta falls outside an acceptable threshold. I'm wondering how people monitor the performance of their production systems, and what, in their experience, is the best approach for monitoring that isn't about memory, CPU, or the server itself.
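
One way to get this kind of alert without hard-coding a single threshold is to compare each new measurement against recent history. The Python sketch below flags a value that lies more than `k` standard deviations above the mean of the last `history_size` samples. The names `BaselineAlert` and `observe` are hypothetical. The z-score rule and the default parameters are assumptions; the question does not prescribe them.

```python
import statistics
from collections import deque


class BaselineAlert:
    """Hypothetical helper: flag values far above the recent baseline (z-score rule)."""

    def __init__(self, history_size=100, k=3.0, min_history=10):
        self.history = deque(maxlen=history_size)  # most recent measurements only
        self.k = k                                  # standard deviations that count as unusual
        self.min_history = min_history              # no alerts until a baseline exists

    def observe(self, value):
        """Return True if `value` should raise an alert, then add it to the history."""
        alert = False
        if len(self.history) >= self.min_history:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            alert = value > mean + self.k * stdev
        self.history.append(value)
        return alert
```

Flagged values are still appended to the history, so a sustained slowdown gradually becomes the new baseline. If you would rather keep alerting until someone intervenes, skip appending flagged values. With a perfectly constant history the standard deviation is zero, so any higher value alerts; adding a minimum absolute margin avoids that noise.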


Answers (2)

铜锣湾横着走 2024-09-21 05:41:01

You could try AlertGrid. It looks like it could solve your problem.

You can send various parameters to AlertGrid from your application (a new account name, the time taken to execute some important piece of logic, and so on). The AlertGrid service can do a couple of things with your data. First, it can process notification rules built from the parameters you've sent (e.g. if the time taken by something important > X seconds, send an SMS to the person in charge).
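
A small local evaluator can illustrate this kind of notification rule (duration > X seconds, then notify someone). This is not AlertGrid's API. `ThresholdRule`, `process`, and the `notify` callback are hypothetical names. In a hosted service, the same check would run on the server after your application sends the value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire `notify` when a named parameter exceeds `max_value`."""
    parameter: str                  # name of the value the application sent
    max_value: float                # threshold, e.g. seconds
    notify: Callable[[str], None]   # stand-in for an SMS/e-mail hook

    def evaluate(self, event: Dict[str, float]) -> bool:
        value = event.get(self.parameter)
        if value is not None and value > self.max_value:
            self.notify(f"{self.parameter}={value} exceeds {self.max_value}")
            return True
        return False


def process(event: Dict[str, float], rules: List[ThresholdRule]) -> int:
    """Evaluate every rule against one incoming event; return how many fired."""
    return sum(rule.evaluate(event) for rule in rules)
```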

In two weeks AlertGrid will gain a bunch of new features. The most important one for you will probably be the ability to plot the parameters received from your system.

Please note that AlertGrid cannot detect parameters in your systems; you need to send them yourself. This might look like extra work, but we think it takes about as long as installing and configuring a specialized tool. On the other hand, this approach lets AlertGrid avoid some limitations: it can be integrated with anything that can send HTTP requests.
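
Because the integration only needs an HTTP request, an application can push a metric with a simple JSON POST. The Python sketch below uses only the standard library. The endpoint URL, the payload fields (`name`, `value`, `timestamp`), and the helper names are hypothetical; they do not describe AlertGrid's actual ingestion format.

```python
import json
import time
import urllib.request
from typing import Optional

# Hypothetical endpoint; replace with whatever service actually ingests your metrics.
METRICS_ENDPOINT = "https://metrics.example.invalid/ingest"


def build_metric_request(name: str, value: float, endpoint: str = METRICS_ENDPOINT,
                         timestamp: Optional[float] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying one named measurement."""
    payload = {
        "name": name,
        "value": value,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_metric(request: urllib.request.Request, timeout: float = 2.0) -> int:
    """Send the request with a short timeout and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a live service, send these requests from a background queue rather than on the request path. Otherwise a slow or unavailable metrics endpoint adds latency to calls like createAccount().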

I believe it will be much easier to understand once you create an AlertGrid account and go through its interactive tutorial.

As you might have noticed, I'm a developer on the AlertGrid team :)

Disclaimer: at the moment of writing, we know that AlertGrid's prices will be reduced in the near future, so don't judge by the current ones. You can contact our support line for more information on pricing. A free account is available and should be enough to begin with.

寂寞笑我太脆弱 2024-09-21 05:41:01

The real question here is: what are you trying to learn from performance monitoring?

  • Do you want to make your code faster? Then I would suggest using profiling tools in a test environment to find out where you can improve your code.

  • Do you want to find out the maximum load your system can handle? Then I would suggest load testing in a test environment. If you know exactly how hard you can push your system before it breaks, you won't need to put monitoring into production.
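
A load test of this kind can be sketched as follows: call an operation many times from a fixed number of concurrent workers, then report throughput and latency percentiles. The helper names `run_load` and `percentile` and the nearest-rank percentile definition are illustrative assumptions. A dedicated load-testing tool would normally drive real HTTP endpoints rather than an in-process function.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(sorted_values, p):
    """Nearest-rank percentile (0 < p <= 100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def run_load(operation, total_calls, concurrency):
    """Call `operation` total_calls times from `concurrency` worker threads.

    Hypothetical helper: returns (throughput in calls/second, sorted latencies in seconds).
    """
    def timed_call(_):
        start = time.perf_counter()
        operation()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_calls)))
    elapsed = time.perf_counter() - wall_start
    return total_calls / elapsed, latencies
```

Repeat the run with increasing `concurrency` and note where throughput stops growing while p95/p99 latency climbs. That point gives a rough estimate of how hard the system can be pushed.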

In production, you probably want to maximize performance. To do this, it's common to push a test environment hard and gather solid metrics, so that you don't need performance monitors in production. In production, you just want to know when you hit that peak, and then degrade gracefully or respond however you see fit. Generally, good logging is the best way to monitor system performance (apart from hardware) and to keep a record of unusual performance quirks.
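
Here is a minimal sketch of the "good logging" approach: a Python decorator that logs each call's duration. It logs at DEBUG normally and at WARNING when a call exceeds a threshold, so slow calls stand out in the log record. The decorator name `log_timing`, the logger name `"perf"`, and the threshold values are hypothetical choices.

```python
import functools
import logging
import time

logger = logging.getLogger("perf")  # hypothetical logger name


def log_timing(slow_threshold_seconds):
    """Log every call's duration: DEBUG normally, WARNING above the threshold."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed calls are timed too.
                elapsed = time.perf_counter() - start
                level = logging.WARNING if elapsed > slow_threshold_seconds else logging.DEBUG
                logger.log(level, "%s took %.3f s", func.__name__, elapsed)
        return wrapper
    return decorator
```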

Every system is different, though, and your mileage may vary. Take this as a suggestion rather than the way EVERYONE does it; there are always exceptional cases where you may need profiling running in production.
