ASP.NET application is slow, but CPU is at 40% max

Posted on 2024-10-03 13:32:20

I have a strange situation on a production server. Connections for ASP.NET get queued, but the CPU is only at 40%. The database also runs fine at 30% CPU.

Some more history as requested in the comments:

  • In peak hours the site gets around 20,000 visitors an hour.
  • The site is an ASP.NET WebForms application with a lot of AJAX/POSTs.
  • The site uses a lot of user-generated content.
  • We measure the performance of the site with a test page that hits the database and the web services used by the site. This page gets served within a second under normal load. We define the application as slow when the request takes more than 4 seconds.
  • From the measurements we can see that the connection time is fast, but the processing time is long.
  • We can't pinpoint the slow response to a single request; the site runs fine during normal hours but gets slow during peak hours.
  • We had a problem where the site was CPU bound (running at 100%); we fixed that.
  • We also had problems with exceptions making the appdomain restart; we fixed that too.
  • During peak hours I looked at the ASP.NET performance counters. We see about 600 current connections, with 500 queued connections.
  • At peak times the CPU is around 40% (which makes me think it is not CPU bound).
  • Physical memory is around 60% used.
  • At peak times the database server CPU is around 30% (which makes me think it is not database bound).

My conclusion is that something else is stopping the server from handling the requests faster. Possible suspects:

  • Deadlocks (!syncblk only shows one lock)
  • Disk I/O (checked via Sysinternals Process Explorer: 3.5 MB/s)
  • Garbage collection (10~15% during peaks)
  • Network I/O (connect time still low)

To find out what the process is doing, I managed to create two minidumps 20 seconds apart. This is the output of the first:

!threadpool
CPU utilization 6%
Worker Thread: Total: 95 Running: 72 Idle: 23 MaxLimit: 200 MinLimit: 100
Work Request in Queue: 1
--------------------------------------
Number of Timers: 64

and the output of the second:

!threadpool
CPU utilization 9%
Worker Thread: Total: 111 Running: 111 Idle: 0 MaxLimit: 200 MinLimit: 100
Work Request in Queue: 1589

As you can see, there are a lot of work requests in the queue.

Question 1: What does it mean that there are 1589 requests in the queue? Does it mean something is blocking?

The !threadpool list contains mostly these entries:
Unknown Function: 6a2aa293 Context: 01cd1558
AsyncTimerCallbackCompletion TimerInfo@023a2cb0

If I dig deeper into AsyncTimerCallbackCompletion:

!dumpheap -type TimerCallback

Then I look at the objects in the TimerCallback and most of them are of types:

System.Web.SessionState.SessionStateModule
System.Web.Caching.CacheCommon

Question 2: Does it make sense that those objects have a timer, and so many of them? Should I prevent this, and if so, how?

Main question: Am I missing any obvious problem that explains why connections are being queued while the CPU is not maxed out?


I succeeded in making a crashdump during a peak. Analyzing it with debugdiag gave me this warning:

Detected possible blocking or leaked critical section at webengine!g_AppDomainLock owned by thread 65 in Hang Dump.dmp
Impact of this lock
25.00% of threads blocked
(Threads 11 20 29 30 31 32 33 39 40 41 42 74 75 76 77 78 79 80 81 82 83)

The following functions are trying to enter this critical section
webengine!GetAppDomain+c9

The following module(s) are involved with this critical section
\\?\C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\webengine.dll from Microsoft Corporation

A quick Google search doesn't give me any results. Does anybody have a clue?

Comments (5)

冧九 2024-10-10 13:32:20

The worker processes handling the queue were the real dealbreaker. This is probably connected with the website calling web services on the same host, which creates a kind of deadlock.

I changed the machine.config to the following:

<processModel
        autoConfig="false"
        maxWorkerThreads="100"
        maxIoThreads="100"
        minWorkerThreads="50"
        minIoThreads="50" />

By default, this processModel is set to autoConfig="true".

With the new config the web server handles the requests fast enough that they no longer get queued.

迟月 2024-10-10 13:32:20

I'm with realworldcoder: IIS works by having Worker Processes handle the incoming requests. If the requests get stacked up, as it appears is happening, then performance takes a nose dive.

There are several possible things to do/check for.

  1. Fire up Activity Monitor on the SQL Server. You want to see what queries are taking the longest and, depending on the results, make changes to reduce their execution time. Long queries can cause the thread the page is executing under to block, reducing the number of connections you can support.

  2. Look at the number of queries, and the time they take to execute, for these page/AJAX calls. I've seen pages with dozens of unnecessary queries that get executed for an AJAX call simply because .NET executes the entire page cycle even when only a particular method needs to run. You might split those calls into regular web handler (.ashx) pages; that way you can better control exactly what happens.

  3. Consider increasing the number of worker processes IIS has to handle incoming requests. The default for a new app pool is 1 process with 20 threads. This is usually enough to handle tons of requests; however, if the requests are blocking while waiting on the DB server or some other resource, the pipeline can stack up. Bear in mind that this can have either a positive or negative impact on both the performance and the regular functioning of your application. So do some research, then test, test, test.

  4. Consider reducing or eliminating your usage of session. Either way, look at its memory usage and potentially add more RAM to your web server. Session data is serialized and deserialized for every page load (including AJAX calls) regardless of whether the data is used or not. Depending on what you are storing in session, it can have a serious negative impact on your site. If you aren't using it, make sure it's completely turned off in your web.config. Note that these issues only get worse if you store session off the web server, as you then become bound to the speed of the network whenever a page retrieves and stores it.

  5. Look at the site's performance counters around JIT (Just-In-Time) compiling. This should be nearly non-existent. I've seen sites brought to their knees by massive amounts of JIT. Once those pages were recoded to eliminate it, the sites started flying again.

  6. Look at different caching strategies (I don't consider session a real caching solution). Perhaps there are things that you constantly request that you don't really need to constantly pull out of the DB server. A friend of mine has a site where they cache entire web pages as physical files for dynamic content, including their discussion groups. This has radically increased their performance; but it is a major architectural change.

The above are just a couple things to look at. You basically need to get further into the details to find out exactly what is going on and most of the regular performance counters aren't going to give you that clarity.
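
To make the "requests stack up" point concrete, here is a back-of-the-envelope sketch based on Little's law (concurrency = throughput × time per request). The 111 threads and the 4-second figure come from the question; the arrival rate of 100 requests per second is purely an assumed number for illustration, and the function names are made up for this sketch.

```python
def max_throughput(worker_threads, seconds_per_request):
    """Requests per second a fixed set of blocking threads can complete."""
    return worker_threads / seconds_per_request


def threads_needed(requests_per_second, seconds_per_request):
    """Threads required to keep up with a given arrival rate."""
    return requests_per_second * seconds_per_request


def backlog_after(arrival_rate, worker_threads, seconds_per_request, seconds):
    """Queued requests after `seconds` of sustained overload."""
    surplus = arrival_rate - max_throughput(worker_threads, seconds_per_request)
    return max(0.0, surplus * seconds)


if __name__ == "__main__":
    print(max_throughput(111, 4.0))           # 27.75 requests/second
    print(threads_needed(100, 4.0))           # 400.0 threads
    print(backlog_after(100, 111, 4.0, 20))   # 1445.0 queued requests
```

The takeaway is that when each request blocks for seconds, throughput is capped by thread count divided by latency, regardless of how idle the CPU is. Reducing the time a request holds a thread helps as much as adding threads.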

秋千易 2024-10-10 13:32:20

Too many ASP.NET queued requests will destroy performance. There are a very limited number of request threads.

Try to free up those threads by processing slow parts of your pages asynchronously or do anything else you can to bring down page execution times.
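
As an illustration of why asynchronous processing frees up threads, the sketch below uses Python's asyncio rather than ASP.NET, so treat it as an analogy only. `call_backend` is a hypothetical stand-in for a slow web service call, and the request count and delay are assumed values.

```python
import asyncio
import threading
import time


async def call_backend(delay, thread_ids):
    # Record which thread runs this request, then wait without blocking it.
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(delay)  # stands in for a non-blocking service call
    return "done"


async def serve_all(n_requests, delay):
    thread_ids = set()
    results = await asyncio.gather(
        *(call_backend(delay, thread_ids) for _ in range(n_requests))
    )
    return results, thread_ids


def run_async_demo(n_requests=50, delay=0.05):
    start = time.perf_counter()
    results, thread_ids = asyncio.run(serve_all(n_requests, delay))
    elapsed = time.perf_counter() - start
    return len(results), len(thread_ids), elapsed


if __name__ == "__main__":
    count, threads, elapsed = run_async_demo()
    print(f"{count} requests served on {threads} thread(s) in {elapsed:.2f}s")
```

All fifty simulated requests are in flight at once on a single thread, because a waiting request does not hold a thread. The same idea applies to asynchronous pages and handlers in ASP.NET, where the request thread is returned to the pool while the backend call is pending.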

南城追梦 2024-10-10 13:32:20

I know this is an old thread but it's one of the first Google hits for people with poor ASP.NET site performance. So I will throw out a few recommendations:

1) Asynchronous programming will solve the root cause. While you're calling out to a web service to do your actual business logic, those request threads are just sitting there waiting on the response. They could be used instead to service another incoming request. This will reduce your queue length dramatically, if not eliminate it entirely. Asynchronous programming is about scalability, not individual request performance. This is achieved quite easily in .NET 4.5 with the async/await pattern. ASP.NET injects threads at a rate of 2 per minute, so unless you are re-using those existing threads, you're going to run out quickly with the site load you are receiving. In addition, spinning up more threads is a small performance hit; it takes up more RAM and time to allocate that RAM. Just increasing the thread pool size in the machine.config won't fix the underlying problem. Unless you add more CPUs, adding more threads won't really help since it's still a misallocation of resources, and you can also context-switch yourself to death by having too many threads and too little CPU.

2) From a popular article on threading in IIS 7.5: If your ASP.NET application is using web services (WCF or ASMX) or System.Net to communicate with a backend over HTTP, you may need to increase connectionManagement/maxconnection. For ASP.NET applications, this is limited to 12 * #CPUs by the autoConfig feature. This means that on a quad-proc, you can have at most 12 * 4 = 48 concurrent connections to an IP end point. Because this is tied to autoConfig, the easiest way to increase maxconnection in an ASP.NET application is to set System.Net.ServicePointManager.DefaultConnectionLimit programmatically, from Application_Start, for example. Set the value to the number of concurrent System.Net connections you expect your application to use. I've set this to Int32.MaxValue and not had any side effects, so you might try that; this is actually the default used in the native HTTP stack, WinHTTP. If you're not able to set System.Net.ServicePointManager.DefaultConnectionLimit programmatically, you'll need to disable autoConfig, but that means you also need to set maxWorkerThreads and maxIoThreads. You won't need to set minFreeThreads or minLocalRequestFreeThreads if you're not using classic/ISAPI mode.

3) You should really look at load balancing if you're getting 20k unique visitors per hour. If every user did 10-20 AJAX requests per hour, you're easily talking about 1 million or more web service calls to your backend. Throwing up another server would reduce the load on the primary server. Combine this with async/await, and you've put yourself in a good situation where you can easily throw hardware at the problem (scaling out). There are multiple benefits here, such as hardware redundancy, geolocation, and also performance. If you're using a cloud provider such as AWS or RackSpace, spinning up another VM with your app on it is easy enough that it can be done from your mobile phone. Cloud computing is too cheap nowadays to even have a queue length at all. You could do this to get the performance benefits even before you make the switch to an asynchronous programming model.

4) Scaling up: adding more hardware to your server(s) helps because it provides better stability when you have additional threads. More threads means you need more CPUs and RAM. And even after you've gotten async/await under your belt, you'll still want to fine-tune those web service requests if you can. This could mean adding a caching layer or beefing up your database system. You do NOT want to max out the CPU on that single server. Once the CPU reaches 80%, ASP.NET will stop injecting more threads into the system. It doesn't matter if the worker process is sitting at 0%; if the overall system CPU utilization as reported by Task Manager reaches 80%, thread injection stops and requests begin to queue. Weird things with garbage collection also happen when it detects a high CPU load on the server.
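
The connection-limit arithmetic in point 2 can be written out as a small sketch. The 12-per-CPU figure and the quad-processor example come from the quoted article; the 0.5-second backend call duration is an assumed value, and the function names are invented for this illustration.

```python
def default_connection_limit(cpu_count, per_cpu=12):
    """Concurrent connections per endpoint under autoConfig (12 * #CPUs)."""
    return per_cpu * cpu_count


def max_backend_calls_per_second(connection_limit, seconds_per_call):
    """Upper bound on outbound call throughput for a given connection limit."""
    return connection_limit / seconds_per_call


if __name__ == "__main__":
    limit = default_connection_limit(4)
    print(limit)                                     # 48
    print(max_backend_calls_per_second(limit, 0.5))  # 96.0
```

Under these assumptions, a quad-processor server could complete at most 96 backend calls per second to a single endpoint, no matter how many request threads are available. Requests beyond that wait for a free connection, which is another way to end up with queued work and an idle CPU.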

假情假意假温柔 2024-10-10 13:32:20

Was anybody able to confirm this worked for them? I've found that answer across the web, and there are zero confirmations that the posted answer fixed this problem for them. With that being said, I don't really give it credibility as the answer is provided by the question poster.

I got the same problem recently:

Detected possible blocking or leaked critical section at webengine!g_AppDomainLock owned by thread 16 in w3wp.exe__DefaultAppPool__PID__3920__Date__04_26_2011__Time_10_40_42AM__109__IIS_COM+ Hang Dump.dmp
Impact of this lock
4.17% of threads blocked
(Threads 17)

The following functions are trying to enter this critical section
webengine!GetAppDomain+c9

The following module(s) are involved with this critical section
\\?\c:\WINDOWS\microsoft.net\framework\v2.0.50727\webengine.dll from Microsoft Corporation

This is the recommendation posted by Microsoft to further troubleshoot:

The following vendors were identified for follow up based on root cause analysis:
Microsoft Corporation
Please follow up with the vendors identified above. Consider the following approach to determine root cause for this critical section problem:

  1. Enable 'lock checks' in Application Verifier
    A. Download Application Verifier from the following URL: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c4a25ab9-649d-4a1b-b4a7-c9d8b095df18&displaylang=en
    B. Enable 'lock checks' for this process by running the following command:

    Appverif.exe -enable locks -for w3wp.exe
    C. See the following document for more information on Application Verifier:
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnappcom/html/appverifier.asp?frame=true

  2. Use a DebugDiag crash rule to monitor the application for exceptions
