Why does an ASP.net website use high CPU with very little traffic?

Posted on 2025-01-04 00:50:02

We are working on a shopping cart website with more than 1 lakh (100,000) products, built on top of the popular e-commerce application NopCommerce, version 2.3. (Just to introduce you to NopCommerce: it's one of the best and most popular open-source e-commerce applications, built on top of ASP.net 4 and MVC3.) The site was published with two languages and a single currency.

With around 80 categories and 30-40k products it works fairly well. I mean, not very bad, but not good either. As soon as more products were added, performance issues started, with symptoms like long response times (more than 40-50 seconds to load) and high CPU usage (90-100% utilization) with just 10-20 users.

The server is a quad-core Xeon with 16 GB of RAM running Windows Server 2008 R2, and it is working fine with one more e-commerce website (50k products, custom-developed code) that takes hardly 4-8% CPU.

We used caching to store the home page featured products and the category menu in memory to avoid db calls. It improved the home page only.
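
For reference, a minimal sketch of this kind of read-through in-memory cache, using System.Runtime.Caching.MemoryCache (NopCommerce routes caching through its own ICacheManager abstraction, so the class and key names here are only illustrative):

using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public static class HomePageCache {
  private static readonly MemoryCache Cache = MemoryCache.Default;

  // Returns the cached list if present; otherwise loads it once from the db
  // and keeps it in memory for the given number of minutes.
  // usage: var featured = HomePageCache.GetOrLoad ("home.featured", LoadFeaturedFromDb, 60);
  public static IList<T> GetOrLoad<T> (string key, Func<IList<T>> loadFromDb, int minutes) {
    var cached = Cache.Get (key) as IList<T>;
    if (cached != null)
      return cached;                           // served from memory, no db call

    var items = loadFromDb ();                 // single db hit on a cache miss
    Cache.Set (key, items, DateTimeOffset.UtcNow.AddMinutes (minutes));
    return items;
  }
}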

Later on, to fix the issue, we profiled and found that it was the catalog listing that was causing most of the delay in fetching data from the db, which is finely normalized. SQL Server seemed to take 80-90% CPU and w3wp 30-40%, which kept the CPU at 100% constantly with just a few visitors on the website. We consulted a few experts, and they suggested we store de-normalized data on disk in binary format to bypass the expensive database calls. We did some research and used Protobuf to store de-normalized serialized object data to disk, keeping only those fields which are necessary for the catalog (product listing) page. But to maintain the specification (filter) functionality we had to create 3 binary files: one for the product objects and another for the category specification objects (these two are per category), plus one more file for the product-to-specification mapping, which takes almost 5 MB. When a request comes in, the code reads from the serialized binary file and materializes the data back into objects. It reads the mapping file only when someone filters products by specification.
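
The three files map naturally onto Protobuf contracts. A rough sketch of what the stored shapes might look like with protobuf-net; the field names are hypothetical, not the actual NopCommerce entities:

using System.Collections.Generic;
using ProtoBuf;

[ProtoContract]
public class CatalogProduct {                // per-category product file
  [ProtoMember (1)] public int Id;
  [ProtoMember (2)] public string Name;
  [ProtoMember (3)] public decimal Price;
  [ProtoMember (4)] public string SeName;    // url slug used on the listing page
}

[ProtoContract]
public class CategorySpecification {         // per-category specification file
  [ProtoMember (1)] public int SpecificationAttributeId;
  [ProtoMember (2)] public string Name;
  [ProtoMember (3)] public List<string> Options;
}

[ProtoContract]
public class ProductSpecMapping {            // rows of the single ~5 MB mapping file
  [ProtoMember (1)] public int ProductId;
  [ProtoMember (2)] public int SpecificationOptionId;
}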

So now, whenever a request for a category product listing page comes in, it checks whether a binary file has been created for that category; if not, it generates one using a stored procedure and saves the object to binary for later use. If the file exists, it reads directly from the binary file. With this we avoided 90% of db calls while loading this page. With just a few users (approx. 30-40) it works like a charm, and we were able to reduce the response time to 700-800 ms per page load. That is a great improvement in loading time, but CPU is still on the higher side. The difference is: now w3wp uses 60-70% CPU with 20-30 visitors, and SQL hardly uses 5-8%.
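
A minimal sketch of that read-or-generate flow, assuming protobuf-net; paths, type names and the stored-procedure call are placeholders:

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

public class CatalogFileCache {
  private readonly string _root;

  public CatalogFileCache (string root) { _root = root; }

  public List<CatalogProduct> GetProducts (int categoryId) {
    var path = Path.Combine (_root, "category-" + categoryId + ".bin");

    if (!File.Exists (path)) {
      // cache miss: one expensive db round-trip, then persist for reuse
      var products = LoadFromStoredProcedure (categoryId);
      using (var fs = File.Create (path))
        Serializer.Serialize (fs, products);
      return products;
    }

    // cache hit: deserialize straight from disk, no db call
    using (var fs = File.OpenRead (path))
      return Serializer.Deserialize<List<CatalogProduct>> (fs);
  }

  private List<CatalogProduct> LoadFromStoredProcedure (int categoryId) {
    // placeholder for the actual ADO.NET stored-procedure call
    throw new System.NotImplementedException ();
  }
}

Note that under concurrent load the miss path needs a lock or a write-to-temp-then-rename step, and that even a cache hit pays the full deserialization cost on every request.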

But as we hit more users, approx. 100-120, the server starts to hang and w3wp stays pinned near 100% constantly. Requests are no longer served in under a second; instead they take more than 20-25 seconds to load, and then most requests are never served at all. We notice this when multiple requests come to the site.

We're not experts in serialization and binary formatters, but we think the high CPU usage is caused by the file read operation, or maybe by the de-serialization performed on every catalog page load.
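
One cheap way to separate the two suspects is to time them independently: read the file fully into memory first, then deserialize from that in-memory buffer. A rough harness, reusing the hypothetical CatalogProduct type sketched earlier:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using ProtoBuf;

class DeserializationTimer {
  static void Main () {
    const string path = @"C:\cache\category-42.bin";       // placeholder path

    var sw = Stopwatch.StartNew ();
    byte[] bytes = File.ReadAllBytes (path);               // pure file I/O
    sw.Stop ();
    Console.WriteLine ("file read:   {0} ms", sw.ElapsedMilliseconds);

    sw.Restart ();
    List<CatalogProduct> products;
    using (var ms = new MemoryStream (bytes))              // pure CPU work
      products = Serializer.Deserialize<List<CatalogProduct>> (ms);
    sw.Stop ();
    Console.WriteLine ("deserialize: {0} ms for {1} products", sw.ElapsedMilliseconds, products.Count);
  }
}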

We're now looking for a probable solution to address the high CPU usage. What could the problem be, and where should we look to fix it? Do you think it is the file read operation or the de-serialization that is causing this? Should we store the de-normalized objects in the db instead? What alternatives do we have to address this issue?

Awaiting your expert opinion on the same.

Thanks in advance.

Comments (3)

暮色兮凉城 2025-01-11 00:50:02

Since you are having CPU issues, I suspect deserialization is the main culprit. In that case, you can make serialization and deserialization nearly 100 times faster by implementing the ISerializable interface yourself. I have used this technique before on large object graphs and the improvement was phenomenal.

Say you have a class like this:

using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[Serializable]
public class TestObject : ISerializable {
  public long     id1;
  public long     id2;
  public long     id3;
  public string   s1;
  public string   s2;
  public string   s3;
  public string   s4;
  public DateTime dt1;
  public DateTime dt2;
  public bool     b1;
  public bool     b2;
  public bool     b3;
  public byte     e1;
  public IDictionary<string,object> d1;
}

Implement ISerializable so that you can do custom serialization and deserialization. (SerializationWriter and SerializationReader below are the helper classes that come with this fast-serialization technique, not BCL types; both members go inside TestObject.)

// Called when serializing: pack every field, in a fixed order, into a
// SerializationWriter buffer and hand the buffer to the SerializationInfo.
public void GetObjectData (SerializationInfo info, StreamingContext ctxt) {
  SerializationWriter sw = SerializationWriter.GetWriter ();
  sw.Write (id1);
  sw.Write (id2);
  sw.Write (id3);
  sw.Write (s1);
  sw.Write (s2);
  sw.Write (s3);
  sw.Write (s4);
  sw.Write (dt1);
  sw.Write (dt2);
  sw.Write (b1);
  sw.Write (b2);
  sw.Write (b3);
  sw.Write (e1);
  sw.Write<string,object> (d1);
  sw.AddToInfo (info);   // store the packed buffer in the SerializationInfo
}

// Deserialization constructor: read the fields back in exactly the same
// order they were written.
public TestObject (SerializationInfo info, StreamingContext ctxt) {
  SerializationReader sr = SerializationReader.GetReader (info);
  id1 = sr.ReadInt64 ();
  id2 = sr.ReadInt64 ();
  id3 = sr.ReadInt64 ();
  s1  = sr.ReadString ();
  s2  = sr.ReadString ();
  s3  = sr.ReadString ();
  s4  = sr.ReadString ();
  dt1 = sr.ReadDateTime ();
  dt2 = sr.ReadDateTime ();
  b1  = sr.ReadBoolean ();
  b2  = sr.ReadBoolean ();
  b3  = sr.ReadBoolean ();
  e1  = sr.ReadByte ();
  d1  = sr.ReadDictionary<string,object> ();
}

This will not only make the payload 10-100 times smaller, but can also improve performance by 10x, sometimes even 100x.

Another thing: see if you have any large loops that iterate over thousands of objects, or suboptimal LINQ queries. Those can sometimes be CPU hogs.
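
For instance, a pattern like the following (hypothetical code, not from NopCommerce) turns a listing page into an O(n*m) scan, while a one-time dictionary makes it O(n+m):

using System.Collections.Generic;
using System.Linq;

class Product  { public int Id; public int CategoryId; }
class Category { public int Id; public string Name; }

static class LoopExample {
  // Slow: scans the whole category list once per product.
  static void Render (List<Product> products, List<Category> categories) {
    foreach (var p in products) {
      var category = categories.First (c => c.Id == p.CategoryId);
      // ... render row with category.Name ...
    }
  }

  // Fast: build the lookup once, then constant-time access per product.
  static void RenderFast (List<Product> products, List<Category> categories) {
    var byId = categories.ToDictionary (c => c.Id);
    foreach (var p in products) {
      var category = byId[p.CategoryId];
      // ... render row with category.Name ...
    }
  }
}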

And finally, I would recommend this list of the top caching mistakes I have seen developers make, especially when using a distributed cache:

http://www.codeproject.com/Articles/115107/Ten-Caching-Mistakes-that-Break-your-App

鸠书 2025-01-11 00:50:02

Question 1: What all is running on this box? If I read correctly, you have one site with 50,000 products (no mention of users or hits) and another with lots more. As you stack sites, you will see some degradation, even if your code is very tight.

Question 2: Do you have all layers on a single box? You now have competing concerns and might block some CPU-bound threads due to I/O operations.

Question 3: Have you done a code review to ensure proper development concepts and methodologies (SOLID, etc.)? If not, you could be holding resources longer than needed and causing issues.

Question 4: Have you profiled? I mean both SQL Server and the web application. If not, you have no clue where the issue might be, and I doubt anyone in this forum can help you.

Even with millions of "products", a properly designed database and site should be fairly fast. But different factors come together to determine performance, and all of the pieces on all layers can affect the application.

As an example, I once consulted for a company that had built a high-performance eCommerce application that was dying. All of the parts seemed fine in code reviews. In tests, both the pages and the database worked fine. But they had never stressed the system. If they had, they would have caught this little bit of insanity:

 //let's not focus on the magic string, okay? Think about static
 private static SqlConnection connection = new SqlConnection("{conn string here}");

The entire site was funneling through a single SQL connection, because one developer did not understand the underlying connection pool and thought object initialization would be more of a hit than filtering everything through a static "always on" connection.
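
For contrast, the conventional pattern is a short-lived connection per operation; ADO.NET keeps the physical connection in a pool underneath, so Open() is cheap. A minimal sketch:

using System.Data;
using System.Data.SqlClient;

public static class Db {
  // Open late, close early: Dispose() returns the physical connection to the
  // pool, so the next Open() with the same connection string reuses it.
  public static DataTable Query (string connectionString, string sql) {
    using (var conn = new SqlConnection (connectionString))
    using (var cmd = new SqlCommand (sql, conn)) {
      conn.Open ();
      var table = new DataTable ();
      table.Load (cmd.ExecuteReader ());
      return table;
    }
  }
}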

Until you profile the application, you don't have a question here that can be answered. Once you find an issue and ask about it, someone can step up and say "here is how you solve that". You can add more information to this question, but until a specific problem is identified, rather than a generic symptom, you are going nowhere.

横笛休吹塞上声 2025-01-11 00:50:02

The answer to database "problems" is to fix your poorly designed database. Database "problems" are not a fundamental limitation of what databases can do; they are a problem with your design.

Fixes come in many forms, but fixing the design is always the answer. Database "problems" are always the same problem in many different flavors.

The moral of this story: never take database advice from a guy who knows nothing about fixing database problems and suggests you apply duct tape. The answer to all database problems is to move the data and the calculations as close to the database as possible.

The further you move data from the database, the more you exacerbate the problem and the lower the scalability of your solution. Don't listen to non-database developers trying to "fix" your database.
