HtmlAgilityPack: CPU usage exceeds 50% with multiple threads

Posted 2025-01-03 08:29:40

My application uses 10 threads to read a large number of HTML files, similar to the following code:

for (int i = 0; i < 10; i++)
{
    new Thread(ParserHtmlWork)
    {
        IsBackground = true
    }.Start();
}

void ParserHtmlWork()
{
    while (true)
    {
        // Read the next file from the queue.
        var filePath = Query.Enqueue();
        using (var stream = OpenFile(filePath))
        {
            stream.Close();
        }
        System.Threading.Thread.Sleep(800);
    }
}

The code above runs without problems, and average CPU usage is 2%-5%. Next, I changed the code to parse the HTML with the HtmlAgilityPack library.

private HtmlDocument CreateHtmlDocument(Stream stream, Encoding encoding)
{
    var doc = new HtmlDocument();
    // Defines if the 'id' attribute must be specifically used.
    doc.OptionUseIdAttribute = false;
    // Defines if declared encoding must be read from the document.
    // Declared encoding is determined using the meta http-equiv="content-type" content="text/html;charset=XXXXX" html node.
    doc.OptionReadEncoding = false;
    doc.Load(stream, encoding);
    return doc;
}

I changed the ParserHtmlWork method to call CreateHtmlDocument:

 using (var stream = OpenFile(filePath))
 {
     CreateHtmlDocument(stream, Encoding.UTF8);
     stream.Close();
 }

Running this version, average CPU usage rises to 50-60% (the average file size is 80 KB). If I reduce the number of threads to 1, average CPU usage drops back to 2%-5%.

I captured CPU samples with the Visual Studio profiler in my actual product (not the code above):

ApplicationEngine.Start()
Inclusive Samples: 398
Exclusive Samples: 0
Inclusive Samples %: 76
Exclusive Samples %: 0

ApplicationEngine.DoWork(class System.IO.Stream)
Inclusive Samples: 337
Exclusive Samples: 0
Inclusive Samples %: 64.44
Exclusive Samples %: 0.00

CreateHtmlDocument(class System.IO.Stream,class System.Text.Encoding)
Inclusive Samples: 298  
Exclusive Samples: 0
Inclusive Samples %: 56.98
Exclusive Samples %: 0.00

HtmlAgilityPack.HtmlDocument.Load(class System.IO.Stream,class System.Text.Encoding)
Inclusive Samples: 296
Exclusive Samples: 0
Inclusive Samples %: 56.60
Exclusive Samples %: 0.00

HtmlAgilityPack.HtmlDocument.Load(class System.IO.TextReader)
Inclusive Samples: 294
Exclusive Samples: 0
Inclusive Samples %: 56.21
Exclusive Samples %: 0.00

HtmlAgilityPack.HtmlDocument.Parse()
Inclusive Samples: 273
Exclusive Samples: 13
Inclusive Samples %: 52.20
Exclusive Samples %: 2.49

HtmlAgilityPack.HtmlDocument.PushNodeEnd(int32,bool)
Inclusive Samples: 135
Exclusive Samples: 2
Inclusive Samples %: 25.81
Exclusive Samples %: 0.38

[clr.dll]
Inclusive Samples: 130
Exclusive Samples: 106
Inclusive Samples %: 24.86
Exclusive Samples %: 20.27

System.String.ToLower()             
Inclusive Samples: 118
Exclusive Samples: 118
Inclusive Samples %: 22.56
Exclusive Samples %: 22.56

HtmlAgilityPack.HtmlNode.get_Name()             
Inclusive Samples: 81
Exclusive Samples: 3
Inclusive Samples %: 15.49
Exclusive Samples %: 0.57
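
As a cross-check on the figures above: every reported percentage is consistent with its sample count divided by a total of 523 samples. That total is inferred from the numbers and is not stated in the post, and the `ProfileCheck` class below is a hypothetical helper written only for this check.

```csharp
using System;

public static class ProfileCheck
{
    // Assumed total sample count: inferred from the reported rows,
    // not stated anywhere in the profiler output quoted above.
    public const int InferredTotalSamples = 523;

    // Share of all samples attributed to one row, as a percentage
    // rounded to two decimal places.
    public static decimal Percent(int samples, int total = InferredTotalSamples)
    {
        return Math.Round(100m * samples / total, 2);
    }
}
```

For example, `ProfileCheck.Percent(298)` gives 56.98 (CreateHtmlDocument, inclusive) and `ProfileCheck.Percent(118)` gives 22.56 (String.ToLower, exclusive), matching the listing. Read this way, String.ToLower and [clr.dll] together hold roughly 43% of all samples as exclusive time.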

Comments (1)

魄砕の薆 2025-01-10 08:29:40

So what is your problem?

An HTML parser using CPU? What did you expect? Downloads do not use CPU, but HTML parsing does, and if you run a lot of parallel threads then yes, it adds up.

There is not a lot you can do. Run HtmlAgilityPack under a profiler to see whether there is a bottleneck in it that can be optimized. If not... well... get a faster processor or more servers, or optimize your own code.

Vote to close and -1 - I fail to see any related question here except "oh my god, my CPU is used when it has to do work".
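
One way to act on the "parallel threads add up" point is to cap how many files are parsed at once. The following is a minimal sketch, not something proposed in the thread: `BoundedParser`, `ParseAll`, and the `parseFile` delegate are hypothetical names, with `parseFile` standing in for the poster's CreateHtmlDocument call. It relies only on `Parallel.ForEach` and `ParallelOptions.MaxDegreeOfParallelism` from the .NET Task Parallel Library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParser
{
    // Runs parseFile over every path with at most maxWorkers running concurrently.
    // parseFile is a hypothetical stand-in for the real parsing call.
    // Returns the number of files that were processed.
    public static int ParseAll(IEnumerable<string> filePaths, int maxWorkers, Action<string> parseFile)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        int parsed = 0;

        Parallel.ForEach(filePaths, options, path =>
        {
            parseFile(path);
            Interlocked.Increment(ref parsed); // thread-safe counter update
        });

        return parsed;
    }
}
```

Lowering `maxWorkers` trades throughput for lower peak CPU usage; the total CPU time spent parsing a given set of files stays roughly the same, so this only spreads the work out rather than making it cheaper.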
