HtmlAgilityPack: CPU usage exceeds 50% when parsing on multiple threads
My application uses 10 threads to read a large number of HTML files. The code is similar to the following:
for (int i = 0; i < 10; i++)
{
    new Thread(ParserHtmlWork)
    {
        IsBackground = true
    }.Start();
}

void ParserHtmlWork()
{
    while (true)
    {
        // Read the next file path from the queue.
        var filePath = Query.Dequeue();
        using (var stream = OpenFile(filePath))
        {
            stream.Close();
        }
        System.Threading.Thread.Sleep(800);
    }
}
The code above runs without problems, and average CPU usage is 2%-5%. Next, I changed the code to use the HtmlAgilityPack library to parse the HTML:
private HtmlDocument CreateHtmlDocument(Stream stream, Encoding encoding)
{
    var doc = new HtmlDocument();
    // Defines if the 'id' attribute must be specifically used.
    doc.OptionUseIdAttribute = false;
    // Defines if declared encoding must be read from the document.
    // Declared encoding is determined using the
    // meta http-equiv="content-type" content="text/html;charset=XXXXX" html node.
    doc.OptionReadEncoding = false;
    doc.Load(stream, encoding);
    return doc;
}
I then changed the ParserHtmlWork method so that it calls CreateHtmlDocument:
using (var stream = OpenFile(filePath))
{
    CreateHtmlDocument(stream, Encoding.UTF8);
    stream.Close();
}
Running this version, average CPU usage rises to 50-60% (the average file size is 80 KB). If I reduce the number of threads to 1, average CPU usage drops back to 2%-5%.
I captured CPU sampling data with the Visual Studio profiler in my product (not the code above):
ApplicationEngine.Start()
    Inclusive Samples: 398
    Exclusive Samples: 0
    Inclusive Samples %: 76
    Exclusive Samples %: 0
ApplicationEngine.DoWork(class System.IO.Stream)
    Inclusive Samples: 337
    Exclusive Samples: 0
    Inclusive Samples %: 64.44
    Exclusive Samples %: 0.00
CreateHtmlDocument(class System.IO.Stream,class System.Text.Encoding)
    Inclusive Samples: 298
    Exclusive Samples: 0
    Inclusive Samples %: 56.98
    Exclusive Samples %: 0.00
HtmlAgilityPack.HtmlDocument.Load(class System.IO.Stream,class System.Text.Encoding)
    Inclusive Samples: 296
    Exclusive Samples: 0
    Inclusive Samples %: 56.60
    Exclusive Samples %: 0.00
HtmlAgilityPack.HtmlDocument.Load(class System.IO.TextReader)
    Inclusive Samples: 294
    Exclusive Samples: 0
    Inclusive Samples %: 56.21
    Exclusive Samples %: 0.00
HtmlAgilityPack.HtmlDocument.Parse()
    Inclusive Samples: 273
    Exclusive Samples: 13
    Inclusive Samples %: 52.20
    Exclusive Samples %: 2.49
HtmlAgilityPack.HtmlDocument.PushNodeEnd(int32,bool)
    Inclusive Samples: 135
    Exclusive Samples: 2
    Inclusive Samples %: 25.81
    Exclusive Samples %: 0.38
[clr.dll]
    Inclusive Samples: 130
    Exclusive Samples: 106
    Inclusive Samples %: 24.86
    Exclusive Samples %: 20.27
System.String.ToLower()
    Inclusive Samples: 118
    Exclusive Samples: 118
    Inclusive Samples %: 22.56
    Exclusive Samples %: 22.56
HtmlAgilityPack.HtmlNode.get_Name()
    Inclusive Samples: 81
    Exclusive Samples: 3
    Inclusive Samples %: 15.49
    Exclusive Samples %: 0.57
Comments (1)
So what is your problem?
An HTML parser using CPU? What did you expect? The downloads do not, but HTML parsing uses CPU, and if you use a lot of parallel threads then yes, this will add up.
There is not a lot you can do: run HtmlAgilityPack under a profiler to see whether there is a bottleneck there, and optimize it if so. If not... well... get a faster processor or more servers, or optimize your own code.
Vote to close and -1 - I fail to see any related question here except "oh my god, my CPU is used when it has to do work".
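
For the "optimize your code" option, one thing that could be tried is capping how many parse operations run at the same time, rather than letting all 10 threads parse at once. The following is only a minimal sketch under stated assumptions, not a fix confirmed by the question: `BoundedParsing`, `ProcessAll` and `ParseStandIn` are hypothetical names, and `ParseStandIn` merely lowercases a string as a stand-in for the CPU-bound `CreateHtmlDocument` call so that the example has no external dependencies. `ParallelOptions.MaxDegreeOfParallelism` is the standard .NET setting used to bound concurrency.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParsing
{
    // Hypothetical stand-in for the CPU-bound parse step (e.g. CreateHtmlDocument).
    // It only lowercases the text so the sketch needs no external library.
    private static int ParseStandIn(string html)
    {
        return html.ToLowerInvariant().Length;
    }

    // Processes every input with at most maxWorkers operations running at once.
    // Returns the highest number of operations observed running simultaneously.
    public static int ProcessAll(IEnumerable<string> inputs, int maxWorkers)
    {
        int running = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        Parallel.ForEach(inputs, options, html =>
        {
            int now = Interlocked.Increment(ref running);

            // Record the peak concurrency with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                if (Interlocked.CompareExchange(ref peak, now, seen) == seen)
                {
                    break;
                }
            }

            ParseStandIn(html);
            Interlocked.Decrement(ref running);
        });

        return peak;
    }

    public static void Main()
    {
        // 200 synthetic 80 KB inputs, matching the average file size in the question.
        var inputs = Enumerable.Repeat(new string('A', 80 * 1024), 200);

        // Assumption: use at most half of the logical processors for parsing.
        int limit = Math.Max(1, Environment.ProcessorCount / 2);

        int peak = ProcessAll(inputs, limit);
        Console.WriteLine($"limit={limit}, peak concurrency={peak}");
    }
}
```

Capping concurrency does not make the parsing itself cheaper. It only bounds how many cores the parser can occupy at any one moment, at the cost of a longer total run time for the same number of files.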