通过 html 敏捷性仅从 html 获取文本
我试图从 html 中删除与 html 敏捷性相关的所有内容,但我需要保留文本。例如,从这个标签:
<TR><TD>
<B><A HREF="survival/index.html">Survival</A></B><BR>
<I>Be Suspicious, Be Worried, Be Prepared</I><BR>
<TD>
我只想保留“可疑......”
我有这个方法,但效果不是很好:
private static HtmlDocument RemoveHTML(HtmlDocument document)
{
HtmlDocument textOfDoc = new HtmlDocument();
foreach (var node in document.DocumentNode.SelectNodes(".//p|.//title|.//body"))
{
var newNode = HtmlNode.CreateNode(node.InnerText+" ");
textOfDoc.DocumentNode.AppendChild(newNode);
}
return textOfDoc;
}
谢谢!
I'm trying to remove from the html everything that is concerned to html with html agility, but I need to keep the text. For example, from this tag:
<TR><TD>
<B><A HREF="survival/index.html">Survival</A></B><BR>
<I>Be Suspicious, Be Worried, Be Prepared</I><BR>
<TD>
I want to keep only "Be suspicious..."
I have this method, but doesn't work very well:
private static HtmlDocument RemoveHTML(HtmlDocument document)
{
HtmlDocument textOfDoc = new HtmlDocument();
foreach (var node in document.DocumentNode.SelectNodes(".//p|.//title|.//body"))
{
var newNode = HtmlNode.CreateNode(node.InnerText+" ");
textOfDoc.DocumentNode.AppendChild(newNode);
}
return textOfDoc;
}
THANKS!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
看起来您只提取了 P、TITLE 和 BODY 标签。如果你也想要 I 标签,你需要这样做:
It looks like you're only extracting P, TITLE and BODY tags. If you want I tags as well, you need to do this: