How do I HtmlEncode with the Html Agility Pack?
Has anyone done this? Basically, I want to sanitize HTML by keeping basic tags such as h1, h2, and em; stripping every non-http address from the img and a tags; and HTML-encoding every other tag.
I'm stuck on the HTML-encoding part. I know that to remove a node you call "node.ParentNode.RemoveChild(node);" where node is an object of the class HtmlNode. Instead of removing the node, though, I want to HTML-encode it.
Comments (2)
You would need to remove the node representing the element you don't want. The encoded HTML would then need to be re-added as a text node.
If you don't want to process the children of the elements that you want to throw away, you should be able to just use OuterHtml ... something like this might work:
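A minimal sketch of that idea, not the answerer's original snippet: it takes the unwanted element's OuterHtml, encodes it, and swaps the element for a text node holding the encoded markup. The class and method names (NodeEncoder, EncodeNode) are made up for illustration, and using System.Net.WebUtility.HtmlEncode as the encoder is an assumption; any HTML encoder would do.

```csharp
using System.Net;
using HtmlAgilityPack;

public static class NodeEncoder
{
    // Hypothetical helper: replaces the element (children included) with a
    // text node whose content is the element's HTML-encoded markup.
    public static void EncodeNode(HtmlNode node)
    {
        // OuterHtml covers the element's own tags and everything inside it.
        string encoded = WebUtility.HtmlEncode(node.OuterHtml);

        // The replacement node must be created by the owning document.
        HtmlNode textNode = node.OwnerDocument.CreateTextNode(encoded);

        // Swap in place so the encoded text keeps the element's position.
        node.ParentNode.ReplaceChild(textNode, node);
    }
}
```

Because the whole OuterHtml is encoded in one go, any tags nested inside the discarded element are encoded along with it, which matches the "don't process the children" case described above.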
The answer above pretty much covers it. There's one thing to add, though.
You don't want to change one particular node but all of them, so the code above will probably become a method, wrapped in an if statement (to make sure it's a tag you want to HtmlEncode). More to the point, since Agility Pack doesn't expose nodes by ordinal, you can't iterate the entire document. Recursion is the easiest way to go about it. You probably already know this...
I tackled a similar problem, and have some shell code (C#) you're more than welcome to use: http://dev.forrestcroce.com/normalizer-of-web-pages-qualifier-of-urls/2008-12-09/
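The recursive approach could be sketched roughly as follows. This is not the code behind the link above; the class name HtmlSanitizer, the whitelist contents, and the choice to drop a non-http src or href attribute entirely are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

public static class HtmlSanitizer
{
    // Hypothetical whitelist; adjust it to the tags you want to keep.
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "h1", "h2", "em", "p", "img", "a" };

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        Walk(doc.DocumentNode);
        return doc.DocumentNode.OuterHtml;
    }

    private static void Walk(HtmlNode parent)
    {
        // Copy the child list first, because replacing nodes mutates it.
        foreach (HtmlNode child in parent.ChildNodes.ToList())
        {
            if (child.NodeType != HtmlNodeType.Element)
                continue; // text and comment nodes are left untouched here

            if (!AllowedTags.Contains(child.Name))
            {
                // Not whitelisted: replace the element with its encoded markup.
                string encoded = WebUtility.HtmlEncode(child.OuterHtml);
                parent.ReplaceChild(parent.OwnerDocument.CreateTextNode(encoded), child);
                continue;
            }

            if (child.Name == "img") DropNonHttp(child, "src");
            if (child.Name == "a") DropNonHttp(child, "href");

            Walk(child); // recurse into the children of kept elements
        }
    }

    // Removes the attribute unless its value starts with http:// or https://.
    private static void DropNonHttp(HtmlNode node, string attribute)
    {
        string value = node.GetAttributeValue(attribute, "").Trim();
        bool isHttp = value.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                   || value.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
        if (!isHttp)
            node.Attributes.Remove(attribute);
    }
}
```

Calling HtmlSanitizer.Sanitize(html) returns the cleaned markup as a string. The sketch only covers the three requirements from the question; it does not strip other attributes (such as event handlers) from whitelisted tags, so it should not be treated as a complete sanitizer.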