How do I load an HTML string into Webkit.net so that I can access its DOM?
I'd like to use Webkit.net to load an (X)HTML string and then analyze the DOM in order to "compress" it: remove whitespace and newlines, and convert <input></input> and <input /> to <input> (basically an XHTML-to-HTML conversion, doctype allowing).
Is there any way to get the "DOM tree" in Webkit.net? If not, are there any .NET HTML parsers out there that can do this? If not, is there a .NET component that already does what I'm asking?
Some pseudo-code explaining what I'd like to do:
// Pseudo-code: Webkit.DOM.FromString is the kind of API I'm hoping for.
var DOM = Webkit.DOM.FromString("<!DOCTYPE HTML><html><head><title> Hello</title></head><body><INPUT Value=\"Click here\" type=\"submit\" /><br /><span class='bold red'>An element!</span><script type='text-javascript'>/*do stuff*/</script> <script>/*do more stuff*/</script></body></html>");
var sb = new StringBuilder();
// In a real scenario this would recursively iterate over all child nodes.
foreach (var node in DOM.Nodes) {
    sb.Append(/* Compress & sort attributes, normalize & strip unneeded quotes, remove unneeded end & self-closing tags, etc. */);
}
// Return optimally compressed output, something like:
// <!doctype html><title>Hello</title><input type=submit value="Click here"><br><span class="bold red">An element!</span><script>/*do stuff*/</script><script>/*do more stuff*/</script>
return sb.ToString();
Comments (1)
I haven't used Webkit.Net, but I have used HtmlAgilityPack for a task similar to the one you have in mind, and it works very well. So I think you answered your own question.
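To make that suggestion a bit more concrete, here is a minimal sketch of how the traversal from the question might look with HtmlAgilityPack. It assumes the HtmlAgilityPack NuGet package is installed; the HtmlMinifier class and its Minify and Write methods are hypothetical names made up for this illustration, not part of the library. It only lowercases names, sorts attributes, drops unneeded attribute quotes, omits end tags for void elements, and collapses whitespace runs to one space. It does not remove optional tags such as html, head, or body, so its output will be longer than the ideal output in the question.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party NuGet package (assumed to be installed)

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
static class HtmlMinifier
{
    // Elements that never take an end tag in HTML.
    static readonly string[] VoidElements =
        { "area", "base", "br", "col", "embed", "hr", "img", "input",
          "link", "meta", "source", "track", "wbr" };

    // Elements whose text content must be copied verbatim.
    static readonly string[] RawTextElements = { "script", "style", "pre", "textarea" };

    public static string Minify(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var sb = new StringBuilder();
        Write(doc.DocumentNode, sb);
        return sb.ToString();
    }

    static void Write(HtmlNode node, StringBuilder sb)
    {
        switch (node.NodeType)
        {
            case HtmlNodeType.Document:
                foreach (var child in node.ChildNodes) Write(child, sb);
                break;

            case HtmlNodeType.Comment:
                // As far as I know, the parser reports the doctype as a comment
                // node, so keep that one and drop real comments.
                if (node.OuterHtml.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
                    sb.Append(node.OuterHtml);
                break;

            case HtmlNodeType.Text:
                var text = ((HtmlTextNode)node).Text;
                var parent = node.ParentNode?.Name.ToLowerInvariant();
                // Collapse whitespace runs to a single space, except in raw-text elements.
                sb.Append(RawTextElements.Contains(parent) ? text : Regex.Replace(text, @"\s+", " "));
                break;

            case HtmlNodeType.Element:
                var name = node.Name.ToLowerInvariant();
                sb.Append('<').Append(name);
                foreach (var attr in node.Attributes.OrderBy(a => a.Name, StringComparer.Ordinal))
                {
                    sb.Append(' ').Append(attr.Name.ToLowerInvariant());
                    var value = attr.Value ?? "";
                    if (value.Length == 0) continue; // empty value: bare attribute name
                    // Quotes are only needed when the value contains one of these characters.
                    bool safeUnquoted = value.All(c => !char.IsWhiteSpace(c) && "\"'`=<>".IndexOf(c) < 0);
                    sb.Append('=');
                    if (safeUnquoted) sb.Append(value);
                    else sb.Append('"').Append(value.Replace("\"", "&quot;")).Append('"');
                }
                sb.Append('>');
                if (VoidElements.Contains(name)) break; // no children, no end tag
                foreach (var child in node.ChildNodes) Write(child, sb);
                sb.Append("</").Append(name).Append('>');
                break;
        }
    }
}
```

Calling HtmlMinifier.Minify(html) on the string from the question would be the entry point. Collapsing whitespace to a single space rather than deleting it is a deliberately conservative choice, since removing spaces between inline elements can change how the page renders. Stripping whitespace-only text nodes and optional tags would be further steps on top of this.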