How to load an HTML string into Webkit.net so that its DOM can be accessed

Posted on 2024-09-26 13:19:04

I'd like to use Webkit.net to load an (X)HTML string and then analyze the DOM in order to "compress" it: remove whitespace and newlines, and convert <input></input> and <input /> to <input> (basically an XHTML-to-HTML conversion, doctype permitting).

Is there any way to get the "DOM tree" in Webkit.net? If not, are there any .NET HTML parsers that can do this? If not, is there a .NET component that already does what I'm asking?

Some pseudocode explaining what I'd like to do:

var DOM = Webkit.DOM.FromString("<!DOCTYPE HTML><html><head><title> Hello</title></head><body><INPUT Value=\"Click here\"  type=\"submit\" /><br /><span class='bold red'>An element!</span><script type='text-javascript'>/*do stuff*/</script>  <script>/*do more stuff*/</script></body></html>");

var sb = new StringBuilder();

// this would recursively iterate over all childnodes in a real scenario.
foreach(var node in DOM.Nodes){
    sb.Append(/* Compress & sort attributes, normalize & strip unneeded quotes, remove unneeded end & self-closing tags, etc. */);
}

// return optimally compressed output...
// something like:
// <!doctype html><title>Hello</title><input type=submit value="Click here"><br><span class="bold red">An element!</span><script>/*do stuff*/</script><script>/*do more stuff*/</script>
return sb.ToString();

Comments (1)

隔纱相望 2024-10-03 13:19:04

I haven't used Webkit.Net, but I have used HtmlAgilityPack for a task similar to the one you have in mind, and it works very well. So I think you answered your own question.
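
For illustration, here is a minimal sketch of how the HtmlAgilityPack approach might look. It is not taken from the answer above: the class name `HtmlMinifier` and its helper methods are hypothetical, and it assumes the HtmlAgilityPack NuGet package is referenced. It covers only part of what the question asks for (sorted attributes, unquoted values where HTML allows it, no end tags on void elements, collapsed whitespace) and leaves out omission of optional tags such as `<html>`, `<head>`, and `<body>`.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class HtmlMinifier
{
    // HTML void elements: they never take an end tag.
    private static readonly string[] VoidElements =
    {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "param", "source", "track", "wbr"
    };

    // Elements whose text content is copied through untouched.
    private static readonly string[] RawTextElements = { "script", "style", "pre", "textarea" };

    // Attribute values matching this pattern may be written without quotes.
    private static readonly Regex UnquotedValue = new Regex(@"^[^\s""'=<>`]+$");

    public static string Minify(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sb = new StringBuilder();
        foreach (var child in doc.DocumentNode.ChildNodes)
            WriteNode(child, sb);
        return sb.ToString();
    }

    private static void WriteNode(HtmlNode node, StringBuilder sb)
    {
        switch (node.NodeType)
        {
            case HtmlNodeType.Text:
                WriteText((HtmlTextNode)node, sb);
                break;
            case HtmlNodeType.Comment:
                // The parser reports the doctype as a comment node:
                // keep it as written and drop ordinary comments.
                if (node.OuterHtml.StartsWith("<!doctype", StringComparison.OrdinalIgnoreCase))
                    sb.Append(node.OuterHtml);
                break;
            case HtmlNodeType.Element:
                WriteElement(node, sb);
                break;
        }
    }

    private static void WriteText(HtmlTextNode node, StringBuilder sb)
    {
        string text = node.Text;
        bool raw = node.ParentNode != null && RawTextElements.Contains(node.ParentNode.Name);
        if (raw) { sb.Append(text); return; }

        // Simplification: whitespace-only text is dropped entirely, which can
        // change rendering between inline elements.
        if (string.IsNullOrWhiteSpace(text)) return;
        sb.Append(Regex.Replace(text, @"\s+", " "));
    }

    private static void WriteElement(HtmlNode node, StringBuilder sb)
    {
        sb.Append('<').Append(node.Name);
        foreach (var attr in node.Attributes.OrderBy(a => a.Name, StringComparer.Ordinal))
        {
            sb.Append(' ').Append(attr.Name);
            string value = attr.Value ?? string.Empty;
            if (value.Length == 0) continue; // empty value: the bare name is equivalent
            sb.Append('=');
            if (UnquotedValue.IsMatch(value))
                sb.Append(value);
            else
                sb.Append('"').Append(value.Replace("\"", "&quot;")).Append('"');
        }
        sb.Append('>');

        if (VoidElements.Contains(node.Name)) return; // no children, no end tag

        foreach (var child in node.ChildNodes)
            WriteNode(child, sb);
        sb.Append("</").Append(node.Name).Append('>');
    }
}
```

Calling `HtmlMinifier.Minify(html)` on the string from the question would give a starting point to compare against the desired output. Rules such as dropping redundant `type` attributes on `<script>` or omitting optional tags would need to be added on top, and the whitespace handling should be checked against the pages you actually need to process.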
