C# version of HTML Tidy?

Posted on 2024-09-28 20:23:35

I am just looking for a really easy way to clean up some HTML (possibly with embedded JavaScript code). I tried two different HTML Tidy .NET ports and both are throwing exceptions...

Sorry, by "clean" I mean "indent". The HTML is not malformed, at all. It's XHTML strict.


I finally got something working with SGML, but this is seriously the most ridiculous chunk of code ever to indent some HTML.

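using System.IO;
using System.Xml;
using Sgml; // SgmlReader package

private static string FormatHtml(string input)
{
    using (var sgml = new SgmlReader { DocType = "HTML", InputStream = new StringReader(input) })
    using (var sw = new StringWriter())
    using (var xw = new XmlTextWriter(sw) { Indentation = 2, Formatting = Formatting.Indented })
    {
        sgml.Read();
        while (!sgml.EOF)
            xw.WriteNode(sgml, true);   // copy the current node and its subtree, advancing the reader
        xw.Flush();                     // push buffered output into sw before reading it
        return sw.ToString();           // sw is only in scope inside the using statement
    }
}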
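Because the input described here is already well-formed XHTML strict, a parser that tolerates broken HTML may not be needed at all. The following minimal sketch replaces SgmlReader with LINQ to XML (System.Xml.Linq) from the base class library. It assumes the markup parses as plain XML: there is no DOCTYPE, there are no HTML-only entities such as &nbsp;, and any inline JavaScript either avoids bare < and & characters or is wrapped in CDATA. XhtmlFormatter and IndentXhtml are hypothetical names.

```csharp
using System;
using System.Xml.Linq;

static class XhtmlFormatter
{
    // Hypothetical helper: re-indents markup that is already well-formed XML.
    // XDocument.Parse drops whitespace-only text nodes by default, and
    // ToString() serializes the tree again with two-space indentation and
    // no XML declaration.
    public static string IndentXhtml(string input)
    {
        return XDocument.Parse(input).ToString();
    }

    static void Main()
    {
        string xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>Test</title></head>"
                     + "<body><p>Hello</p></body></html>";
        Console.WriteLine(IndentXhtml(xhtml));
    }
}
```

Unlike the SgmlReader version, this throws an XmlException on input that is not well-formed XML, so it only suits markup that really is XHTML.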

伴梦长久 2024-10-05 20:23:35

AngleSharp (100% C#):

    using System.IO;
    using AngleSharp.Html;        // PrettyMarkupFormatter
    using AngleSharp.Html.Parser; // HtmlParser

    var parser = new HtmlParser();

    var document = parser.ParseDocument("<html><head></head><body><i></i></body></html>");

    var sw = new StringWriter();
    document.ToHtml(sw, new PrettyMarkupFormatter());

    var prettifiedHtml = sw.ToString();

Edit by Sebastian:

    // old parse method
    var document = parser.Parse("<html><head></head><body><i></i></body></html>");

    // new parse method (for AngleSharp 0.16.1):
    var document = await parser.ParseDocumentAsync(Code);
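Here is a sketch of wrapping the answer's code in a reusable method. It assumes a recent AngleSharp release in which HtmlParser lives in AngleSharp.Html.Parser, ParseDocument replaces Parse, and PrettyMarkupFormatter exposes settable Indentation and NewLine string properties. If your version differs, adjust accordingly. HtmlPrettifier and Prettify are hypothetical names.

```csharp
using System;
using System.IO;
using AngleSharp.Html;        // PrettyMarkupFormatter
using AngleSharp.Html.Parser; // HtmlParser

static class HtmlPrettifier
{
    // Hypothetical helper: parse with AngleSharp, then serialize through
    // PrettyMarkupFormatter with explicit two-space indentation and "\n" breaks.
    public static string Prettify(string html)
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        var formatter = new PrettyMarkupFormatter
        {
            Indentation = "  ",
            NewLine = "\n"
        };

        using (var sw = new StringWriter())
        {
            document.ToHtml(sw, formatter);
            return sw.ToString();
        }
    }

    static void Main()
    {
        Console.WriteLine(Prettify("<div><p>line 1<br>line 2</p></div>"));
    }
}
```

Because AngleSharp follows the HTML5 parsing algorithm, a fragment like the one in Main is serialized inside a generated html/head/body skeleton.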
 
满身野味 2024-10-05 20:23:35

The latest C# wrapper for HTML Tidy is by Mark Beaton, and it seems considerably more up to date than the ports you linked (from 2003). Also worth noting: Mark provides the executables to reference as well, so you don't have to pull them from the official site. That should take care of neatly organising and validating your HTML.

层林尽染 2024-10-05 20:23:35

I've used SGML Reader to convert HTML to XHTML in the past. Might be worth looking into...

I never had any problems with it when I was using it.
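Here is a minimal sketch of that HTML-to-XHTML conversion, assuming the SgmlReader NuGet package (namespace Sgml). SgmlReader consults the HTML DTD while reading, so elements left unclosed in the source come out closed, and XmlDocument can load the result as ordinary XML. HtmlToXhtml and ToXhtml are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using Sgml; // SgmlReader NuGet package

static class HtmlToXhtml
{
    // Hypothetical helper: reads loose HTML through SgmlReader (which closes
    // elements according to the HTML DTD) and re-emits it as well-formed XML.
    public static string ToXhtml(string html)
    {
        using (var sgml = new SgmlReader
        {
            DocType = "HTML",
            CaseFolding = CaseFolding.ToLower, // normalise element and attribute names
            InputStream = new StringReader(html)
        })
        {
            var doc = new XmlDocument();
            doc.Load(sgml);
            return doc.OuterXml;
        }
    }

    static void Main()
    {
        // A single root element keeps XmlDocument happy.
        Console.WriteLine(ToXhtml("<div><p>one<p>two<br></div>"));
    }
}
```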

眸中客 2024-10-05 20:23:35

UPDATE:

Check out HtmlTextWriter or XhtmlTextWriter (usage: Formatting Html Output with HtmlTextWriter). Maybe building the HTML through HtmlTextWriter in the first place would work better?
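Here is a small sketch of building markup through HtmlTextWriter so that it is written out already indented. It assumes .NET Framework, where HtmlTextWriter lives in System.Web.UI (System.Web.dll); that assembly is not part of .NET Core or .NET 5+. HtmlBuilderDemo and BuildFragment are hypothetical names.

```csharp
using System;
using System.IO;
using System.Web.UI; // requires a reference to System.Web.dll (.NET Framework)

static class HtmlBuilderDemo
{
    // Hypothetical helper: writes <div class="note"><p>text</p></div>.
    // HtmlTextWriter tracks nesting and adds line breaks plus tab indentation
    // (its DefaultTabString) around tags it does not treat as inline.
    public static string BuildFragment(string text)
    {
        using (var sw = new StringWriter())
        using (var writer = new HtmlTextWriter(sw))
        {
            writer.AddAttribute(HtmlTextWriterAttribute.Class, "note"); // applies to the next begin tag
            writer.RenderBeginTag(HtmlTextWriterTag.Div);
            writer.RenderBeginTag(HtmlTextWriterTag.P);
            writer.WriteEncodedText(text); // HTML-encodes characters such as < and &
            writer.RenderEndTag();         // </p>
            writer.RenderEndTag();         // </div>
            writer.Flush();
            return sw.ToString();
        }
    }

    static void Main()
    {
        Console.WriteLine(BuildFragment("a < b"));
    }
}
```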

Also check: LINQ & Lambda, Part 3: Html Agility Pack to LINQ to XML Converter

http://www.manoli.net/csharpformat/ (here is the source code, in case you missed it).


Maybe you want to do it yourself? This project can be helpful: Html Agility Pack

What exactly is the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you actually don't HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).

Html Agility Pack now supports LINQ to Objects (via a LINQ to XML-like interface). Check out the new beta to play with this feature.

Sample applications:

  • Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.

  • Web scanners. You can easily get to img/src or a/hrefs with a bunch of XPath queries.

  • Web scrapers. You can easily scrape any existing web page into an RSS feed, for example, with just an XSLT file serving as the binding. An example of this is provided.
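Here is a minimal sketch of the "web scanners" case, assuming the HtmlAgilityPack NuGet package. A single XPath query finds every anchor that has an href attribute. LinkScanner and GetLinks are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

static class LinkScanner
{
    // Hypothetical helper: collects the href of every <a> element that has one.
    public static List<string> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (var a in anchors)
            links.Add(a.GetAttributeValue("href", string.Empty));
        return links;
    }

    static void Main()
    {
        var html = "<p><a href=\"/one\">1</a> <a>no href</a> <a href=\"/two\">2</a></p>";
        foreach (var link in GetLinks(html))
            Console.WriteLine(link);
    }
}
```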


You can also try this implementation: A managed wrapper for the HTML Tidy library

離人涙 2024-10-05 20:23:35

You can use HtmlAgilityPack (add the package from NuGet).

Code sample:

string html = "<div><p>line 1<br>line 2</p><span></div>";
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);   // parse the (possibly malformed) markup
var fixedHtml = htmlDoc.DocumentNode.OuterHtml;

Output:

<div><p>line 1<br />line 2</p><span></span></div>
攒眉千度 2024-10-05 20:23:35

Beautifier (js-beautify) handles HTML as well; I used html-beautify.
For example:

const beautified = html_beautify("<div><p></p></div>");
console.log(beautified)
<script src="https://cdnjs.cloudflare.com/ajax/libs/js-beautify/1.14.0/beautify-html.min.js"></script>
