HtmlAgilityPack: how to create indented HTML?

Posted on 2024-11-05 22:36:11

I am generating HTML with HtmlAgilityPack and it works perfectly, but the HTML text is not indented. I can get indented XML, but I need HTML. Is there a way?

using System.IO;
using System.Xml;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();

// gen html
HtmlNode table = doc.CreateElement("table");
table.Attributes.Add("class", "tableClass");
HtmlNode tr = doc.CreateElement("tr");
table.AppendChild(tr);
HtmlNode td = doc.CreateElement("td");
td.InnerHtml = "—";
tr.AppendChild(td);

// write HTML text: works, but no indentation :(
using (StreamWriter sw = new StreamWriter("table.html"))
{
    table.WriteTo(sw);
}

// write XML: nicely indented, but it's XML!
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Indent = true;
settings.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlWriter xw = XmlWriter.Create("table.xml", settings))
{
    table.WriteTo(xw);
}

4 answers

雪落纷纷 2024-11-12 22:36:11

AngleSharp: fast, reliable, pure C#, .NET Core compatible.

You can parse the HTML with AngleSharp, which provides a way to auto-indent it:

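using System.IO;
using AngleSharp;             // ToHtml(TextWriter, IMarkupFormatter) extension
using AngleSharp.Html;        // PrettyMarkupFormatter
using AngleSharp.Html.Parser; // HtmlParser (AngleSharp 0.10 or later)

// text: the unindented HTML, e.g. the string written by HtmlAgilityPack.
// ParseDocument builds a full HTML document, so a bare <table> fragment comes back
// wrapped in <html><head></head><body>...</body></html>, with an implied <tbody>.
var parser = new HtmlParser();
var document = parser.ParseDocument(text);
string indentedText;
using (var writer = new StringWriter())
{
    document.ToHtml(writer, new PrettyMarkupFormatter
    {
        Indentation = "\t",
        NewLine = "\n"
    });
    indentedText = writer.ToString();
}
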
一杆小烟枪 2024-11-12 22:36:11

No, and it's a "by design" choice. There is a big difference between XML (or XHTML, which is XML, not HTML), where whitespace usually has no specific meaning, and HTML.

Indenting is not such a minor improvement, because changing whitespace can change the way some browsers render a given HTML chunk, especially malformed HTML (which the library generally handles well). The Html Agility Pack was designed to preserve the way the HTML is rendered, not to minimize the way the markup is written.

I'm not saying it's infeasible or plain impossible. Obviously you can convert to XML and voilà (you could write an extension method to make this easier), but in the general case the rendered output may differ.
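
The "convert to XML" route can be sketched with nothing but System.Xml.Linq, assuming the markup is already well-formed XML (XHTML-style). `XmlIndent.TryIndentAsXml` is a hypothetical helper name, not part of HtmlAgilityPack. The sketch also makes the caveat above concrete: HTML-only constructs such as the `&mdash;` entity (undeclared in XML) or an unclosed `<br>` are rejected by the XML parser, so the helper returns null for them instead of indenting.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlIndent
{
    // Hypothetical helper: re-indents markup only if it parses as well-formed XML.
    // Returns null for markup XML rejects, e.g. "&mdash;" or an unclosed "<br>".
    public static string TryIndentAsXml(string markup)
    {
        try
        {
            // XElement.Parse drops insignificant whitespace by default;
            // ToString() re-serializes with indentation (two spaces per level).
            return XElement.Parse(markup).ToString();
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

For the table from the question (written as `<table class="tableClass"><tr><td>x</td></tr></table>`), the `<tr>` ends up indented by two spaces and the `<td>x</td>` line by four, with short text content kept on the same line as its tags.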

·深蓝 2024-11-12 22:36:11

As far as I know, HtmlAgilityPack cannot do this. But you could look at the HTML Tidy packages suggested in similar questions:

诗笺 2024-11-12 22:36:11

I had the same experience: even though HtmlAgilityPack is great for reading and modifying HTML (or, in my case, ASP) files, you cannot create readable output with it.

However, I ended up writing a few lines of code that work for me.

With an HtmlDocument named "m_htmlDocument", I create my HTML file as follows:

// "using" makes sure the writer is flushed and closed when done
using (var file = new System.IO.StreamWriter(_sFullPath))
{
    if (m_htmlDocument.DocumentNode != null)
        foreach (var node in m_htmlDocument.DocumentNode.ChildNodes)
            WriteNode(file, node, 0);
}

and

// needs: using System.IO; using HtmlAgilityPack;
void WriteNode(StreamWriter _file, HtmlNode _node, int _indentLevel)
{
    // check parameters
    if (_file == null) return;
    if (_node == null) return;

    // init
    string INDENT = " ";
    string NEW_LINE = System.Environment.NewLine;

    // skip whitespace-only text nodes: we write our own line breaks and indentation
    // (note: between inline elements such whitespace can affect rendering)
    if (_node.NodeType == HtmlNodeType.Text && string.IsNullOrWhiteSpace(_node.InnerText))
        return;

    // case: no children (text, comments, empty or void elements) -> write as-is
    if (!_node.HasChildNodes)
    {
        for (int i = 0; i < _indentLevel; i++)
            _file.Write(INDENT);
        _file.Write(_node.OuterHtml);
        _file.Write(NEW_LINE);
    }

    // case: node has children
    else
    {
        // indent
        for (int i = 0; i < _indentLevel; i++)
            _file.Write(INDENT);

        // open tag with attributes (no stray space before '>')
        _file.Write("<" + _node.Name);
        foreach (var attr in _node.Attributes)
            _file.Write(string.Format(" {0}=\"{1}\"", attr.Name, (attr.Value ?? "").Replace("\"", "&quot;")));
        _file.Write(">" + NEW_LINE);

        // children, one level deeper
        foreach (var chldNode in _node.ChildNodes)
            WriteNode(_file, chldNode, _indentLevel + 1);

        // close tag
        for (int i = 0; i < _indentLevel; i++)
            _file.Write(INDENT);
        _file.Write(string.Format("</{0}>{1}", _node.Name, NEW_LINE));
    }
}
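
The same recursive idea can be sketched without HtmlAgilityPack so it can be run and checked in isolation. `Node` and `IndentWriter` below are hypothetical stand-ins for `HtmlNode` and `WriteNode`, not HtmlAgilityPack types. Compared with the code above, the sketch also HTML-encodes text and attribute values, writes void elements such as `<br>` without a closing tag, and keeps an element whose only child is text on a single line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical minimal node model standing in for HtmlNode:
// an element (Name, Attributes, Children) or a text node (Name == "").
public sealed class Node
{
    public string Name = "";
    public string Text = "";
    public List<(string Name, string Value)> Attributes = new List<(string Name, string Value)>();
    public List<Node> Children = new List<Node>();

    public bool IsText => Name.Length == 0;

    public static Node Element(string name, params Node[] children) =>
        new Node { Name = name, Children = children.ToList() };

    public static Node TextNode(string text) => new Node { Text = text };
}

public static class IndentWriter
{
    // HTML elements that never take a closing tag.
    static readonly HashSet<string> VoidElements = new HashSet<string>
        { "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr" };

    public static string Write(Node root, string indent = "  ")
    {
        var sb = new StringBuilder();
        WriteNode(sb, root, 0, indent);
        return sb.ToString();
    }

    static void WriteNode(StringBuilder sb, Node node, int level, string indent)
    {
        string pad = string.Concat(Enumerable.Repeat(indent, level));

        if (node.IsText)
        {
            // Whitespace-only text is dropped because we emit our own line breaks.
            if (!string.IsNullOrWhiteSpace(node.Text))
                sb.Append(pad).Append(WebUtility.HtmlEncode(node.Text.Trim())).Append('\n');
            return;
        }

        string attrs = string.Concat(node.Attributes.Select(a => $" {a.Name}=\"{WebUtility.HtmlEncode(a.Value)}\""));
        string open = "<" + node.Name + attrs + ">";

        if (VoidElements.Contains(node.Name))
        {
            sb.Append(pad).Append(open).Append('\n'); // no closing tag
        }
        else if (node.Children.Count == 0)
        {
            sb.Append(pad).Append(open).Append("</").Append(node.Name).Append(">\n");
        }
        else if (node.Children.Count == 1 && node.Children[0].IsText)
        {
            // Short text-only content stays on the same line as its tags.
            sb.Append(pad).Append(open).Append(WebUtility.HtmlEncode(node.Children[0].Text))
              .Append("</").Append(node.Name).Append(">\n");
        }
        else
        {
            sb.Append(pad).Append(open).Append('\n');
            foreach (var child in node.Children)
                WriteNode(sb, child, level + 1, indent);
            sb.Append(pad).Append("</").Append(node.Name).Append(">\n");
        }
    }
}
```

The design trade-off is the one raised in the "by design" answer: dropping whitespace-only text nodes is what makes the output tidy, but between inline elements that whitespace can be visible, so the re-indented HTML may not render identically.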