Preventing XmlDocument from validating namespaces in C#

Posted 2024-08-30 13:25:35

I'm trying to find a way to indent an HTML file. So far I've been loading it into an XmlDocument and writing it back out with an XmlTextWriter.

However, I can't format HTML documents correctly this way, because the parser checks the doctype and tries to download the DTD.

Is there a "dumb" indenting mechanism that doesn't validate or check the document and just does best-effort indentation? The files are 4-10 MB in size and autogenerated, and we have to handle them internally. It's fine if the user has to wait; I just want to avoid forking a new process, etc.

Here's my code for reference:

        using (MemoryStream ms = new MemoryStream())
        using (XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.Unicode))
        {
            XmlDocument doc = new XmlDocument();
            // Load the unformatted XML text string into an instance
            // of the XML Document Object Model (DOM)
            doc.LoadXml(content);

            // Set the formatting property of the XML Text Writer to indented
            // the text writer is where the indenting will be performed
            xtw.Formatting = Formatting.Indented;

            // write dom xml to the xmltextwriter
            doc.WriteContentTo(xtw);

            // Flush the contents of the text writer
            // to the memory stream, which is simply a memory file
            xtw.Flush();

            // set to start of the memory stream (file)
            ms.Seek(0, SeekOrigin.Begin);

            // create a reader to read the contents of
            // the memory stream (file)
            using (StreamReader sr = new StreamReader(ms))
                return sr.ReadToEnd();
        }

Essentially, right now I use a MemoryStream, an XmlTextWriter and an XmlDocument; once the content is indented, I read it back from the MemoryStream and return it as a string. It fails for XHTML documents and some HTML 4 documents because it tries to fetch the DTDs. I tried setting XmlResolver to null, but to no avail :(

Comments (1)

画▽骨i 2024-09-06 13:25:35

Without access to the specific X[H]TML causing the problems, it's hard to know if this will work, but have you tried using XDocument instead?

XDocument xdoc = XDocument.Parse(xml);
string formatted = xdoc.ToString();
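
If `XDocument.Parse` still trips over the doctype, one variation that might be worth trying is to load the `XDocument` through an `XmlReader` whose settings skip the DOCTYPE and disable external resolution. This is only a sketch under assumptions: the `HtmlIndenter` class and `Indent` method are hypothetical names that do not appear in the question, and it assumes the input is well-formed XML (XHTML, not tag-soup HTML).

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class HtmlIndenter
{
    // Hypothetical helper; the name and signature are illustrative only.
    public static string Indent(string xml)
    {
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE declaration instead of parsing or fetching the DTD.
            DtdProcessing = DtdProcessing.Ignore,
            // Do not resolve any external resources.
            XmlResolver = null
        };

        using (var stringReader = new StringReader(xml))
        using (var reader = XmlReader.Create(stringReader, settings))
        {
            XDocument xdoc = XDocument.Load(reader);
            // XDocument.ToString() indents the output by default.
            return xdoc.ToString();
        }
    }
}
```

Two caveats with this approach: because the DOCTYPE is ignored, it will not appear in the indented output, and named entities that are only declared in the DTD (such as `&nbsp;`) would likely cause a parse error, so they may need to be replaced with numeric references beforehand.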