Avoid XmlDocument validating namespaces in C#
I'm trying to find a way to indent an HTML file. I've been using XmlDocument together with an XmlTextWriter.
However, I can't get it to format HTML documents correctly, because it checks the doctype and tries to download it.
Is there a "dumb" indenting mechanism that doesn't validate or check the document and just does a best-effort indentation? The files are 4-10 MB in size and autogenerated, and we have to handle them internally. It's fine if the user has to wait; I just want to avoid forking a new process, etc.
Here's my code for reference:
using (MemoryStream ms = new MemoryStream())
using (XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.Unicode))
{
    XmlDocument doc = new XmlDocument();
    // Load the unformatted XML text string into an instance
    // of the XML Document Object Model (DOM)
    doc.LoadXml(content);
    // Set the formatting property of the XML text writer to indented;
    // the text writer is where the indenting will be performed
    xtw.Formatting = Formatting.Indented;
    // Write the DOM XML to the XmlTextWriter
    doc.WriteContentTo(xtw);
    // Flush the contents of the text writer
    // to the memory stream, which is simply an in-memory file
    xtw.Flush();
    // Rewind to the start of the memory stream
    ms.Seek(0, SeekOrigin.Begin);
    // Create a reader to read the contents of
    // the memory stream
    using (StreamReader sr = new StreamReader(ms))
        return sr.ReadToEnd();
}
Essentially, right now I use a MemoryStream, an XmlTextWriter and an XmlDocument; once the content is indented, I read it back from the MemoryStream and return it as a string. Failures happen for XHTML documents and some HTML 4 documents because it's trying to grab the DTDs. I tried setting XmlResolver to null, but to no avail :(
Comments (1)
Without access to the specific X[H]TML causing the problems, it's hard to know if this will work, but have you tried using XDocument instead?
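
For what it's worth, here is a minimal sketch of what the XDocument route might look like. It is not from the original poster or the commenter: the `MarkupIndenter` class and `Indent` method names are made up for illustration, and it assumes .NET 4.0 or later (for `DtdProcessing`) and input that is well-formed XML. The idea is to hand XDocument an XmlReader that ignores the DOCTYPE and has no resolver, so nothing should be fetched over the network.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class MarkupIndenter
{
    public static string Indent(string content)
    {
        var readerSettings = new XmlReaderSettings
        {
            // Skip the DOCTYPE declaration instead of processing it.
            DtdProcessing = DtdProcessing.Ignore,
            // No resolver, so external resources are never requested.
            XmlResolver = null
        };

        using (var stringReader = new StringReader(content))
        using (var xmlReader = XmlReader.Create(stringReader, readerSettings))
        {
            // Insignificant whitespace is dropped on load by default,
            // so the output below is re-indented from scratch.
            XDocument doc = XDocument.Load(xmlReader);

            // ToString() serializes with indentation by default.
            return doc.ToString();
        }
    }
}
```

Two caveats with this approach: because the DOCTYPE is ignored, it does not appear in the output, and named entities that are only defined in the DTD (such as `&nbsp;`) may fail to resolve. HTML 4 that is not well-formed XML will still throw when loaded.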