.NET:阻止 XmlDocument.LoadXml 检索 DTD

发布于 2024-10-08 00:04:51 字数 1892 浏览 8 评论 0原文

我有以下代码(C#),它花费了太长的时间并且抛出异常:

new XmlDocument().
LoadXml("<?xml version='1.0' ?><!DOCTYPE note SYSTEM 'http://someserver/dtd'><note></note>");

我明白它为什么这样做。我的问题是如何让它停止?我不关心 DTD 验证。我想我可以用正则表达式替换它,但我正在寻找更优雅的解决方案。

背景:
实际的 XML 是从一个不属于我的网站接收的。当站点正在进行维护时,它会返回带有 DOCTYPE 的 XML,该 DOCTYPE 指向维护期间不可用的 DTD。因此,我的服务变得不必要的缓慢,因为它尝试为我需要解析的每个 XML 获取 DTD。

这是异常堆栈:

Unhandled Exception: System.Net.WebException: The remote name could not be resolved: 'someserver'
at System.Net.HttpWebRequest.GetResponse()
at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri, ICredentials credentials)
at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials)
at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
at System.Xml.XmlTextReaderImpl.OpenStream(Uri uri)
at System.Xml.XmlTextReaderImpl.DtdParserProxy_PushExternalSubset(String systemId, String publicId)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.System.Xml.IDtdParserAdapter.PushExternalSubset(String systemId, String publicId)
at System.Xml.DtdParser.ParseExternalSubset()
at System.Xml.DtdParser.ParseInDocumentDtd(Boolean saveInternalSubset)
at System.Xml.DtdParser.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ConsoleApplication36.Program.Main(String[] args) in c:\Projects\temp\ConsoleApplication36\Program.cs:line 11

I have following code (C#), it takes too long and it throws exception:

new XmlDocument().
LoadXml("<?xml version='1.0' ?><!DOCTYPE note SYSTEM 'http://someserver/dtd'><note></note>");

I understand why it does that. My question is how do I make it stop? I don't care about DTD validation. I suppose I could just regex-replace it, but I am looking for more elegant solution.

Background:
The actual XML is received from a web site I do not own. When site is undergoing maintenance it returns XML with DOCTYPE that points to the DTD that's not available during maintenance. So my service gets unnecessary slow because it tries to get DTD for each XML I need to parse.

Here is exception stack:

Unhandled Exception: System.Net.WebException: The remote name could not be resolved: 'someserver'
at System.Net.HttpWebRequest.GetResponse()
at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri, ICredentials credentials)
at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials)
at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
at System.Xml.XmlTextReaderImpl.OpenStream(Uri uri)
at System.Xml.XmlTextReaderImpl.DtdParserProxy_PushExternalSubset(String systemId, String publicId)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.System.Xml.IDtdParserAdapter.PushExternalSubset(String systemId, String publicId)
at System.Xml.DtdParser.ParseExternalSubset()
at System.Xml.DtdParser.ParseInDocumentDtd(Boolean saveInternalSubset)
at System.Xml.DtdParser.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ConsoleApplication36.Program.Main(String[] args) in c:\Projects\temp\ConsoleApplication36\Program.cs:line 11

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

ㄖ落Θ余辉 2024-10-15 00:04:51

那么,在 .NET 4.0 中,XmlTextReader 有一个名为 DtdProcessing 的属性。当设置为 DtdProcessing.Ignore 时,它​​应该禁用 DTD 处理。

Well, in .NET 4.0 XmlTextReader has a property called DtdProcessing. When set to DtdProcessing.Ignore it should disable DTD processing.

花开半夏魅人心 2024-10-15 00:04:51

在 .net 4.5.1 中,我没有运气将 doc.XmlResolver 设置为 null。

对我来说最简单的解决方法是在调用 LoadXml() 之前使用字符串替换将“xmlns=”更改为“ignore=”,例如

var responseText = await response.Content.ReadAsStringAsync();
responseText = responseText.Replace("xmlns=", "ignore=");
try
{
    var doc = new XmlDocument();
    doc.LoadXml(responseText);
    ...
}

In .net 4.5.1 I had no luck setting doc.XmlResolver to null.

The easiest fix for me was to use a string replacement to change "xmlns=" to "ignore=" before calling LoadXml(), e.g.

var responseText = await response.Content.ReadAsStringAsync();
responseText = responseText.Replace("xmlns=", "ignore=");
try
{
    var doc = new XmlDocument();
    doc.LoadXml(responseText);
    ...
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文