DTD download error while parsing an XHTML document in XOM
I am trying to parse an HTML document with the doctype declared to use
the transitional dtd as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
When I do Builder.build on the document, I get the following exception:
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1305)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at nu.xom.Builder.build(Builder.java:1127)
at nu.xom.Builder.build(Builder.java:1019)
If I remove the doctype declaration, it parses just fine. I can
successfully download the DTD from my browser, which tells me that the
URL is valid. I don't want to remove the doctype declaration. Is
there a way to tell the builder not to download the DTD, or to provide it
with an alternate DTD?
Comments (2)
This solves the problem:
Taking a quick look at the javadoc for Builder, I guess you could provide an EntityResolver via the constructor that takes an XMLReader. I would avoid letting the parser download files from the internet where possible.
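
A minimal sketch of the EntityResolver idea, using only the JDK's SAX classes so that it runs without XOM on the classpath. The class name `OfflineDtdReader`, the helper `newOfflineReader`, and the `requested` list are made up for illustration. Returning an empty DTD is an assumption about what is acceptable for the document; a local copy of the DTD could be returned instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of XOM.
class OfflineDtdReader {

    // System IDs the parser asked for, recorded only to show the resolver was consulted.
    static final List<String> requested = new ArrayList<>();

    // Creates a non-validating XMLReader whose EntityResolver never touches the network.
    static XMLReader newOfflineReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Assumption: an empty stream stands in for the DTD, so nothing is downloaded.
            // Swap in a stream over a local copy of the DTD to keep its entity definitions.
            return new InputSource(new StringReader(""));
        });
        return reader;
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>t</title></head><body/></html>";

        XMLReader reader = newOfflineReader();
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xhtml)));

        System.out.println("Resolved locally: " + requested);
    }
}
```

With XOM, the same reader would be handed to the constructor mentioned above, i.e. `new Builder(OfflineDtdReader.newOfflineReader())`; that wiring is not exercised in this sketch. One caveat with the empty DTD: named entities that the XHTML DTD defines, such as `&nbsp;`, are no longer declared, so documents that use them need a local copy of the DTD rather than an empty one.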