How to parse XHTML with a DOM parser while ignoring the DOCTYPE declaration
I'm having trouble parsing XHTML that has a DOCTYPE declaration with a DOM parser.
Error:
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20
Declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE declaration?
Answers (4)
A solution that works for me is to give the DocumentBuilder a fake EntityResolver that returns an empty stream. There's a good explanation here (look at the last message from kdgregory):
http://forums.sun.com/thread.jspa?threadID=5362097
Here's kdgregory's solution:
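A minimal sketch of the empty-resolver idea, not necessarily kdgregory's exact code, assuming the JDK's built-in DOM parser. The EntityResolver answers every external entity request, including the DTD named in the DOCTYPE, with an empty stream, so no connection to www.w3.org is opened. The class name EmptyDtdResolverExample is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class: parses XHTML without fetching its external DTD.
public class EmptyDtdResolverExample {

    public static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Every external entity (the DTD included) resolves to an empty document,
        // so the parser never downloads anything.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello</p></body></html>";
        Document doc = parse(xhtml);
        System.out.println(doc.getDocumentElement().getLocalName()); // html
    }
}
```

One caveat: named entities such as &nbsp; are declared in the XHTML DTD, so once the DTD is replaced by an empty stream the parser has no replacement text for them.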
The parser is required to download the DTD, but you may get around it by setting the standalone attribute on the <?xml ... ?> line.
Note, however, that this particular error is most likely triggered by a confusion between XML Schema definitions and DTD URLs. See http://www.w3schools.com/xhtml/xhtml_dtd.asp for details. The right one is:
The easiest thing to do is to call setValidating(false) on your DocumentBuilderFactory. If you want to do validation, download the DTD and use a local copy. As Rachel commented above, this is discussed by the World Wide Web Consortium (W3C).
In short, because the default DocumentBuilderFactory downloads the DTD every time it validates, the W3C's servers were hit every time a typical programmer tried to parse an XHTML file in Java. They can't afford that much traffic, so they respond with an error.
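A sketch of the "local copy" route, assuming the DTD has already been downloaded, along with any entity files it references, kept in the same directory. The hypothetical localDtds map ties each system ID used in a DOCTYPE to its downloaded file; the EntityResolver serves that file instead of the URL and refuses any other remote ID, so the parser never falls back to the network. Validation is switched on because that is the case described above, and the ErrorHandler turns validation errors into exceptions (without one, they are only printed). The class name LocalDtdExample is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical example class: validates against locally stored DTDs only.
public class LocalDtdExample {

    /** localDtds (hypothetical): DTD system ID from the DOCTYPE -> downloaded copy. */
    public static Document parseValidating(String xml, Map<String, Path> localDtds) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            Path local = localDtds.get(systemId);
            if (local != null) {
                InputSource source = new InputSource(Files.newInputStream(local));
                // Base URI so relative references inside the DTD (e.g. .ent files) resolve locally.
                source.setSystemId(local.toUri().toString());
                return source;
            }
            if (systemId != null && systemId.startsWith("file:")) {
                return null; // already local: let the parser open it normally
            }
            // Fail fast instead of silently downloading an unknown DTD.
            throw new IOException("No local copy for " + systemId);
        });
        // Report validation errors as exceptions rather than just printing them.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Keying the map by system ID keeps the lookup explicit; a resolver keyed by public ID (for example "-//W3C//DTD XHTML 1.0 Transitional//EN") would work the same way.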
Instead of the fake resolver, the following code snippet instructs the parser to really ignore the external DTD from the DOCTYPE declaration:
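A sketch of such a snippet, assuming the JDK's built-in parser, which is derived from Apache Xerces. Turning validation off is not enough by itself: a non-validating Xerces parser still loads the external DTD by default, to pick up entity declarations and attribute defaults. The Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd switches that off; setFeature throws ParserConfigurationException if the parser implementation in use does not support it. The class name IgnoreExternalDtdExample is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class: tells the parser not to load the external DTD at all.
public class IgnoreExternalDtdExample {

    public static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature: skip the external DTD when not validating.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello</p></body></html>";
        System.out.println(parse(xhtml).getDocumentElement().getLocalName()); // html
    }
}
```

Unlike the fake resolver, this leaves the parser's entity resolution untouched and only turns off loading of the external DTD subset.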