How to parse XHTML with a DOM parser while ignoring the DOCTYPE declaration

Posted 2024-08-29 06:30:54 · 546 characters · 9 views · 0 comments

I am having trouble parsing XHTML that contains a DOCTYPE declaration with a DOM parser.

Error: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20

Declaration: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE declaration?


Smile简单爱 2024-09-05 06:30:54

A solution that worked for me is to give the DocumentBuilder a fake EntityResolver that returns an empty stream. There is a good explanation here (see the last message from kdgregory):

http://forums.sun.com/thread.jspa?threadID=5362097

Here is kdgregory's solution:

// Needs org.xml.sax.EntityResolver, org.xml.sax.InputSource,
// org.xml.sax.SAXException, java.io.IOException and java.io.StringReader.
documentBuilder.setEntityResolver(new EntityResolver()
        {
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException
            {
                // Answer every external entity request (including the DTD)
                // with an empty stream, so nothing is fetched over the network.
                return new InputSource(new StringReader(""));
            }
        });
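For readers who want to try the empty-stream resolver end to end, here is a minimal self-contained sketch. The class name EmptyResolverDemo, the helper parse, and the sample XHTML string are made up for illustration; only the resolver technique comes from the answer above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EmptyResolverDemo {
    // Hypothetical helper: parses XML text and answers every external
    // entity request (the DTD included) with an empty stream.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Illustrative input that carries the DOCTYPE from the question.
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html><head><title>t</title></head><body><p>hello</p></body></html>";
        Document doc = parse(xhtml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

One thing to keep in mind: because the DTD content is replaced by nothing, named entities that the XHTML DTD would normally declare (such as &nbsp;) are not defined, so documents that rely on them may not come through as expected.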


陌若浮生 2024-09-05 06:30:54

The parser is required to download the DTD, but you may be able to get around that by setting the standalone attribute on the <?xml ... ?> line.

Note, however, that this particular error is most likely triggered by confusion between XML Schema definitions and DTD URLs. See http://www.w3schools.com/xhtml/xhtml_dtd.asp for details. The correct declaration is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


童话 2024-09-05 06:30:54

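The simplest thing to do is to set validating=false on your DocumentBuilderFactory. If you do want validation, download the DTD and use a local copy. As Rachel commented above, this issue has been discussed by the W3C.

In short, because the default DocumentBuilderFactory downloads the DTD every time it validates, the W3C servers were being hit every time a typical programmer tried to parse an XHTML file in Java. They cannot afford that much traffic, so they respond with an error.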
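The "use a local copy" suggestion can be sketched with an EntityResolver that maps a DTD's public ID to a file on disk. This is only an illustration under assumptions: the class name LocalDtdResolver, the parse helper, and the idea of keying the lookup by public ID are not from the answer, and the local file path is whatever you saved the DTD as.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: serves known DTDs from local files, keyed by public ID.
class LocalDtdResolver implements EntityResolver {
    private final Map<String, Path> localCopies;

    LocalDtdResolver(Map<String, Path> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        Path local = (publicId == null) ? null : localCopies.get(publicId);
        // Returning null lets the parser fall back to its default lookup.
        return (local == null) ? null : new InputSource(local.toUri().toString());
    }

    // Hypothetical helper: parses XML text using the local DTD copies.
    static Document parse(String xml, Map<String, Path> localCopies)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver(localCopies));
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

A caller would pass a map from "-//W3C//DTD XHTML 1.0 Transitional//EN" to the saved file. The real xhtml1-transitional.dtd also pulls in three entity files (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent), so those would likely need to be saved next to it or mapped in the resolver as well.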


夏有森光若流苏 2024-09-05 06:30:54

Instead of the fake resolver, the following code snippet instructs the parser to really ignore the external DTD referenced by the DOCTYPE declaration:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

(...)

DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
// Validation is off, so the DTD is not needed to check the document.
f.setValidating(false);
// Xerces-specific setting: do not load the external DTD at all.
f.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = f.newDocumentBuilder();
Document document = builder.parse( ... );
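Since the load-external-dtd setting is specific to Xerces-based parsers, a defensive variant might try it and fall back to the empty-stream resolver from the earlier answer when the parser rejects it. The sketch below assumes that combination is acceptable; the class name SkipExternalDtdDemo and the parse helper are hypothetical, and it uses setFeature rather than setAttribute because the setting is a boolean feature.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SkipExternalDtdDemo {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical helper: parses XML text without fetching the external DTD.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);

        boolean featureSet;
        try {
            // Works on Xerces-derived parsers; others may not know the feature.
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            featureSet = true;
        } catch (ParserConfigurationException e) {
            featureSet = false;
        }

        DocumentBuilder builder = factory.newDocumentBuilder();
        if (!featureSet) {
            // Fallback: answer every external entity request with an empty stream.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
        }
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Either path avoids the network request that produced the 503 error in the question.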
