How to parse XHTML with a DOM parser while ignoring the DOCTYPE declaration

Posted on 2024-08-29 06:30:54

I am having trouble parsing XHTML that contains a DOCTYPE declaration with a DOM parser.

Error:
java.io.IOException: Server returned HTTP response code: 503 for URL:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20

Declaration: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE declaration?

Comments (4)

Smile简单爱 2024-09-05 06:30:54

A solution that worked for me is to give the DocumentBuilder a fake EntityResolver that returns an empty stream. There is a good explanation here (see the last message from kdgregory):

http://forums.sun.com/thread.jspa?threadID=5362097

Here is kdgregory's solution:

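// Needs: java.io.IOException, java.io.StringReader,
//        org.xml.sax.EntityResolver, org.xml.sax.InputSource, org.xml.sax.SAXException
documentBuilder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Hand back an empty stream for every external entity,
        // so the DTD is never fetched over the network.
        return new InputSource(new StringReader(""));
    }
});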
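
To show the resolver in context, here is a minimal, self-contained sketch of the same idea. The class name `EmptyResolverDemo`, the `parse` helper, and the sample XHTML string are made up for illustration; only the `setEntityResolver` call comes from the answer above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EmptyResolverDemo {
    // Hypothetical sample input: a tiny XHTML document with the DOCTYPE from the question.
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Demo</title></head><body><p>Hello</p></body></html>";

    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // EntityResolver has a single method, so a lambda works here.
        // Every external entity (including the DTD) resolves to an empty stream.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

One trade-off to keep in mind: because the DTD is replaced with nothing, named entities such as `&nbsp;` are no longer declared, so documents that use them may fail to parse.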
陌若浮生 2024-09-05 06:30:54

The parser needs to download the DTD, but you may be able to get around it by setting the standalone attribute (standalone="yes") on the <?xml ... ?> line. Whether this actually prevents the download depends on the parser.

Note, however, that this particular error is most likely caused by confusing XML Schema definitions with DTD URLs. See http://www.w3schools.com/xhtml/xhtml_dtd.asp for details. The correct declaration is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
童话 2024-09-05 06:30:54

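The easiest thing to try is to call setValidating(false) on your DocumentBuilderFactory. Be aware that validation is already off by default, and a non-validating parser may still fetch the external DTD, so this alone may not stop the download. If you do want validation, download the DTD and use a local copy. As Rachel commented above, this is discussed by the World Wide Web Consortium (W3C).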

In short, because the default DocumentBuilderFactory downloads the DTD every time it parses a document that references one, the W3C was getting hit every time a typical programmer tried to parse an XHTML file in Java. They can't afford that much traffic, so they respond with an error.
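
The "use a local copy" suggestion can be sketched with an EntityResolver that maps the XHTML public ID to locally held DTD content. In this sketch the local copy is a one-line in-memory stub (`LOCAL_DTD`) so that the example runs without any files; in real use you would open your downloaded DTD file instead. The class name, constants, and sample document are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LocalDtdResolverDemo {
    static final String XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // Stand-in for a locally stored DTD (hypothetical, trimmed to one entity).
    // A real setup would read the downloaded xhtml1-transitional.dtd from disk.
    static final String LOCAL_DTD = "<!ENTITY nbsp \"&#160;\">";

    // Hypothetical sample input that relies on an entity declared in the DTD.
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"" + XHTML_PUBLIC_ID + "\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Demo</title></head><body><p>a&nbsp;b</p></body></html>";

    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (XHTML_PUBLIC_ID.equals(publicId)) {
                // Serve the local copy instead of hitting www.w3.org.
                return new InputSource(new StringReader(LOCAL_DTD));
            }
            // Returning null falls back to the parser's default resolution.
            return null;
        });
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        System.out.println(doc.getElementsByTagName("p").item(0).getTextContent());
    }
}
```

Note that the full XHTML DTD pulls in further entity files by their own public IDs, so a resolver backed by the real DTD would likely need to map those to local copies as well.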

夏有森光若流苏 2024-09-05 06:30:54

Instead of the fake resolver, the following code snippet instructs the parser to really ignore the external DTD from the DOCTYPE declaration:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

(...)

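DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
// Validation is off by default; set here explicitly for clarity.
f.setValidating(false);
// Xerces-specific feature: do not load the external DTD named in the DOCTYPE.
f.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = f.newDocumentBuilder();
Document document = builder.parse( ... );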
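
For completeness, here is a runnable sketch of the same approach. It uses `setFeature` rather than `setAttribute`, which is the standard JAXP call for boolean parser features. It assumes a Xerces-based parser, such as the one bundled with the JDK; other implementations may reject the feature with a ParserConfigurationException. The class name, helper, and sample document are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnoreExternalDtdDemo {
    // Xerces feature name taken from the answer above.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical sample input with the DOCTYPE from the question.
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Demo</title></head><body><p>Hello</p></body></html>";

    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Ask the parser not to load the external DTD at all.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```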