Ignoring the DTD when parsing XML

Posted on 2024-12-14 21:32:20

How can I ignore the DTD declaration when parsing a file with the XOM XML library? My file starts with the following lines:

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here 

When I try to build() my document, I get a FileNotFoundException for the DTD file. I know I don't have this file and I don't care about it, so how can I make XOM skip it?

Here is a code snippet:

public class BlastXMLParser {
    private Document _document;

    public BlastXMLParser(String filePath) {
        Builder b = new Builder(false);
        // not a good idea to have exception-throwing code in constructor
        try {
            _document = b.build(filePath);
        } catch (ParsingException ex) {
            Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE, "err", ex);
        } catch (IOException ex) {
            // ignored, so _document stays null if the build fails
        }
    }

    private Elements getBlastReads() {
        Element root = _document.getRootElement();
        Elements rootChildren = root.getChildElements();

        for (int i = 0; i < rootChildren.size(); i++) {
            Element child = rootChildren.get(i);
            if (child.getLocalName().equals("BlastOutput_iterations")) {
                return child.getChildElements();
            }
        }

        return null;
    }
}

I get a NullPointerException at this line:

Element root = _document.getRootElement();

With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.

Comments (3)

苦行僧 2024-12-21 21:32:21

The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects them to an embedded copy. If you

  1. don't have access to the DTD,
  2. are absolutely sure you won't need it (apart from validation, it might also declare character entities that are used in the document), and
  3. are using the Xerces XML parser implementation,

you can disable fetching of the DTD by setting the corresponding SAX feature. In XOM this should be possible by passing an XMLReader to the Builder constructor like this:

import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

...

XMLReader xmlreader = XMLReaderFactory.createXMLReader();
xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Builder builder = new Builder(xmlreader);
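
The EntityResolver route mentioned at the start of this answer can be sketched with the JDK's own SAX classes. This is only a minimal sketch under assumptions: the class name `DtdSkippingReaders` and the method `newReader()` are hypothetical, and instead of an embedded copy of the DTD it substitutes an empty one, which is only safe if the document uses no entities declared in the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of XOM or the JDK.
final class DtdSkippingReaders {
    private DtdSkippingReaders() {}

    /** Creates an XMLReader that answers requests for *.dtd files with empty content. */
    static XMLReader newReader() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Assumption: an empty DTD is acceptable. To serve an embedded copy instead,
        // return an InputSource that wraps a stream of that copy.
        // Returning null falls back to the parser's default resolution.
        reader.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith(".dtd")
                        ? new InputSource(new StringReader(""))
                        : null);
        return reader;
    }
}
```

The reader returned by `newReader()` could then be handed to `new Builder(xmlreader)` in place of the one created through `XMLReaderFactory`; that last step is not exercised in this sketch.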
︶ ̄淡然 2024-12-21 21:32:21

If you are using plain JAXP rather than XOM, the solution above only needs to be adapted as follows:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(...);
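
For reference, a self-contained variant of the same idea might look like the sketch below. It uses `setFeature` rather than `setAttribute`, since the setting is a boolean parser feature, and it assumes a Xerces-derived parser such as the one bundled with the JDK. The class name `JaxpDtdSkip` and the method `parse` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, shown only to illustrate the feature setting.
final class JaxpDtdSkip {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    private JaxpDtdSkip() {}

    /** Parses XML text without fetching the external DTD named in its DOCTYPE. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        // Throws ParserConfigurationException if the parser does not know this feature.
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }
}
```

Because an unsupported feature raises a `ParserConfigurationException`, a non-Xerces parser would fail loudly here rather than silently keep loading the DTD.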
冰雪之触 2024-12-21 21:32:21

According to the XOM documentation, this is how to parse a document without any validation:

try {
  Builder parser = new Builder();
  Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ParsingException ex) {
  System.err.println("Cafe con Leche is malformed today. How embarrassing!");
}
catch (IOException ex) {
  System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}

If you do want to validate the document against its DTD, you have to call new Builder(true):

try {
  Builder parser = new Builder(true);
  Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ValidityException ex) {
  System.err.println("Cafe con Leche is invalid today. (Somewhat embarrassing.)");
}
catch (ParsingException ex) {
  System.err.println("Cafe con Leche is malformed today. (How embarrassing!)");
}
catch (IOException ex) {
  System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}

Note that one more exception can now be thrown: ValidityException.
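
One caveat is worth illustrating: switching validation off is not necessarily the same as not reading the DTD. The sketch below uses plain JAXP with the JDK's bundled parser (an assumption; XOM itself is not exercised) and only reports whether a parse succeeds with the external-DTD feature switched on or off while validation stays disabled. The class name `DtdLoadingCheck` and the method `parses` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for comparing the two parser settings.
final class DtdLoadingCheck {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    private DtdLoadingCheck() {}

    /** Returns true if the XML text parses, false if parsing fails with a SAX or I/O error. */
    static boolean parses(String xml, boolean loadExternalDtd)
            throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false); // validation is off in both cases
        dbf.setFeature(LOAD_EXTERNAL_DTD, loadExternalDtd);
        try {
            dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

With a DOCTYPE that points at a file that does not exist, the call with `loadExternalDtd` set to `false` is expected to succeed, while the call with `true` is expected to fail even though validation is disabled, which matches the behaviour described in the question.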
