Reading XHTML and custom tags into a DOM tree

Posted on 2024-11-30 19:23:02

I am converting XHTML to PDF with Flying Saucer. It works perfectly, but now I want to add bookmarks, and according to the Flying Saucer documentation that should be done like this:

<bookmarks>
    <bookmark name='1. Foo bar baz' href='#1'>
      <bookmark name='1.1 Baz quux' href='#1.2'>
      </bookmark>
    </bookmark>
    <bookmark name='2. Foo bar baz' href='#2'>
      <bookmark name='2.1 Baz quux' href='#2.2'>
      </bookmark>
    </bookmark>
</bookmarks>

That block should go into the HEAD section. I have done that, but the SAXParser won't read the file anymore and reports:

line 11 column 14 - Error: <bookmarks> is not recognized!
line 11 column 25 - Error: <bookmark> is not recognized!

I have a local entity resolver set up and have even added the bookmark elements to the DTD:

<!--flying saucer bookmarks -->
<!ELEMENT bookmarks (#PCDATA)>
<!ATTLIST bookmarks %attrs;>

<!ELEMENT bookmark (#PCDATA)>
<!ATTLIST bookmark %attrs;>

But it just won't parse. I am out of ideas, please help.

EDIT

I am using the following code to parse:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
builder.setEntityResolver(new LocalEntityResolver());
document = builder.parse(is);
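
For comparison, here is a minimal sketch of the same parsing steps run against a small in-memory document that contains the bookmark block. It assumes a default, non-validating DocumentBuilderFactory and an input without a DOCTYPE, so no DTD is involved; the class name BookmarkParseSketch and the sample markup are made up for illustration. Under those assumptions the DOM parser should accept elements it has never heard of, which suggests looking elsewhere for the "not recognized" messages.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class BookmarkParseSketch {

    // Hypothetical sample input: well-formed, no DOCTYPE, so no DTD is resolved.
    static final String SAMPLE =
            "<html><head><title>t</title>"
            + "<bookmarks>"
            + "<bookmark name='1. Foo bar baz' href='#1'>"
            + "<bookmark name='1.1 Baz quux' href='#1.2'/>"
            + "</bookmark>"
            + "</bookmarks>"
            + "</head><body><p id='1'>Foo</p></body></html>";

    // Same steps as in the question, minus the entity resolver.
    static Document parse(String xhtml) throws Exception {
        InputStream is = new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(is);
    }

    // Counts <bookmark> elements at any depth (does not match <bookmarks>).
    static int countBookmarks(Document doc) {
        return doc.getElementsByTagName("bookmark").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("bookmark elements: " + countBookmarks(doc));
    }
}
```

If this runs cleanly on your setup but the real document still fails, the difference is probably in what reaches builder.parse, not in the custom tags themselves.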

EDIT

Here is LocalEntityResolver:

 class LocalEntityResolver implements EntityResolver {

    private static final Logger LOG = ESAPI.getLogger(LocalEntityResolver.class);
    private static final Map<String, String> DTDS;
    static {
        DTDS = new HashMap<String, String>();
        DTDS.put("-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        DTDS.put("-//W3C//DTD XHTML 1.0 Transitional//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
        DTDS.put("-//W3C//ENTITIES Latin 1 for XHTML//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent");
        DTDS.put("-//W3C//ENTITIES Symbols for XHTML//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent");
        DTDS.put("-//W3C//ENTITIES Special for XHTML//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent");
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        InputSource input_source = null;
        if (publicId != null && DTDS.containsKey(publicId)) {
            LOG.debug(Logger.EVENT_SUCCESS, "Looking for local copy of [" + publicId + "]");

            final String dtd_system_id = DTDS.get(publicId);
            final String file_name = dtd_system_id.substring(
                    dtd_system_id.lastIndexOf('/') + 1, dtd_system_id.length());

            InputStream input_stream = FileUtil.readStreamFromClasspath(
                    file_name, "my/class/path",
                    getClass().getClassLoader());
            if (input_stream != null) {
                LOG.debug(Logger.EVENT_SUCCESS, "Found local file [" + file_name + "]!");
                input_source = new InputSource(input_stream);
            }
        }

        return input_source;
    }
}
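
FileUtil.readStreamFromClasspath above is a project-specific helper. As a rough sketch only, the same lookup can be done with the standard ClassLoader.getResourceAsStream call. The class name ClasspathEntityResolverSketch and the "dtd/" resource folder are assumptions for illustration, not part of the original code.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class ClasspathEntityResolverSketch implements EntityResolver {

    // Hypothetical classpath folder holding local copies of the DTD and entity files.
    private static final String RESOURCE_DIR = "dtd/";

    // Public identifier -> local file name.
    private static final Map<String, String> LOCAL_FILES = new HashMap<>();
    static {
        LOCAL_FILES.put("-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml1-strict.dtd");
        LOCAL_FILES.put("-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml1-transitional.dtd");
        LOCAL_FILES.put("-//W3C//ENTITIES Latin 1 for XHTML//EN", "xhtml-lat1.ent");
        LOCAL_FILES.put("-//W3C//ENTITIES Symbols for XHTML//EN", "xhtml-symbol.ent");
        LOCAL_FILES.put("-//W3C//ENTITIES Special for XHTML//EN", "xhtml-special.ent");
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String fileName = (publicId == null) ? null : LOCAL_FILES.get(publicId);
        if (fileName == null) {
            return null; // unknown entity: let the parser use its default resolution
        }
        InputStream in = ClasspathEntityResolverSketch.class.getClassLoader()
                .getResourceAsStream(RESOURCE_DIR + fileName);
        if (in == null) {
            return null; // no local copy on the classpath
        }
        InputSource source = new InputSource(in);
        // Keep the identifiers on the source so parser messages can name the entity.
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

Returning null in both fallback cases keeps the behaviour of the resolver in the question: the parser then resolves the entity itself.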

My DocumentBuilderFactory implementation is:
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl


蓝海似她心 2024-12-07 19:23:02

Ugh, I finally found the problem. Sorry for making you guys debug the code. In my code there was a call to JTidy.parse just before the DOM parsing, and it left the content to be parsed empty. I did not even catch that; the actual error was "Premature end of file" from SAX.

Thanks to Matt Gibson: I found the bug while going through the code to put together a short input document.

My code now checks whether the content is null before parsing:

/**
 * parses String content into a valid XML document.
 * @param content the content to be parsed.
 * @return the parsed document or <tt>null</tt>
 */
private static Document parse(final String content) {
    Document document = null;
    try {
        if (StringUtil.isNull(content)) {
            throw new IllegalArgumentException("cannot parse null "
                    + "content into a DOM object!");
        }

        InputStream is = new ByteArrayInputStream(content
                .getBytes(CONTEXT.getEncoding()));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        builder.setEntityResolver(new LocalEntityResolver());
        document = builder.parse(is);
    } catch (Exception ex) {
        LOG.error(Logger.EVENT_FAILURE, "parsing failed "
                + "for content[" + content + "]", ex);
    }

    return document;
}
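
To see the failure mode in isolation, here is a small sketch, assuming a default non-validating parser. It feeds an empty string to the DOM builder, which should be rejected with a SAXException (the exact message text depends on the parser), and it guards against null or blank content up front. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EmptyContentSketch {

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Returns true when the parser rejects the content with a SAXException.
    static boolean failsToParse(String content) throws Exception {
        try {
            newBuilder().parse(new InputSource(new StringReader(content)));
            return false;
        } catch (SAXException ex) {
            return true; // message text is parser-specific, so it is not inspected here
        }
    }

    // Rejects null or blank input before the parser ever sees it.
    static Document parseChecked(String content) throws Exception {
        if (content == null || content.trim().isEmpty()) {
            throw new IllegalArgumentException("cannot parse empty content into a DOM object");
        }
        return newBuilder().parse(new InputSource(new StringReader(content)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("empty input rejected: " + failsToParse(""));
        System.out.println("root element: "
                + parseChecked("<html/>").getDocumentElement().getTagName());
    }
}
```

Unlike the method above, this sketch lets the IllegalArgumentException propagate instead of logging it and returning null, so an empty input fails loudly at the call site.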
