How to generate an *exact* copy of an XML document with its entities resolved
Given an XML document like this:
<!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
<author>john</author>
<doc>
<title>&title;</title>
</doc>
I want to parse the above XML document and generate a copy of it with all of its entities already resolved. So given the above XML document, the parser should output:
<!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
<author>john</author>
<doc>
<title>Stack Overflow Madness</title>
</doc>
I know that you can implement an org.xml.sax.EntityResolver to resolve entities, but what I don't know is how to properly generate a copy of the XML document with everything else still intact. By everything, I mean the whitespace, the DTD declaration at the top of the document, the comments, and anything else apart from the entity references, which should be replaced by their resolved values. If this is not possible, please suggest a way that preserves at least most of these things (e.g. everything except the comments).
Note also that I am restricted to the pure Java API provided by Sun, so no third party libraries can be used here.
Thanks very much!
EDIT: The above XML document is a much simplified version of the original. The original involves very complex entity resolution using EntityResolver, which I have greatly played down in this question. What I am really interested in is how to produce an exact copy of the XML document with an XML parser that uses EntityResolver to resolve the entities.
Comments (2)
Is it possible for you to read in the XML template as a string, and then do the entity replacement on that string as plain text?
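A minimal sketch of that idea, assuming the only entity is &title; and that its replacement text is already known. The class and method names are made up for illustration, and the replacement value is assumed to contain no characters that would need escaping in XML (such as < or &).

```java
// Hypothetical helper: treats the XML purely as text, so whitespace,
// the DOCTYPE line and comments pass through untouched.
final class TextEntityReplace {

    // Replaces every literal occurrence of "&title;" with the given value.
    static String resolveTitle(String xml, String value) {
        return xml.replace("&title;", value);
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>\n"
                + "<doc>\n"
                + "<!-- this comment survives -->\n"
                + "<title>&title;</title>\n"
                + "</doc>";
        System.out.println(resolveTitle(xml, "Stack Overflow Madness"));
    }
}
```

Because nothing is parsed, this also replaces occurrences of &title; inside comments or CDATA sections, where it is not really an entity reference.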
You almost certainly cannot do this using any XML parser I've heard of, and certainly the Sun XML parsers cannot do it. They will happily discard details that have no significance as far as the meaning of the XML is concerned. Two elements that differ only in such details are indistinguishable from the perspective of the XML syntax, and the Sun parsers (rightly) treat them as identical.
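To make the point concrete, here is a small sketch using only the JDK's DOM parser. The two input strings are illustrative choices of mine (attribute order, quote style and whitespace inside tags), not examples from this answer, and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that textual differences the XML syntax
// treats as insignificant are gone once the document has been parsed.
final class InsignificantDetails {

    // Parses both strings and compares the resulting element trees.
    static boolean parseToSameTree(String a, String b) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document da = builder.parse(new InputSource(new StringReader(a)));
            Document db = builder.parse(new InputSource(new StringReader(b)));
            return da.getDocumentElement().isEqualNode(db.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same content, different attribute order, quotes and in-tag spacing.
        String first = "<doc a=\"1\" b=\"2\"><title>x</title></doc>";
        String second = "<doc  b='2'  a='1' ><title >x</title ></doc>";
        System.out.println(parseToSameTree(first, second));
    }
}
```

Since the parsed trees carry no record of which spelling was used, no serializer working from them can reproduce the original text character for character.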
I think your choices are to do the replacement treating the XML as text (as @Wololo suggests) or relax your requirements.
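If relaxing the requirements is acceptable, one possible route with only the JDK is to parse into a DOM with an EntityResolver installed and serialize the tree again with an identity Transformer. The sketch below is an assumption-laden illustration, not a drop-in solution: the class and method names are hypothetical, the resolver is a stand-in that always returns a fixed DTD text, and the sample input is a single-root variant of the question's document because the parser needs well-formed input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for the "relaxed" route: an approximate copy, not an exact one.
final class RelaxedCopy {

    // Parses xml, expanding entities declared in dtdText, then serializes the tree.
    static String copyWithEntitiesResolved(String xml, String dtdText) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Stand-in for the real EntityResolver: always answers with dtdText.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader(dtdText)));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // The DOCTYPE is only written out if we ask for it explicitly.
            if (doc.getDoctype() != null && doc.getDoctype().getSystemId() != null) {
                identity.setOutputProperty(
                        OutputKeys.DOCTYPE_SYSTEM, doc.getDoctype().getSystemId());
            }
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>\n"
                + "<doc>\n<!-- a comment -->\n<title>&title;</title>\n</doc>";
        String dtd = "<!ENTITY title \"Stack Overflow Madness\">";
        System.out.println(copyWithEntitiesResolved(xml, dtd));
    }
}
```

With this approach the comments and the text inside the root element are kept, but details such as the quote style of the DOCTYPE, whitespace inside tags, and the presence or absence of an XML declaration are decided by the serializer rather than copied from the input.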
By the way, you can probably use an EntityResolver independently of the XML parser, or create a class that does the same thing. This may mean that String.replace... is not the answer, but you should be able to implement an ad-hoc expander that iterates over the characters in one character buffer, copying them into a second buffer and expanding entity references along the way.
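A rough sketch of such an ad-hoc expander, assuming the entity values have already been obtained somehow (for example from an EntityResolver) and are handed in as a map from entity name to replacement text. The class name, method name and map-based lookup are my assumptions, not part of any JDK API.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical ad-hoc expander: copies the input character by character into a
// second buffer, replacing "&name;" whenever "name" is a known entity.
final class EntityExpander {

    static String expand(String in, Map<String, String> entities) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '&') {
                int semi = in.indexOf(';', i + 1);
                if (semi > 0) {
                    String value = entities.get(in.substring(i + 1, semi));
                    if (value != null) {
                        // Known entity: emit its value and skip past the ';'.
                        out.append(value);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            // Anything else, including unknown references, is copied verbatim.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entities =
                Collections.singletonMap("title", "Stack Overflow Madness");
        System.out.println(expand("<title>&title;</title> <!-- kept --> &amp;", entities));
    }
}
```

This keeps every other character of the input exactly as it was, which is the property the question asks for. It does not recognize comments or CDATA sections, so a literal &title; inside them would be expanded too; handling that would need a little more state in the loop.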