SAXReader does not re-escape characters
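I am using dom4j to read an XML file. The file looks like this:
...
<Field> hello, world...</Field>
...
I use SAXReader to read the file into a Document. When I call getText() on the node, I get the following string:
\r\n hello, world...
I do some processing and then write another file using asXml(). However, the characters are not escaped the way they were in the original file, which causes errors in the external system that consumes the file.
How can I escape the special characters when writing the file, so that the carriage return and line feed appear in their escaped form again?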
Comments (4)
You cannot do this easily. Those aren't 'escapes', they are 'character entities', and they are a fundamental part of XML. Xerces has some very complex support for 'unparsed entities', but I doubt that it applies to these, as opposed to the kind that are defined in a DTD.
It depends on what you're getting and what you want (see my previous comment).
The SAX reader is doing nothing wrong: your XML is giving you a literal newline character. If you control this XML, then instead of the newline characters you will need to insert a \ (backslash) character followed by the "r" or "n" character (or both).
If you do not control this XML, then you will need to do a literal conversion of the newline characters to "\r\n" after you've gotten your string back. In C# it would be something like:
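A Java counterpart of that literal conversion might look like the sketch below. The class and method names (NewlineEscaper, escapeNewlines) are made up for illustration; the only library call is String.replace, applied to the value that getText() returns.

```java
final class NewlineEscaper {
    /**
     * Replaces each literal carriage return and line feed with the
     * two-character sequences backslash-r and backslash-n.
     * Hypothetical helper, not part of dom4j.
     */
    static String escapeNewlines(String text) {
        return text
                .replace("\r", "\\r")   // CR becomes the two characters \ and r
                .replace("\n", "\\n");  // LF becomes the two characters \ and n
    }

    public static void main(String[] args) {
        // The input stands in for what getText() returned in the question.
        System.out.println(escapeNewlines("\r\n hello, world..."));
    }
}
```

Note that this produces backslash sequences, not XML character references, so it only helps if the consuming system understands that notation.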
XML entities are abstracted away in DOM. Content is exposed as a String without the need to bother about the encoding, which in most cases is what you want.
But SAX has some support for how entities are processed. You could try to create an XMLReader with a custom EntityResolver#resolveEntity and pass it as a parameter to the SAXReader. But I fear it may not work.
Otherwise, you could try to configure a LexicalHandler for SAX so that you are notified when an entity is encountered; see the Javadoc for LexicalHandler#startEntity. You will not be able to change the resolving, but that may still help.
EDIT
You must read and write XML with the SAXReader and XMLWriter provided by dom4j. See reading a XML file and writing an XML file. Don't use asXml() and dump the file yourself.
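The LexicalHandler idea can be tried out with the JDK's own SAX parser before wiring anything into dom4j. The sketch below is only an illustration: the class name EntityBoundaryLogger and the sample document are assumptions, and it uses DefaultHandler2 (which implements LexicalHandler) registered through the standard lexical-handler property. It records the names passed to startEntity, which makes it easy to check which references a given parser actually reports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

final class EntityBoundaryLogger {
    /** Parses the XML and returns every entity name reported via startEntity. */
    static List<String> reportedEntities(String xml) {
        List<String> names = new ArrayList<>();
        // DefaultHandler2 is both a ContentHandler and a LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startEntity(String name) {
                names.add(name);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Standard SAX2 property used to register a LexicalHandler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed sample: one DTD-declared entity plus two numeric character references.
        String xml = "<!DOCTYPE r [<!ENTITY greet \"hello\">]>"
                + "<r>&greet;&#13;&#10; world</r>";
        System.out.println(reportedEntities(xml));
    }
}
```

Running this against your own parser shows whether numeric character references are reported at all; if they are not, the callback cannot help with re-escaping them.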
You can pre-process the input stream to replace & with e.g. [$AMPERSAND_CHARACTER$], then do the stuff with dom4j, and post-process the output stream to make the back substitution. Example (using streamflyer):
You can also use FilterInputStream/FilterOutputStream, PipedInputStream/PipedOutputStream, or ProxyInputStream/ProxyOutputStream for pre- and post-processing.
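The same round trip can be sketched on plain strings with only JDK classes, which may be enough for small files. This is not the streamflyer approach: it is a simplified stand-in, with the JDK's DOM parser and Transformer taking the place of dom4j. The class name AmpersandRoundTrip, the helper names, and the &#13;&#10; sample input are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AmpersandRoundTrip {
    // Placeholder taken from the answer; it must not occur in the real data.
    static final String PLACEHOLDER = "[$AMPERSAND_CHARACTER$]";

    /** Hides every ampersand so the parser never sees a reference to expand. */
    static String protect(String xml) {
        return xml.replace("&", PLACEHOLDER);
    }

    /** Puts the ampersands back after the document has been written. */
    static String restore(String xml) {
        return xml.replace(PLACEHOLDER, "&");
    }

    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(protect(xml))));

            // ... process the document here; text nodes contain the placeholder ...

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return restore(out.toString());
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<Field>&#13;&#10; hello, world...</Field>"));
    }
}
```

The trade-off is that any processing between parsing and writing sees the placeholder text instead of the decoded characters, so logic that depends on the real line breaks has to account for it.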