DOCTYPE is removed when parsing XML with DOM
Why does DOM in Java erase the DOCTYPE when editing XML?
I have this XML file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE map[ <!ELEMENT map (station*) >
<!ATTLIST station id ID #REQUIRED> ]>
<favoris>
<station id="5">test1</station>
<station id="6">test1</station>
<station id="8">test1</station>
</favoris>
My function is very basic:
public static void EditStationName(int id, InputStream is, String path, String name)
        throws ParserConfigurationException, SAXException, IOException,
               TransformerFactoryConfigurationError, TransformerException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document dom = builder.parse(is);
    Element e = dom.getElementById(String.valueOf(id));
    e.setTextContent(name);
    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    FileOutputStream fos = new FileOutputStream(path);
    Result result = new StreamResult(fos);
    Source source = new DOMSource(dom);
    xformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
    xformer.transform(source, result);
}
It works, but the DOCTYPE gets erased! I get the whole document back without the DOCTYPE part, which is important to me because it is what allows me to retrieve elements by id.
How can I keep the DOCTYPE, and why is it erased?
I tried many solutions, for example with OutputKeys or omImpl.createDocumentType, but none of them worked.
Thank you!
Comments (4)
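Your input XML is not valid: the DOCTYPE declares map as the root element while the actual root element is favoris, and station has no element declaration. As @DevNull wrote, to be fully valid you also can't write <station id="5">test1</station> (in Java it works anyway, even with that issue).
The DOCTYPE is erased in the output XML document. I haven't found a solution for the missing inline DTD yet, but as a workaround you can reference an external DTD from the output, so that the result document carries a DOCTYPE declaration pointing at that DTD.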
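The answer's original snippet for this workaround is not shown here, so the following is only a minimal sketch of one way to do it with the standard OutputKeys.DOCTYPE_SYSTEM output property. The class name, the method name and the system identifier favoris.dtd are assumptions made for illustration; no such DTD file is created or read by the code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original answer.
final class ExternalDtdSketch {

    static final String XML =
        "<favoris><station id=\"s5\">test1</station></favoris>";

    /** Parses the XML and serializes it with a DOCTYPE that references an external DTD. */
    static String writeWithExternalDtd(String xml, String systemId) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Transformer xformer = TransformerFactory.newInstance().newTransformer();
            // Asks the serializer to emit <!DOCTYPE root SYSTEM "systemId">.
            xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);

            StringWriter out = new StringWriter();
            xformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                 | IOException | TransformerException e) {
            throw new IllegalStateException("Could not rewrite document", e);
        }
    }

    public static void main(String[] args) {
        // "favoris.dtd" is an assumed file name used only for illustration.
        System.out.println(writeWithExternalDtd(XML, "favoris.dtd"));
    }
}
```

The system identifier has to be known in advance, and the ELEMENT and ATTLIST declarations would then have to live in that external file rather than in the document itself.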
EDIT:
I don't think it's possible to save an inline DTD using the Transformer class (see here). If you can't use an external DTD reference, you can use the DOM Level 3 LSSerializer class instead. Its output contains the wanted DTD, although I can't see any option to add standalone="yes" using LSSerializer. Another approach is to use the Apache Xerces2-J XMLSerializer class.
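The answer's LSSerializer code is not shown here, so this is a hedged sketch of what such a serialization could look like using only the standard org.w3c.dom.ls API. The class and method names are made up for illustration, and the sample document uses a corrected DTD with an id that starts with a letter, which is an assumption rather than the asker's exact file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original answer.
final class LsSerializerSketch {

    // Assumed sample: the question's document with a matching root name and a valid ID.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE favoris [\n"
        + "<!ELEMENT favoris (station*)>\n"
        + "<!ELEMENT station (#PCDATA)>\n"
        + "<!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris><station id=\"s5\">test1</station></favoris>";

    /** Renames one station and serializes the DOM with LSSerializer instead of Transformer. */
    static String renameStation(String xml, String id, String name) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }

        // Lookup by ID relies on the ATTLIST declaration in the internal DTD subset.
        Element station = dom.getElementById(id);
        if (station == null) {
            throw new IllegalArgumentException("No station with id " + id);
        }
        station.setTextContent(name);

        // DOM Level 3 Load and Save: serializes the Document node, doctype included.
        DOMImplementationLS ls =
                (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(dom);
    }

    public static void main(String[] args) {
        System.out.println(renameStation(XML, "s5", "renamed"));
    }
}
```

The internal subset is rebuilt by the parser from the declarations it read, so its whitespace may differ from the input. writeToString produces a string, so the encoding named in the XML declaration may not be UTF-8; writing through an LSOutput with an explicit encoding is the usual way to control that.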
(This response is in a way only a supplement to @Grzegorz Szpetkowski's answer, explaining why it works.)
You lose the doctype definition because you use the Transformer class, which performs an XSL transformation. There is no DOCTYPE declaration or doctype definition object/node in the XSLT tree model. When a parser hands the document over to an XSLT processor, the doctype info is lost and therefore cannot be retained or duplicated. XSLT offers some control over the serialization of the output tree, including adding a <!DOCTYPE ... > declaration with a public or system identifier. The values for these identifiers need to be known beforehand and cannot be read from the input tree. Creating or retaining an embedded DTD or entity declarations is also not supported (although one workaround for this obstacle is to output it as text with disable-output-escaping="yes").
In order to preserve the DTD you need to output your document with an XML serializer instead of an XSL transformation, as Grzegorz already suggested.
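To make the distinction concrete, here is a small sketch showing that the doctype does exist as a node in the DOM before the document is handed to a Transformer. The class and method names and the sample document are assumptions for illustration. The identity transform at the end is only printed, not checked; with the JDK's bundled implementation I would expect its output to lack the DOCTYPE, matching what the question describes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not taken from any of the answers.
final class DoctypeInDomSketch {

    // Assumed sample document with an internal DTD subset.
    static final String XML =
          "<!DOCTYPE favoris [\n"
        + "<!ELEMENT favoris (station*)>\n"
        + "<!ELEMENT station (#PCDATA)>\n"
        + "<!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris><station id=\"s5\">test1</station></favoris>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    /** Runs the default (identity) transformation, as the question's code does. */
    static String identityTransform(Document dom) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not transform document", e);
        }
    }

    public static void main(String[] args) {
        Document dom = parse(XML);
        DocumentType doctype = dom.getDoctype();
        // The DOM tree still carries the doctype node and its internal subset.
        System.out.println("DOM doctype name: " + doctype.getName());
        System.out.println("DOM internal subset: " + doctype.getInternalSubset());
        // The XSLT side only sees elements, attributes, text and the like.
        System.out.println("After identity transform: " + identityTransform(dom));
    }
}
```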
@Grzegorz Szpetkowski has a good idea with using an external DTD. However, the XML is still invalid if you keep those station/@id values.
The value of any attribute of type "ID" must be an XML name, so it can't start with a digit. You'll have to prefix it with something, like "s" for station, for example id="s5".
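A validating parse is one way to see this rule in action. The sketch below is an illustration under assumptions: the class and method names are invented, and the DTD is a corrected version of the one in the question (root name favoris, plus a declaration for station). It collects the validity errors reported for a given id value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original answer.
final class IdValiditySketch {

    /** Builds a one-station document with an assumed, corrected internal DTD. */
    static String favoris(String stationId) {
        return "<!DOCTYPE favoris [\n"
             + "<!ELEMENT favoris (station*)>\n"
             + "<!ELEMENT station (#PCDATA)>\n"
             + "<!ATTLIST station id ID #REQUIRED>\n"
             + "]>\n"
             + "<favoris><station id=\"" + stationId + "\">test1</station></favoris>";
    }

    /** Parses with DTD validation enabled and returns the validity error messages. */
    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD in the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // well-formedness errors abort the parse
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("id=\"5\":  " + validityErrors(favoris("5")));
        System.out.println("id=\"s5\": " + validityErrors(favoris("s5")));
    }
}
```

A non-validating parser, which is what DocumentBuilderFactory gives you by default, does not report this, which is consistent with the remark that the lookup works in Java anyway.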
I had almost the same problem and found this, which works with a transform. It is limited, since it only allows you to reference the DTD, and it will require some work if the doctype of the document can vary. It was enough in my case, though: I only needed to hardcode the XHTML doctype after a transformation.