Parsing XML with DOM: the DOCTYPE gets removed

Posted on 2024-11-19 06:41:23

How come DOM in Java erases the DOCTYPE when editing XML?

I've got this XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE map[ <!ELEMENT map (station*) >
                <!ATTLIST station  id   ID    #REQUIRED> ]>
<favoris>
<station id="5">test1</station>
<station id="6">test1</station>
<station id="8">test1</station>
</favoris> 

My function is very basic:

public static void EditStationName(int id, InputStream is, String path, String name) throws ParserConfigurationException, SAXException, IOException, TransformerFactoryConfigurationError, TransformerException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document dom = builder.parse(is);

    Element e = dom.getElementById(String.valueOf(id));
    e.setTextContent(name);

    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
    Source source = new DOMSource(dom);
    // try-with-resources closes the file even if the transformation fails
    try (FileOutputStream fos = new FileOutputStream(path)) {
        Result result = new StreamResult(fos);
        xformer.transform(source, result);
    }
}

It works, but the doctype gets erased! I get the whole document back, but without the doctype part, which is important to me because it is what allows me to retrieve elements by ID.
How can I keep the doctype? Why does it get erased?
I tried many solutions, for example with OutputKeys or DOMImplementation.createDocumentType, but none of them worked...

Thank you!

记忆之渊 2024-11-26 06:41:23

Your input XML is not valid. It should be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [
    <!ELEMENT favoris (station)+>
    <!ELEMENT station (#PCDATA)>
    <!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">test1</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

As @DevNull wrote, to be fully valid you can't write <station id="5">test1</station> (although Java accepts it anyway, since the default DocumentBuilderFactory does not validate).


The DOCTYPE is erased in the output XML document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

I haven't found a solution for the missing DTD yet, but as a workaround you can reference an external DTD:

xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");

Resulting document (example):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris SYSTEM "favoris.dtd">
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
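A self-contained sketch of this workaround, applied to the corrected document held in a string. The class name, the INPUT constant and the file name favoris.dtd are illustrative assumptions. Because the parsed DocumentType still exposes the original declarations through getInternalSubset(), the sketch also returns them, on the assumption that you would save them once as favoris.dtd so that the external reference can be resolved later.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExternalDtdWorkaround {
    // Hypothetical input: the corrected document with an inline DTD
    static final String INPUT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE favoris [\n"
        + "  <!ELEMENT favoris (station)+>\n"
        + "  <!ELEMENT station (#PCDATA)>\n"
        + "  <!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris><station id=\"i5\">test1</station><station id=\"i6\">test1</station></favoris>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Declarations that could be saved once as favoris.dtd (assumed migration step)
    static String declarationsForExternalDtd(Document dom) {
        return dom.getDoctype().getInternalSubset();
    }

    // Edits one station and serializes with a reference to the external DTD
    static String editAndSerialize(Document dom, String id, String name) throws Exception {
        dom.getElementById(id).setTextContent(name);
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        // The Transformer cannot copy the inline DTD, but it can emit <!DOCTYPE ... SYSTEM "...">
        xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse(INPUT);
        System.out.println("favoris.dtd would contain:\n" + declarationsForExternalDtd(dom));
        System.out.println(editAndSerialize(dom, "i5", "new value"));
    }
}
```

Only the identifier is written; the DTD file itself has to exist wherever the parser will look for it when the document is read back.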

EDIT:

I don't think it's possible to save an inline DTD using the Transformer class (see here). If you can't use an external DTD reference, you can use the DOM Level 3 LSSerializer class instead:

DOMImplementationLS domImplementationLS =
    (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
LSOutput lsOutput = domImplementationLS.createLSOutput();
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
// try-with-resources closes the stream even if serialization fails
try (OutputStream outputStream = new FileOutputStream("output.xml")) {
    lsOutput.setByteStream(outputStream);
    lsSerializer.write(dom, lsOutput);
}

Output with the desired DTD (I can't see any option in LSSerializer to add standalone="yes"...):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris> 
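A runnable sketch of the LSSerializer route combined with the question's edit step. It serializes to a String with writeToString so the result is easy to inspect; the class name, the editStationName helper and the sample input are assumptions for illustration. Per the DOM Load and Save specification, writeToString produces a UTF-16 string, so for files an LSOutput with a byte stream, as above, is the better fit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsSerializerKeepsDoctype {
    // Hypothetical input modeled on the corrected document (IDs start with a letter)
    static final String INPUT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE favoris [\n"
        + "  <!ELEMENT favoris (station)+>\n"
        + "  <!ELEMENT station (#PCDATA)>\n"
        + "  <!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris><station id=\"i5\">test1</station><station id=\"i6\">test1</station></favoris>";

    static String editStationName(String xml, String id, String name) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // getElementById works because the inline DTD declares "id" as type ID
        dom.getElementById(id).setTextContent(name);

        DOMImplementationLS ls =
            (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // Unlike the identity Transformer, the serializer writes the DocumentType node,
        // including its internal subset
        return serializer.writeToString(dom);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(editStationName(INPUT, "i5", "new value"));
    }
}
```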

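Another approach is to use the Apache Xerces2-J XMLSerializer class (note that the org.apache.xml.serialize package has been deprecated since Xerces 2.9.0 in favor of LSSerializer or JAXP):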

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
...

OutputFormat format = new OutputFormat();
format.setEncoding("UTF-8");
format.setStandalone(true);
// Write bytes instead of using a FileWriter, so the file encoding matches the XML declaration
try (OutputStream out = new FileOutputStream("output.xml")) {
    XMLSerializer serializer = new XMLSerializer(out, format);
    serializer.serialize(dom);
}

Result:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
凯凯我们等你回来 2024-11-26 06:41:23

(This response is only a supplement to @Grzegorz Szpetkowski's answer, explaining why it works.)

You lose the doctype definition because you use the Transformer class, which performs an XSL transformation (with no stylesheet, the identity transformation). There is no DOCTYPE declaration or doctype definition object/node in the XSLT tree model. When a parser hands the document over to an XSLT processor, the doctype information is lost and therefore cannot be retained or duplicated. XSLT offers some control over the serialization of the output tree, including adding an <!DOCTYPE ... > declaration with a public or system identifier. The values of these identifiers must be known beforehand and cannot be read from the input tree. Creating or retaining an embedded DTD or entity declarations is not supported either (although one workaround for this obstacle is to output the DTD as text with disable-output-escaping="yes").
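A small demonstration of this explanation, assuming the JDK's built-in TransformerFactory: the parsed DOM still holds a DocumentType node with the inline declarations, yet the identity Transformer writes the document without any DOCTYPE. The class name, helper names and sample input are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IdentityTransformDropsDoctype {
    // Hypothetical input with an inline DTD
    static final String INPUT =
        "<!DOCTYPE favoris [\n"
        + "  <!ELEMENT favoris (station)+>\n"
        + "  <!ELEMENT station (#PCDATA)>\n"
        + "  <!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // newTransformer() without a stylesheet performs the identity transformation
    static String identityTransform(Document dom) throws Exception {
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse(INPUT);
        // The DOM still knows about the DTD...
        System.out.println("DOM doctype: " + dom.getDoctype().getName());
        System.out.println("Internal subset: " + dom.getDoctype().getInternalSubset());
        // ...but the serialized result has no <!DOCTYPE ...> declaration
        System.out.println(identityTransform(dom));
    }
}
```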

To preserve the DTD, you need to output your document with an XML serializer instead of an XSL transformation, as Grzegorz already suggested.

木緿 2024-11-26 06:41:23

@Grzegorz Szpetkowski has a good idea in using an external DTD. However, the XML is still invalid if you keep those station/@id values.

An attribute of type "ID" can't have a value that starts with a digit, because ID values must be valid XML names. You'll have to add a prefix, like "s" for station:

<!DOCTYPE favoris [
<!ELEMENT favoris (station*)      > 
<!ELEMENT station (#PCDATA)       > 
<!ATTLIST station 
          id       ID   #REQUIRED > 
]>
<favoris>
  <station id="s5">test1</station>
  <station id="s6">test1</station>
  <station id="s8">test1</station>
</favoris>
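The ID rule is only enforced when the parser validates, which would also explain why Java accepted id="5" in the question. A sketch that parses the same kind of document with DocumentBuilderFactory.setValidating(true) and collects the reported errors; the class and helper names are illustrative, and since the wording of the messages depends on the parser, only their presence matters here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidateIdValues {
    // Builds a small favoris document whose station carries the given id value
    static String document(String id) {
        return "<!DOCTYPE favoris [\n"
            + "<!ELEMENT favoris (station*)>\n"
            + "<!ELEMENT station (#PCDATA)>\n"
            + "<!ATTLIST station id ID #REQUIRED>\n"
            + "]>\n"
            + "<favoris><station id=\"" + id + "\">test1</station></favoris>";
    }

    // Parses with DTD validation switched on and collects every reported validity error
    static List<String> validationErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // off by default, so id="5" normally goes unnoticed
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("id=\"5\":  " + validationErrors(document("5")));
        System.out.println("id=\"s5\": " + validationErrors(document("s5")));
    }
}
```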
五里雾 2024-11-26 06:41:23

I had almost the same problem and found this, which works with the Transformer. It is limited, since it only lets you reference the DTD, and it will require some work if the doctype of the document can vary. It was enough in my case, though: I only needed to hardcode the XHTML doctype after a transformation.

xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "publicId");
xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "systemId");
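When the doctype can vary, the identifiers do not necessarily have to be hardcoded in Java: a sketch that copies them from the parsed document's DocumentType into DOCTYPE_PUBLIC and DOCTYPE_SYSTEM. It assumes the document references an external DTD (an inline DTD would still be lost), uses an EntityResolver that returns an empty DTD so nothing is downloaded while parsing, and sets the output method to xml explicitly, since XSLT's default output method becomes html when the root element is named html. The class, helper names and sample XHTML input are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class CopyExternalDoctype {
    // Hypothetical input referencing an external DTD
    static final String INPUT =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
        + "<html><body><p>hello</p></body></html>";

    static Document parseWithoutFetchingDtd(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve the external DTD to an empty document so nothing is fetched
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    static String serializeKeepingExternalDoctype(Document dom) throws Exception {
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        xformer.setOutputProperty(OutputKeys.METHOD, "xml");
        DocumentType doctype = dom.getDoctype();
        if (doctype != null) {
            // Only the identifiers are carried over; an internal subset would still be dropped
            if (doctype.getPublicId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeKeepingExternalDoctype(parseWithoutFetchingDtd(INPUT)));
    }
}
```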