How to preserve newlines in CDATA when generating XML?

Posted 2024-07-29 19:16:40 · 506 characters · 7 views · 0 comments

I want to write some text that contains whitespace characters, such as newlines and tabs, into an XML file, so I use

Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));

but when I read it back in using

Node vs = xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();

I get a string that no longer contains any newlines.
When I look directly at the XML on disk, the newlines seem to be preserved, so the problem occurs when the XML file is read back in.

How can I preserve the newlines?

Thanks!

饮惑 2024-08-05 19:16:40

I don't know how you parse and write your document, but here's an enhanced code example based on yours:

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class CdataRoundTrip {
    public static void main(String[] args) throws Exception {
        // create the document in memory
        Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element element = xmldoc.createElement("TestElement");
        xmldoc.appendChild(element);
        element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

        // serialize the XML to a string (the declaration says encoding="UTF-16")
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        String str = writer.writeToString(xmldoc);

        // print the XML to verify the whitespace inside the CDATA section
        System.out.println("--- XML ---");
        System.out.println(str);

        // de-serialize the XML from the string, using the encoding named in the declaration
        ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_16));
        Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

        // the first child of TestElement is the CDATA section node
        Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
        Node child = vs.getFirstChild();
        String x = child.getNodeValue();

        // print the value, yay!
        System.out.println("--- Node Text ---");
        System.out.println(x);
    }
}

Serializing with LSSerializer is the W3C way to do it. The output is as expected, with the line separators preserved:

--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line
]]></TestElement>
--- Node Text ---
first line
second line
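
Since the question is about a file on disk, here is a minimal sketch of the same round trip going through a temporary file instead of a string. It assumes the JDK's default DOM implementation; the class name CdataFileRoundTrip and the helper roundTrip are made up for this illustration, and getTextContent() is used in place of getFirstChild().getNodeValue() so the result does not depend on how the parser splits the text into nodes.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not part of the original answer.
class CdataFileRoundTrip {

    // Writes `text` as a CDATA section to a temporary file, parses the file
    // again and returns the text content of the element.
    static String roundTrip(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        Document doc = builder.newDocument();
        Element element = doc.createElement("TestElement");
        doc.appendChild(element);
        element.appendChild(doc.createCDATASection(text));

        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSSerializer writer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");

        Path file = Files.createTempFile("cdata-demo", ".xml");
        try {
            // serialize the document to the file
            try (OutputStream out = Files.newOutputStream(file)) {
                output.setByteStream(out);
                writer.write(doc, output);
            }
            // parse the file again and read the element's text
            try (InputStream in = Files.newInputStream(file)) {
                Document parsed = builder.parse(in);
                return parsed.getElementsByTagName("TestElement").item(0).getTextContent();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        String original = "first line\n\tsecond line\n";
        // true if newlines and tabs survived the trip through the file
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

If this check passes on your setup but your own code still loses the newlines, the difference is probably in how your document is written or parsed rather than in CDATA itself.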

汐鸠 2024-08-05 19:16:40

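You need to check the type of each node with node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to concatenate the CDATA guards (the <![CDATA[ and ]]> delimiters) to node.getNodeValue().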
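
A minimal sketch of that check, assuming the default non-coalescing DocumentBuilderFactory, which keeps CDATA sections as separate nodes. The class CdataNodeTypes and its methods are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class CdataNodeTypes {

    // Rebuilds the markup of an element's direct text and CDATA children,
    // putting the CDATA delimiters back around CDATA section nodes.
    // Plain text nodes are appended as-is (no re-escaping of & or <).
    static String childrenAsMarkup(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append("<![CDATA[").append(n.getNodeValue()).append("]]>");
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Parses an XML string and returns its root element.
    static Node parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node root = parseRoot(
                "<TestElement>before<![CDATA[first line\nsecond line\n]]>after</TestElement>");
        System.out.println(childrenAsMarkup(root));
    }
}
```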

伴我老 2024-08-05 19:16:40

You don't necessarily have to use CDATA to preserve whitespace characters.
The XML specification specifies how to encode these characters.

So, for example, if you have an element whose value contains a newline, you should encode it with

&#xA;

Carriage return:

&#xD;

And so forth.
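
To illustrate, here is a small sketch showing how a DOM parser resolves these character references. The class name CharRefDemo and the helper textOf are assumptions made for this example. It also shows why the reference matters for carriage returns: a literal CR LF in the file is normalized to a single line feed on parsing, while &#xD; survives as a carriage return.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class CharRefDemo {

    // Parses an XML string and returns the text content of its root element.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // character references: &#xA; becomes \n and &#xD; becomes \r
        String viaRefs = textOf("<TestElement>first line&#xA;second line&#xD;end</TestElement>");
        System.out.println("first line\nsecond line\rend".equals(viaRefs));

        // a literal CR LF pair is normalized to a single \n by the parser
        String literal = textOf("<TestElement>first line\r\nsecond line</TestElement>");
        System.out.println("first line\nsecond line".equals(literal));
    }
}
```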

贱人配狗天长地久 2024-08-05 19:16:40

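EDIT: removed all the irrelevant stuff.

I'm curious to know which DOM implementation you're using, because what you describe doesn't match the default behaviour of the ones in a couple of JVMs I've tried (they ship with a Xerces implementation). I'm also interested in which newline characters your document contains.

I'm not sure whether CDATA is supposed to preserve whitespace. I suspect a lot of factors are involved. Don't DTDs/schemas affect how whitespace is handled?

You could try using the xml:space="preserve" attribute.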

海拔太高太耀眼 2024-08-05 19:16:40

xml:space='preserve' is not the answer here. It only applies to "all whitespace" nodes, that is, it is what you use if you want to keep the whitespace nodes in

<this xml:space='preserve'> <has/>
<whitespace/>
</this>

But note that those whitespace nodes contain ONLY whitespace.

I have been struggling to get Xerces to generate events that allow CDATA content to be isolated as well. I have no solution as yet.
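
As a rough illustration of what those "all whitespace" nodes are, the sketch below parses the snippet above with the JDK's default non-validating DOM parser and counts the text children that consist of whitespace only. The class WhitespaceNodes and its method are hypothetical names for this example. It only counts nodes; it does not show xml:space changing what the parser does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class WhitespaceNodes {

    // Counts the direct text children of the root element that consist
    // of whitespace only.
    static int whitespaceOnlyChildren(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        int count = 0;
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<this xml:space='preserve'> <has/>\n<whitespace/>\n</this>";
        // one whitespace-only text node before <has/>, one after it,
        // and one after <whitespace/>
        System.out.println(whitespaceOnlyChildren(xml));
    }
}
```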

About the author

不甘平庸
