How to get the text of an XML node without the whitespace between two Unicode characters being trimmed
While parsing XML with a SAX parser in Java, I am not able to get the data exactly as it appears in the XML.
The problem occurs when a node contains text with certain Unicode (non-ASCII) characters: node.getTextContent() appears to split the content at those characters and drop the whitespace between them.
For example, suppose the node contains the text oro-maxilo-facială și implantologie. Note the space between facială and și.
node.getTextContent() returns the string oro-maxilo-facialăși implantologie (the space is gone).
Below is the code I tried.
private String getNodeContent(Element nodeToSerialize) {
    StringBuffer sb = new StringBuffer();
    if (nodeToSerialize.hasChildNodes()) {
        NodeList nodeList = nodeToSerialize.getChildNodes();
        for (int x = 0; x < nodeList.getLength(); x++) {
            Node node = nodeList.item(x);
            sb.append(node.getTextContent());
        }
    }
    return sb.toString();
}
The XML content is:
<record>
    <isbn>1234-5689</isbn>
    <titles>
        <title>Revista de chirurgie oro-maxilo-facială și implantologie</title>
    </titles>
    <number>16</number>
</record>
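To narrow down where the space is lost, it can help to run the same text through the JDK's built-in DOM parser on its own. The following is a minimal sketch, not the code from the question: the class name TextContentCheck and the helper titleText are made up for illustration, and the Romanian letters are written as Unicode escapes (\u0103 for ă, \u0219 for ș) so the result does not depend on the source file encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only.
class TextContentCheck {

    // Parses the XML with the JDK's DOM parser and returns the text of the first <title> element.
    static String titleText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node title = doc.getElementsByTagName("title").item(0);
            return title.getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "Revista de chirurgie oro-maxilo-facial\u0103 \u0219i implantologie";
        String xml = "<record><isbn>1234-5689</isbn><titles><title>" + expected
                + "</title></titles><number>16</number></record>";
        // Reports whether the text came back unchanged, including the space before "\u0219i".
        System.out.println(expected.equals(titleText(xml)));
    }
}
```

If the space survives in a standalone check like this but not in the application, the text is probably being altered by another layer that sits between the parser and the code calling getTextContent().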
Comments (1)
The problem is with Digester 1.8. Use commons-digester1.8.1.jar instead of commons-digester1.8.jar; that resolves this whitespace-swallowing issue.
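For anyone writing their own SAX handler and seeing a similar symptom: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so a handler that trims or replaces the text on each call can lose whitespace at the chunk boundaries. The sketch below shows the usual pattern of appending every chunk and reading the result only in endElement(). It is an illustration under that assumption, not a description of what Digester does internally; the class TitleHandler and the helper parseTitle are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of <title> elements.
class TitleHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;
    private String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per text node: append as-is, never trim here.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            inTitle = false;
            title = text.toString(); // all chunks joined, inner whitespace intact
        }
    }

    String getTitle() {
        return title;
    }

    // Convenience wrapper: parses the XML string and returns the last <title> text seen.
    static String parseTitle(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getTitle();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If leading or trailing whitespace needs to be removed, doing it once on the joined string in endElement() avoids touching the spaces inside the text.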