How to get text from an XML node without trimming the whitespace between two Unicode characters

Posted on 2025-01-02 02:34:59


While parsing XML with a SAX parser in Java, I am not able to get the data exactly as it appears in the XML.
The problem occurs when a node contains text with some Unicode characters.

node.getTextContent() splits the content at the Unicode characters and trims the whitespace between two of them.

Suppose the node contains the text oro-maxilo-facială și implantologie.
Note the space between ă and și.

node.getTextContent() returns the string oro-maxilo-facialăși implantologie (the whitespace is gone).

Below is the code I tried.

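import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Concatenates the text content of every child node of the given element.
private String getNodeContent(Element nodeToSerialize) {
    StringBuilder sb = new StringBuilder();
    if (nodeToSerialize.hasChildNodes()) {
        NodeList nodeList = nodeToSerialize.getChildNodes();
        for (int x = 0; x < nodeList.getLength(); x++) {
            Node node = nodeList.item(x);
            // getTextContent() returns the node's text plus that of its descendants.
            sb.append(node.getTextContent());
        }
    }
    return sb.toString();
}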

The XML content is:

<record>
    <isbn>1234-5689</isbn>
    <titles>
        <title>Revista de chirurgie oro-maxilo-facial&#x103; &#x219;i implantologie</title>
    </titles>
    <number>16</number>
</record>
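
For reference, here is a minimal sketch of reading the title text with SAX without losing the space. It rests on an assumption the question does not confirm: that the text is collected from SAX callbacks and the title is written with character references (&#x103; and &#x219;). A SAX parser is allowed to deliver one text node through several characters() calls, so trimming each chunk separately could drop a chunk that holds only the space. The class name TitleTextSketch and the helper readTitle are hypothetical, chosen only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TitleTextSketch {

    // Hypothetical helper: returns the concatenated text of all <title> elements.
    static String readTitle(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    inTitle = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node:
                // append every chunk as-is and never trim a single chunk.
                if (inTitle) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    inTitle = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<record><titles><title>Revista de chirurgie oro-maxilo-facial&#x103; "
                + "&#x219;i implantologie</title></titles></record>";
        System.out.println(readTitle(xml));
    }
}
```

If any trimming is wanted, it would be applied once to the complete string after endElement, not to the individual chunks.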
