How do I read escaped characters in the characters method with a SAX parser?

Posted on 2024-12-10 03:19:32

I'm parsing the following XML with a SAX parser:

<Person>
<Name>Test</Name>
<Phone>111-111-2222</Phone>
<Address>lee h&amp;y</Address>
</Person>

The characters method of the SAX parser only reads the address data up to "lee h", as if it did not treat the "&" as a character. I need to get the complete text of the Address element. Any ideas on how I should do it? This is my SAX handler (here address is a flag indicating that an Address element was encountered in the XML):

// Set when an <Address> start tag is seen.
boolean address = false;

public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {
    if (qName.equalsIgnoreCase("Address")) {
        address = true;
    }
}

public void characters(char ch[], int start, int length)
        throws SAXException {
    String data = new String(ch, start, length);
    if (address) {
        System.out.println("Address is: " + data);
        address = false;
    }
}

The output is: lee h

Comments (2)

窗影残 2024-12-17 03:19:32

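The characters method is called three times here to report the content of the Address element, because the text contains the entity reference &amp;amp; (a predefined entity, not an external one). The parser typically reports "lee h", "&" and "y" as separate chunks. Your handler prints the first chunk and then resets the flag, so the remaining chunks are lost. You should accumulate the content of every call to characters until you receive the endElement event; only then do you have the complete content.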

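Please note the documentation of the characters method: a SAX parser may deliver contiguous character data in a single chunk or split it across several calls.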

You could also use the ignorableWhitespace method together with a validating parser and an appropriate schema (e.g. a DTD), so that the parser can tell which whitespace is ignorable (such as indentation).

In Java, accumulating the text could look like this:

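import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {

    // Accumulates text across successive characters() calls.
    private StringBuilder acc;

    public MyHandler() {
        acc = new StringBuilder();
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // The element is complete, so the accumulated text is complete too.
        System.out.printf("Characters accumulated: %s%n", acc.toString());
        acc.setLength(0); // reset for the next element
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // May be called several times per element; append every chunk.
        acc.append(ch, start, length);
    }
}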
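
To tie the accumulation approach back to the document in the question, here is a minimal self-contained sketch. The class AddressTextDemo, the nested AddressHandler and the helper extractAddress are hypothetical names introduced only for this example, and the XML is inlined as a string with its closing tags corrected. Only the standard JAXP SAX API is used.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative, not from the original post.
class AddressTextDemo {

    // Collects the text of <Address> across all characters() calls.
    static class AddressHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inAddress = false;
        private String address; // set once </Address> has been seen

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) {
            if (qName.equalsIgnoreCase("Address")) {
                inAddress = true;
                text.setLength(0); // start from an empty buffer
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Keep the flag set: more chunks may follow for the same element.
            if (inAddress) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("Address")) {
                address = text.toString();
                inAddress = false;
            }
        }

        String getAddress() {
            return address;
        }
    }

    // Parses the given XML and returns the full text of the Address element.
    static String extractAddress(String xml) throws Exception {
        AddressHandler handler = new AddressHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getAddress();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Person><Name>Test</Name><Phone>111-111-2222</Phone>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + extractAddress(xml));
    }
}
```

Run against the sample document, main prints "Address is: lee h&y". The key difference from the handler in the question is that the flag is cleared in endElement rather than after the first characters call.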

深海蓝天 2024-12-17 03:19:32

The answer depends to some extent on which parser you're using.

Here's a thorough rundown on the issue: http://www.ibm.com/developerworks/xml/library/x-tipsaxdo4/index.html

With a StAX parser you can set the property javax.xml.stream.isCoalescing (XMLInputFactory.IS_COALESCING) to true. This property specifies whether adjacent character data is coalesced into a single event.

With SAX, however, there is generally no such control, so the handler has to accumulate the text itself.
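
The StAX option mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the class StaxCoalescingDemo and the helper readAddress are hypothetical names, and the sketch assumes the Address element contains only text. It deliberately reads a single CHARACTERS event rather than calling getElementText, which would concatenate the chunks regardless of the coalescing setting.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; names are illustrative, not from the original post.
class StaxCoalescingDemo {

    // Returns the text of the first Address element, read as one event.
    static String readAddress(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the reader to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "Address".equals(reader.getLocalName())) {
                    // With coalescing on, the whole text arrives at once.
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        return reader.getText();
                    }
                    return ""; // element had no text content
                }
            }
            return null; // no Address element found
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Person><Name>Test</Name><Phone>111-111-2222</Phone>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + readAddress(xml));
    }
}
```

Without the IS_COALESCING setting, a StAX reader is free to split the text around the entity reference, much like SAX does, so the same accumulate-until-end-tag logic would be needed.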
