Android: parsing XML with a DOM parser. Converting child nodes to a string

Posted 2024-08-18 07:40:59

Another question. This time I'm parsing XML messages I receive from a server.
Someone thought it would be clever to put HTML pages inside an XML message. Now I'm running into problems, because I want to extract that HTML page from the XML message as a string.

OK, this is the XML message I'm parsing:

<AmigoRequest>
  <From></From>
  <To></To>
  <MessageType>showMessage</MessageType>
  <Param0>general message</Param0>
  <Param1><html><head>test</head><body>Testhtml</body></html></Param1>
</AmigoRequest>

As you can see, Param1 contains an HTML page. I've tried to extract the message the following way:

public String getParam1(Document d) {
        if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
            NodeList results = d.getElementsByTagName("Param1");
            // Messagetype depends on what message we are reading.           
            if (results.getLength() > 0 && results != null) {                
                return results.item(0).getFirstChild().getNodeValue();
            }
        }
        return "";
    }

Here d is the XML message as a Document.
It always returns null, because getNodeValue() returns null.
When I try results.item(0).getFirstChild().hasChildNodes(), it returns true, because it sees that there is a tag in the message.
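
A small sketch of what is likely going on, assuming a standard JAXP DOM parser: the first child of Param0 is a text node, so getNodeValue() returns its text, while the first child of Param1 is the <html> element, and the DOM specification defines the node value of an element as null. The class name NodeValueDemo and the helpers parse and firstChildOf are made up for this illustration, and the message is reduced to the two relevant elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueDemo {
    // The message from the question, reduced to the relevant elements.
    static final String XML =
        "<AmigoRequest><Param0>general message</Param0>"
        + "<Param1><html><head>test</head><body>Testhtml</body></html></Param1>"
        + "</AmigoRequest>";

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Hypothetical helper: first child of the first element with the given tag name.
    static Node firstChildOf(Document d, String tag) {
        return d.getElementsByTagName(tag).item(0).getFirstChild();
    }

    public static void main(String[] args) {
        Document d = parse(XML);
        Node p0 = firstChildOf(d, "Param0"); // a text node
        Node p1 = firstChildOf(d, "Param1"); // the <html> element
        System.out.println(p0.getNodeType() == Node.TEXT_NODE);    // true
        System.out.println(p0.getNodeValue());                     // general message
        System.out.println(p1.getNodeType() == Node.ELEMENT_NODE); // true
        System.out.println(p1.getNodeValue());                     // null
    }
}
```

So the null is not a parser failure: the text lives one or more levels further down, in the text nodes under <head> and <body>.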

How can I extract the HTML message <html><head>test</head><body>Testhtml</body></html> from Param1 as a string?

I'm using Android SDK 1.5 (well, almost Java) and a DOM parser.

Thanks for your time and replies.

Antek

一曲爱恨情仇 2024-08-25 07:40:59

You could take the content of Param1 like this:

public String getParam1(Document d) {
    if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
        NodeList results = d.getElementsByTagName("Param1");
        // Messagetype depends on what message we are reading.
        // Check for null first, then make sure there is at least one match.
        if (results != null && results.getLength() > 0) {

            // String extractHTMLTags(String s) is a function that you have
            // to implement so that it removes all the HTML tags from a string.
            return extractHTMLTags(results.item(0).getTextContent());
        }
    }
    return "";
}

All you have to do is to implement a function:

String extractHTMLTags(String s)

that will remove all HTML tag occurrences from a string.
For that you can take a look at this post: Remove HTML tags from a String
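
One way extractHTMLTags could look is sketched below. The function name comes from this answer; the regex approach and the class name HtmlTagStripper are assumptions made for illustration. It is a deliberately naive pattern, not a real HTML parser. Also note that where getTextContent() is available, it should already return only the text of the descendants, so for the message in the question there may be no tags left to strip.

```java
class HtmlTagStripper {
    // Naive sketch: drops anything that looks like <...>. It does not
    // understand comments, CDATA sections, scripts, or a '>' inside
    // an attribute value.
    static String extractHTMLTags(String s) {
        if (s == null) {
            return "";
        }
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<html><head>test</head><body>Testhtml</body></html>";
        System.out.println(extractHTMLTags(html)); // testTesthtml
    }
}
```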

岛歌少女 2024-08-25 07:40:59

After a lot of checking and scratching my head a thousand times, I came up with a simple fix, which requires changing the API level to 8.

蘸点软妹酱 2024-08-25 07:40:59

EDIT: I just saw your comment above about getTextContent() not being supported on Android. I'm going to leave this answer up in case it's useful to someone who's on a different platform.

If your DOM API supports it, you can call getTextContent(), as follows:

public String getParam1(Document d) {
    if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
        NodeList results = d.getElementsByTagName("Param1");
        // Messagetype depends on what message we are reading.
        if (results != null && results.getLength() > 0) {
            // getTextContent() is defined on Node, not NodeList, so take the first match.
            return results.item(0).getTextContent();
        }
    }
    return "";
}

However, getTextContent() is a DOM Level 3 API call; not all parsers are guaranteed to support it. Xerces-J does.

By the way, in your original example, your check for null is in the wrong place; it should be:

        if (results != null && results.getLength() > 0) {

Otherwise, you'd get an NPE if results really does come back as null.
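
For anyone on a platform with DOM Level 3 support, here is a self-contained sketch of the getTextContent() approach run against a shortened version of the question's message. The class name TextContentDemo and the parse helper are made up for this example. Note that the result is the concatenated text only; the <html>, <head>, and <body> tags themselves are not part of it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextContentDemo {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    static String getParam1(Document d) {
        if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
            NodeList results = d.getElementsByTagName("Param1");
            // Null check first, then the length check.
            if (results != null && results.getLength() > 0) {
                return results.item(0).getTextContent();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String xml = "<AmigoRequest><Param1><html><head>test</head>"
                + "<body>Testhtml</body></html></Param1></AmigoRequest>";
        System.out.println(getParam1(parse(xml))); // testTesthtml
    }
}
```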

似最初 2024-08-25 07:40:59

Since getTextContent() isn't available to you, another option would be to write it -- it isn't hard. In fact, if you're writing this solely for your own use -- or your employer doesn't have overly strict rules about open source -- you could look at Apache's implementation as a starting point; lines 610-646 seem to contain most of what you need. (Please be respectful of Apache's copyright and license.)

Otherwise, a rough sketch of the method would be:

// Requires: import org.w3c.dom.Node;
String getTextContent(Node node) {
    return getTextContent(node, new StringBuffer()).toString();
}

StringBuffer getTextContent(Node node, StringBuffer sb) {
    // Walk the children in document order.
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
        short type = child.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            sb.append(child.getNodeValue()); // text: append it
        } else {
            getTextContent(child, sb);       // anything else: recurse into its children
        }
    }
    return sb;
}
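
To try the hand-written approach end to end, here is a self-contained sketch that uses only DOM Level 1 calls (getFirstChild, getNextSibling, getNodeType, getNodeValue), which is the point of writing it yourself. The names ManualTextContent, textOf, and parse are invented for the example, and it is a simplification rather than a copy of any particular implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ManualTextContent {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Concatenates the text of all descendant text and CDATA nodes.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue());
            } else {
                collect(c, sb); // elements: descend; childless nodes add nothing
            }
        }
    }

    public static void main(String[] args) {
        Document d = parse("<AmigoRequest><Param1><html><head>test</head>"
                + "<body>Testhtml</body></html></Param1></AmigoRequest>");
        Node param1 = d.getElementsByTagName("Param1").item(0);
        System.out.println(textOf(param1)); // testTesthtml
    }
}
```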

安静 2024-08-25 07:40:59

Well, I was almost there with the code...

public String getParam1(Document d) {
    if (d.getDocumentElement().getTagName().equals("AmigoRequest")) {
        NodeList results = d.getElementsByTagName("Param1");
        // Messagetype depends on what message we are reading.           
        if (results != null && results.getLength() > 0) {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db;
            Element node = (Element) results.item(0); // get the value of Param1
            Document doc2 = null;
            try {

                db = dbf.newDocumentBuilder();
                doc2 = db.newDocument(); //create new document
                doc2.appendChild(doc2.importNode(node, true)); //import the <html>...</html> result in doc2

            } catch (ParserConfigurationException e) {
                // TODO Auto-generated catch block
                Log.d(TAG, " Exception ", e);
            } catch (DOMException e) {
                // TODO: handle exception
                Log.d(TAG, " Exception ", e);
            } catch (Exception e) {
                // TODO: handle exception
                e.printStackTrace();
            }


            return doc2. .....// All I'm missing is something to convert a Document to a string.
        }
    }
    return "";

}

As explained in the code comment, all I'm missing is a way to make a String out of a Document. You can't use the Transform class in Android... and doc2.toString() just gives you a string representation of the object, not the markup.

But my next step is to write my own parser if this doesn't work out ;)
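
Short of a full parser, one option is to walk the DOM and write the markup back out by hand. The sketch below does that for elements, attributes, and text, and returns the inner markup of a node as a string. The names DomToString, innerXml, write, escape, and parse are invented for this example; comments, processing instructions, and namespace handling are left out, and empty elements are written as an open and close pair.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomToString {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Serializes the children of 'parent' back into markup.
    static String innerXml(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, sb);
        }
        return sb.toString();
    }

    // Writes one node, recursing into elements.
    static void write(Node n, StringBuilder sb) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
                sb.append('<').append(n.getNodeName());
                NamedNodeMap attrs = n.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    sb.append(' ').append(a.getNodeName()).append("=\"")
                      .append(escape(a.getNodeValue())).append('"');
                }
                sb.append('>');
                sb.append(innerXml(n));
                sb.append("</").append(n.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                sb.append(escape(n.getNodeValue()));
                break;
            default:
                // Comments, processing instructions, etc. are skipped in this sketch.
                break;
        }
    }

    // Re-escapes characters that the parser unescaped while reading.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Document d = parse("<AmigoRequest><Param1><html><head>test</head>"
                + "<body>Testhtml</body></html></Param1></AmigoRequest>");
        Node param1 = d.getElementsByTagName("Param1").item(0);
        // prints <html><head>test</head><body>Testhtml</body></html>
        System.out.println(innerXml(param1));
    }
}
```

Because this works directly on the Param1 element, the intermediate doc2 document would not be needed with this approach.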

Not the best code, but a temporary solution:

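public String getParam1(String b) {
    final String open = "<Param1>";
    int start = b.indexOf(open);
    int end = b.indexOf("</Param1>");
    if (start < 0 || end < start) return ""; // Param1 missing or malformed
    return b.substring(start + open.length(), end);
}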

Where String b is the XML document string.
