Why does the XML DOM in Java report extra nodes?

Posted 2024-10-30 19:48:43

I have a simple XML representation of a table, shown below. When I traverse only the top level with the code included below, I get 5 nodes, when in fact there are only 2 in the example provided (theader and tbody). Can someone please explain why?

package testparser;
import java.io.FileInputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TestParser {
    private static final int FILE_small = 1;
    private static final int FILE_medium = 2;
    private static final int FILE_large = 3;
    /**
     * Entry point: runs the DOM test against the small sample file.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        doDomTest(FILE_small);
    }
    private static void doDomTest(int sizeId) {
        String filename = getFileNameFromId(sizeId);

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            FileInputStream fis = new FileInputStream(filename);
            Document doc = db.parse(fis);

            Element topElement = doc.getDocumentElement();

            NodeList nl = topElement.getChildNodes();

            int ilen = nl.getLength();
            print("Top Element count " + ilen);
            for (int i=0;i<ilen;i++){
                Node node = nl.item(i);
                if (node.getNodeType()==Node.TEXT_NODE) {
                    print(i + ". Name:" + node.getNodeName() + "= " + node.getNodeValue() + ". type " + node.getNodeType());
                } else {
                    print(i + ". Name:" + node.getNodeName() + ", type " + node.getNodeType());
                }
            }


        } catch (Exception e) {
            e.printStackTrace();
        }

    }


    private static String getFileNameFromId(int sizeId) {
        String sReturn = "";
        switch (sizeId) {
        case FILE_small:
            sReturn = "D:/temp/testdata_ok.xml";
            break;
        case FILE_medium:
            sReturn = "D:/temp/testdata_ok.xml";
            break;
        case FILE_large:
            sReturn = "D:/temp/testdata_ok.xml";
            break;
        }
        return sReturn;
    }

    private static void print(String sValue) {
        System.out.println(sValue);
    }  
}

TEST DATA

<?xml version="1.0" encoding="utf-8"?>
<table>
    <theader>
        <tr>
            <th>Title Col1</th>
            <th>Title Col2</th>
            <th>Title Col3</th>
            <th>Title Col4</th>
        </tr>
    </theader>
    <tbody>
        <tr>
            <td>data:R1C1</td>
            <td>data:R1C2</td>
            <td>data:R1C3</td>
            <td>data:R1C4</td>
        </tr>
        <tr>
            <td>data:R2C1</td>
            <td>data:R2C2</td>
            <td>data:R2C3</td>
            <td>data:R2C4</td>
        </tr>
        <tr>
            <td>data:R3C1</td>
            <td>data:R3C2</td>
            <td>data:R3C3</td>
            <td>data:R3C4</td>
        </tr>
        <tr>
            <td>data:R4C1</td>
            <td>data:R4C2</td>
            <td>data:R4C3</td>
            <td>data:R4C4</td>
        </tr>
        <tr>
            <td>data:R5C1</td>
            <td>data:R5C2</td>
            <td>data:R5C3</td>
            <td>data:R5C4</td>
        </tr>
    </tbody>
</table>

Console Output

Top Element count 5
0. Name:#text= 
    . type 3
1. Name:theader, type 1
2. Name:#text= 
    . type 3
3. Name:tbody, type 1
4. Name:#text= 
. type 3

Note how theader and tbody (items 1 and 3) are reported in the output, but I also get items 0, 2, and 4. Why the extra nodes? I would have expected just two lines, items 0 and 1, for theader and tbody respectively.

The "type 1" and "type 3" in the output are the values returned by getNodeType(). According to the org.w3c.dom.Node documentation, 1 is ELEMENT_NODE and 3 is TEXT_NODE.

I am using JDK 1.6.0u24

Comments (2)

随风而去 2024-11-06 19:48:44

The three extra nodes are text nodes that represent the white space:

  • between <table> and <theader>
  • between </theader> and <tbody>, and
  • between </tbody> and </table>.

I'm not sure about this, but I think you could eliminate the nodes by calling

    dbf.setIgnoringElementContentWhitespace(true);

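Read the Javadoc, paying attention to the part that says the parser must be in validating mode: the setting relies on the content model, so it only removes whitespace inside elements that a DTD declares as having element-only content.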
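To make that concrete, here is a minimal sketch of the validating-mode approach. It assumes a cut-down version of the test data with an internal DTD added (the DTD, the class name IgnoreWhitespaceDemo, and the helper countTopLevelChildren are illustrative, not part of the original question), and it parses from a string instead of D:/temp/testdata_ok.xml so that it is self-contained.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class IgnoreWhitespaceDemo {
    // Hypothetical, shortened test data. The internal DTD declares that
    // <table> has element-only content, which is what lets a validating
    // parser treat the whitespace between its children as ignorable.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<!DOCTYPE table [\n"
        + "  <!ELEMENT table (theader, tbody)>\n"
        + "  <!ELEMENT theader (#PCDATA)>\n"
        + "  <!ELEMENT tbody (#PCDATA)>\n"
        + "]>\n"
        + "<table>\n"
        + "    <theader>header</theader>\n"
        + "    <tbody>body</tbody>\n"
        + "</table>\n";

    // Parses XML and returns the number of direct children of <table>.
    // When stripWhitespace is true, the factory is put in validating mode
    // and asked to drop element-content whitespace; otherwise the factory
    // defaults are used.
    static int countTopLevelChildren(boolean stripWhitespace) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(stripWhitespace);
            dbf.setIgnoringElementContentWhitespace(stripWhitespace);
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new ByteArrayInputStream(XML.getBytes("UTF-8")));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default settings: " + countTopLevelChildren(false));
        System.out.println("Validating, whitespace ignored: " + countTopLevelChildren(true));
    }
}
```

With the JDK's bundled parser I would expect the first count to include the three whitespace text nodes and the second to contain only theader and tbody. If the real file has no DTD, this route is not available without adding one.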

梦纸 2024-11-06 19:48:44

As your output shows, these are the whitespace between the table and theader / tbody elements. Without a DTD or schema, the parser does not know that this whitespace can be ignored. You would have to skip these nodes in your own code.
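Skipping them could look something like the sketch below, which keeps only children whose node type is Node.ELEMENT_NODE. The class name ElementChildrenDemo and the helpers childElements and topLevelNames are made up for illustration, and the XML is a shortened inline copy of the test data so the example runs without the file on disk.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ElementChildrenDemo {
    // Shortened copy of the test data (no DTD, so whitespace is kept).
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<table>\n"
        + "    <theader><tr><th>Title Col1</th></tr></theader>\n"
        + "    <tbody><tr><td>data:R1C1</td></tr></tbody>\n"
        + "</table>\n";

    // Returns only the direct children that are elements, skipping
    // text nodes (including whitespace), comments, and so on.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<Element>();
        NodeList nl = parent.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++) {
            Node node = nl.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) node);
            }
        }
        return result;
    }

    // Parses the given XML and returns the names of the top-level child elements.
    static List<String> topLevelNames(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            List<String> names = new ArrayList<String>();
            for (Element e : childElements(doc.getDocumentElement())) {
                names.add(e.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        List<String> names = topLevelNames(XML);
        for (int i = 0; i < names.size(); i++) {
            System.out.println(i + ". Name:" + names.get(i));
        }
    }
}
```

Filtering on the node type rather than on empty text also skips comments and processing instructions, which is usually what you want when you only care about child elements.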
