Why are there #text nodes in my XML file?

Posted 2024-11-25 10:38:27

I'm making an Android application that does DOM parsing on an XML file. I have an XML file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<family>
    <grandparent>
        <parent1>
            <child1>Foo</child1>
            <child2>Bar</child2>
        </parent1>
        <parent2>
            <child1>Raz</child1>
            <child2>Mataz</child2>
        </parent2>
    </grandparent>  
</family>

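If I run a DOM parser on it, like this:

// Needs javax.xml.parsers.DocumentBuilder, javax.xml.parsers.DocumentBuilderFactory,
// org.w3c.dom.Document, org.w3c.dom.Node and org.w3c.dom.NodeList.
// `input` is the XML source (its declaration is not shown in the question).
try {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

    Document doc = builder.parse(input);
    doc.getDocumentElement().normalize();   // added in since the edit
    NodeList nodd = doc.getElementsByTagName("grandparent");
    for (int x = 0; x < nodd.getLength(); x++) {
        Node node = nodd.item(x);
        NodeList nodes = node.getChildNodes();
        for (int y = 0; y < nodes.getLength(); y++) {
            Node n = nodes.item(y);
            System.out.println(n.getNodeName());
        }
    }
} catch (Exception e) {
    // newDocumentBuilder() and parse() throw checked exceptions
    e.printStackTrace();
}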

My application prints out the following:

07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent1
07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent2
07-20 18:24:28.395: INFO/System.out(491): #text

My question is: what are those #text nodes and, more importantly, how do I get rid of them?

Edit: Now that I know what they are, I tried to normalize the document. I have updated the code to reflect the change, but the result is the same.

秋凉 2024-12-02 10:38:27

It's whitespace (newlines, spaces, tabs) :)
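To make the whitespace point concrete, here is a minimal sketch in Java, assuming a trimmed-down copy of the question's XML held in a string. The class name `ElementChildren` and the helpers `parse` and `elementChildNames` are hypothetical names made up for this illustration. It shows that the indentation between the tags is what produces the #text children, and that checking `getNodeType()` is one way to skip them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ElementChildren {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: names of the element children only,
    // skipping #text nodes and any other non-element node types.
    static List<String> elementChildNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the question's file: newlines and indentation between the tags.
        String xml = "<family>\n  <grandparent>\n    <parent1/>\n    <parent2/>\n  </grandparent>\n</family>";
        Document doc = parse(xml);
        Node grandparent = doc.getElementsByTagName("grandparent").item(0);

        // Three whitespace text nodes plus two elements.
        System.out.println(grandparent.getChildNodes().getLength()); // prints 5
        System.out.println(elementChildNames(grandparent));          // prints [parent1, parent2]
    }
}
```

Filtering on the node type leaves the document untouched, which is usually the least surprising option when the whitespace only needs to be ignored while iterating.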

雨后彩虹 2024-12-02 10:38:27

This is what you get:

1) A node list containing all the grandparent elements:

NodeList nodd = doc.getElementsByTagName("grandparent");

2) All the child nodes of grandparent x:

NodeList nodes = node.getChildNodes();

which are the child nodes of

<grandparent>
    <parent1>
       ...
    </parent1>

    <parent2>
       ...
    </parent2>
</grandparent>

3) The child y:

nodes.item(y);

There can be text between the elements, and that is the #text you are seeing. In your file, that text is just the whitespace used for indentation. If you had:

<grandparent>
    yourTextHere1
    <parent1>
       ...
    </parent1>
    yourTextHere2
    <parent2>
       ...
    </parent2>
    yourTextHere3
</grandparent>

the child nodes would be, in order (each text node is still named #text by getNodeName(); the string shown is its value, ignoring the surrounding whitespace):

yourTextHere1
parent1
yourTextHere2
parent2
yourTextHere3

I hope this helps!
Julien
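The mixed-content case described above can be sketched as follows. This is an illustration only: the class `TextNodeValues` and its helpers `parse` and `describeChildren` are hypothetical names, and the XML is a shortened version of the example with empty parent elements. It prints each child's node name, plus the trimmed value for nodes that have one, which shows that the text lives in `getNodeValue()` while `getNodeName()` still reports #text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TextNodeValues {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: "name" for nodes without a value (elements),
    // "name=value" with the value trimmed for nodes that carry one (text nodes).
    static List<String> describeChildren(Node parent) {
        List<String> out = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String value = child.getNodeValue(); // null for element nodes
            out.add(value == null ? child.getNodeName()
                                  : child.getNodeName() + "=" + value.trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<grandparent>\n  yourTextHere1\n  <parent1/>\n  yourTextHere2\n"
                   + "  <parent2/>\n  yourTextHere3\n</grandparent>";
        Node grandparent = parse(xml).getDocumentElement();
        System.out.println(describeChildren(grandparent));
        // prints [#text=yourTextHere1, parent1, #text=yourTextHere2, parent2, #text=yourTextHere3]
    }
}
```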

世界等同你 2024-12-02 10:38:27

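Do this when parsing the document:

Document doc = builder.parse(input); 
doc.getDocumentElement().normalize();

This merges adjacent text nodes and removes empty ones. Note that it does not remove the whitespace-only #text children between elements, which matches the result reported in the question's edit; those nodes have to be skipped or removed explicitly, for example by checking getNodeType().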
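As the question's edit reports, calling `normalize()` leaves the whitespace-only #text nodes in place, because it only merges adjacent text nodes and drops empty ones. If the nodes really need to be removed from the tree, one option is to walk it and delete them by hand. The sketch below assumes that whitespace-only text is never significant in the document; `WhitespaceStripper`, `parse` and `stripWhitespaceNodes` are hypothetical names for this illustration, not part of the DOM API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceStripper {
    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: recursively remove text nodes that contain only whitespace.
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards: the NodeList is live, so removing a child
        // must not shift the indices that are still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<family>\n  <grandparent>\n    <parent1><child1>Foo</child1></parent1>\n"
                   + "    <parent2><child1>Raz</child1></parent2>\n  </grandparent>\n</family>";
        Document doc = parse(xml);
        doc.getDocumentElement().normalize();
        Node grandparent = doc.getElementsByTagName("grandparent").item(0);
        System.out.println(grandparent.getChildNodes().getLength()); // prints 5

        stripWhitespaceNodes(doc.getDocumentElement());
        System.out.println(grandparent.getChildNodes().getLength()); // prints 2
    }
}
```

Text such as Foo and Raz is kept because those nodes are not whitespace-only. Documents with mixed content, where a space between inline elements matters, would need a more careful rule than this one.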
