Why are there #text nodes in my XML file?
I'm making an Android application that does DOM parsing on an XML file. The file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<family>
    <grandparent>
        <parent1>
            <child1>Foo</child1>
            <child2>Bar</child2>
        </parent1>
        <parent2>
            <child1>Raz</child1>
            <child2>Mataz</child2>
        </parent2>
    </grandparent>
</family>
If I run a DOM parser on it, like this:
    try {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(input);
        doc.getDocumentElement().normalize(); // added since the edit
        NodeList grandparents = doc.getElementsByTagName("grandparent");
        for (int x = 0; x < grandparents.getLength(); x++) {
            Node grandparent = grandparents.item(x);
            // getChildNodes() returns every child node, not only elements
            NodeList children = grandparent.getChildNodes();
            for (int y = 0; y < children.getLength(); y++) {
                Node child = children.item(y);
                System.out.println(child.getNodeName());
            }
        }
    } catch (Exception e) {
        // parsing can fail with I/O, SAX, or parser configuration errors
        e.printStackTrace();
    }
My application prints the following:
07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent1
07-20 18:24:28.395: INFO/System.out(491): #text
07-20 18:24:28.395: INFO/System.out(491): parent2
07-20 18:24:28.395: INFO/System.out(491): #text
My question is: what are those #text nodes and, more importantly, how do I get rid of them?
Edit: Now that I know what they are, I tried normalizing the document. I have updated the code to reflect the change, but the result is the same.
Comments (3)
It's whitespace (newlines, spaces, tabs) :)
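To illustrate the point, here is a minimal sketch of one common way to skip those whitespace nodes: check each child's node type and keep only elements. Note that normalize() merges adjacent text nodes and drops empty ones, but a text node holding only a newline and indentation is not empty, so it survives. The class name, the helper elementChildNames, and the inline XML string are assumptions made for this example, not part of the original question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ElementChildren {
    // Hypothetical helper: names of the element children of the first <tagName> element.
    static List<String> elementChildNames(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList children = doc.getElementsByTagName(tagName).item(0).getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Whitespace between tags shows up as TEXT_NODE children; skip them.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, shaped like the file in the question.
        String xml = "<family>\n  <grandparent>\n    <parent1/>\n    <parent2/>\n  </grandparent>\n</family>";
        System.out.println(elementChildNames(xml, "grandparent")); // prints [parent1, parent2]
    }
}
```

Filtering on getNodeType() leaves the document untouched, which is usually the least surprising choice when you only need to read the elements.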
This is what you get:
1) A node list containing all the grandparent elements
2) All the child nodes of grandparent x
3) The child y, which is one of those child nodes
There can be text between the elements, and that is the #text you are seeing. If that text were yourTextHere1, yourTextHere2 and yourTextHere3, the children would be:
yourTextHere1
parent1
yourTextHere2
parent2
yourTextHere3
I hope this helps!
Julien
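The list above can be reproduced with a small sketch. It assumes an inline XML string with visible text placed between the elements, and a hypothetical helper childValues that reports getNodeValue() for text nodes and getNodeName() for elements. With getNodeName() alone, every text node would print as #text, as in the question's output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TextBetweenElements {
    // Hypothetical helper: text content for text nodes, tag name for everything else.
    static List<String> childValues(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList children = doc.getElementsByTagName(tagName).item(0).getChildNodes();
        List<String> values = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                values.add(child.getNodeValue()); // the actual characters between tags
            } else {
                values.add(child.getNodeName());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumed input: visible text where the question's file has only whitespace.
        String xml = "<grandparent>yourTextHere1<parent1/>yourTextHere2"
                + "<parent2/>yourTextHere3</grandparent>";
        System.out.println(childValues(xml, "grandparent"));
        // prints [yourTextHere1, parent1, yourTextHere2, parent2, yourTextHere3]
    }
}
```

In the question's file the text between the tags is only newlines and indentation, so the same five children exist but the three text nodes look empty when printed.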
Do this when parsing the document:
This effectively compacts the XML and removes all the unwanted #text children.
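One possible way to get that effect, not necessarily the code this answer had in mind, is to walk the tree after parsing and remove every text node that contains only whitespace. The sketch below assumes that approach; the class name and the helpers parse and stripWhitespaceNodes are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceStripper {
    // Hypothetical helper: parse an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Recursively removes text nodes that hold nothing but whitespace.
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards: the NodeList is live and shrinks as children are removed.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, shaped like the file in the question.
        Document doc = parse("<grandparent>\n  <parent1>\n    <child1>Foo</child1>\n"
                + "  </parent1>\n  <parent2/>\n</grandparent>");
        stripWhitespaceNodes(doc.getDocumentElement());
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(children.item(i).getNodeName());
        }
    }
}
```

Text nodes with real content, such as Foo, are kept. This is only safe when whitespace-only text carries no meaning in your format. DocumentBuilderFactory also has setIgnoringElementContentWhitespace(true), but it relies on the parser validating against a DTD, which the file in the question does not declare.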