Why does JDOM's getChild() method return null?
I'm working on a project that manipulates HTML documents. I want to take the body content from an existing HTML document and modify it into a new HTML document. I'm currently using JDOM, and I want to use the body element in my code. For that I call getChild("body"), but it returns null, even though my HTML document has a body element. Could anybody help me understand this problem? I'm a student.
I would appreciate any pointers.
Code:
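import java.io.IOException;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public static void getBody() throws JDOMException, IOException {
    // Build the JDOM tree with TagSoup as the underlying SAX parser
    SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", true);
    Document jdomDocument = builder.build("http://www......com");
    Element root = jdomDocument.getRootElement();
    // It returns null
    System.out.println(root.getChild("body"));
}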
Please refer to these too. My HTML's root and children, printed to the console:
root.getName():html
SIZE:2
[Element: <head [Namespace: http://www.w3.org/1999/xhtml]/>]
[Element: <body [Namespace: http://www.w3.org/1999/xhtml]/>]
Comments (3)
I've found some problems in your code:
1) If you want to build a remote XML document over the network, you should use another build method that receives a URL as input. As written, you're parsing the file named "www......com" as XML.
2) If you want to parse an HTML page as XML, you have to check that it is a well-formed XHTML document; otherwise you can't parse it as XML.
3) As I already said in another answer, root.getChild("body") returns root's child whose name is "body" and which has no namespace. You should check the namespace of the element you're looking for; if it has a qualified namespace, you have to pass that namespace to getChild as a second argument, after the name.
To find out which namespace your element has, an easy way is to print out all of root's children using the getChildren method.
If you're trying to parse XHTML, you probably have the namespace URI http://www.w3.org/1999/xhtml, so you should pass that namespace when you call getChild for "body".
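The namespace point in item 3 can be reproduced without JDOM or TagSoup on the classpath. The following is a minimal sketch, assuming only the JDK's built-in DOM parser and a hard-coded XHTML string in place of the real page; the class name NamespaceLookupSketch and the helpers parseRoot and findChild are hypothetical, made up for illustration. It shows that a lookup by name with no namespace misses body, while a lookup with the XHTML namespace URI finds it, which is the same distinction JDOM makes between getChild(name) and getChild(name, namespace).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NamespaceLookupSketch {
    // Namespace URI shown in the question's console output
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Parses an XML string with namespace awareness and returns the root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    // Returns the first direct child element whose local name AND namespace URI
    // both match, or null. A null namespaceUri means "element in no namespace".
    static Element findChild(Element parent, String localName, String namespaceUri) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            boolean sameName = localName.equals(node.getLocalName());
            boolean sameNamespace = (namespaceUri == null)
                    ? node.getNamespaceURI() == null
                    : namespaceUri.equals(node.getNamespaceURI());
            if (sameName && sameNamespace) {
                return (Element) node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element root = parseRoot(
                "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head/><body/></html>");
        // Unqualified lookup: body lives in the XHTML namespace, so nothing matches.
        System.out.println("without namespace found: "
                + (findChild(root, "body", null) != null));
        // Qualified lookup: name and namespace URI both match.
        System.out.println("with namespace found: "
                + (findChild(root, "body", XHTML_NS) != null));
    }
}
```

In the JDOM code from the question, the corresponding lookup would pass a Namespace built from the same URI (for example via Namespace.getNamespace) as the second argument to root.getChild.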
What makes you think you need org.ccil.cowan.tagsoup.Parser? What does it give you that the parser built into the JDK does not?
I'd try another constructor for SAXBuilder. Use the parser built into the JDK and see if that helps.
Start by printing out the entire tree using XMLOutputter.