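How to parse XML with the Jericho HTML Parser
I'm new to Java and servlets and am currently trying to parse XML with the Jericho HTML Parser. For example, I want to get the URL from each <link> tag, but nothing is displayed: the total count says 27 (so I get the correct number of elements, but not their text). Does anyone know how to do this?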
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.*;
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

@WebServlet(urlPatterns = { "/HelloServlet" })
public class HelloServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException, MalformedURLException {
        resp.setContentType("text/html; charset=UTF-8");
        PrintWriter out = resp.getWriter();
        out.println("<html>");
        out.println("<head><meta http-equiv='content-type' content='text/html; charset=UTF-8'></head>");
        out.println("<body>");
        Source source = new Source(new URL("http://news.yahoo.com/rss/"));
        source.fullSequentialParse();
        List<Element> Linklist = source.getAllElements("link");
        if (Linklist != null) {
            out.println("<p>total:" + Linklist.size() + "</p>");
            for (Element link : Linklist) {
                out.println("<p>" + link.getContent().toString() + "</p>");
            }
        }
        out.println("</body>");
        out.println("</html>");
    }
}
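According to the Jericho HTML Parser homepage, Jericho is meant for manipulating HTML documents. In HTML, <link> is an empty element with no end tag, so Jericho treats the URL that follows <link> in the feed as ordinary text outside the element; that is why getContent() returns an empty string even though the count of 27 is correct. The RSS from Yahoo, however, is XML, so you can use Java's standard XML APIs to parse the document and extract the link elements.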
Here is an example:
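A minimal sketch of that approach, using only the JDK's built-in DOM parser (javax.xml.parsers.DocumentBuilderFactory). The class name RssLinks and the method extractLinks are hypothetical names chosen for illustration; the sketch assumes the feed is well-formed XML and that the wanted URLs are the text of plain, unprefixed <link> elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class RssLinks {

    // Hypothetical helper: parses an RSS/XML stream and returns the trimmed
    // text of every element whose tag name is exactly "link".
    public static List<String> extractLinks(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder().parse(in);
        // The factory is not namespace-aware, so prefixed elements such as
        // <atom:link> have a different tag name and are not matched.
        NodeList nodes = doc.getElementsByTagName("link");
        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() returns the text between <link> and </link>.
            links.add(nodes.item(i).getTextContent().trim());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        InputStream in;
        if (args.length > 0) {
            // e.g. java RssLinks http://news.yahoo.com/rss/
            in = URI.create(args[0]).toURL().openStream();
        } else {
            // Small in-memory feed so the sketch runs without network access.
            String sample = "<rss><channel><link>http://example.com/</link>"
                    + "<item><link>http://example.com/a</link></item></channel></rss>";
            in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream stream = in) {
            List<String> links = extractLinks(stream);
            System.out.println("total: " + links.size());
            for (String link : links) {
                System.out.println(link);
            }
        }
    }
}
```

In the servlet, the Jericho part could then be replaced by something like List<String> links = RssLinks.extractLinks(new URL("http://news.yahoo.com/rss/").openStream()); followed by the existing loop printing each string. Because the link text is written into an HTML page, escaping it before output would be a sensible precaution.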