Parsing this page with SAX

Posted on 2024-12-11 10:44:04

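I am trying to parse a web page; the values I want are inside certain TD tags. Can someone help me get this working? I am getting a syntax error at line 1, column 62.

The goal is to pass these values to a ListView later on, so that it shows Nieuw beltegoed: € 2,50

Any help would be appreciated.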

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="nl" lang="nl">
<head>
<TABLE class=personaltable cellSpacing=0 cellPadding=0>
 <TBODY>
  <TR class=alternativerow>
   <TD>Nieuw beltegoed:</TD>
   <TD>€ 2,50</TD>
  </TR>
  <TR>
   <TD>Tegoed vorige periode:</TD>
   <TD>€ 3,62</TD>
  </TR>
  <TR class=alternativerow>
   <TD>Tegoed tot 09-11-2011:</TD>
   <TD>€ 1,12</TD>    
  </TR>
  <TR>
   <TD>
   <TD height=25></TD>
  <TR class=alternativerow>
   <TD>Verbruik sinds nieuw tegoed:</TD>
   <TD>€ 3,33</TD>
  </TR>
  <TR>
   <TD>Ongebruikt tegoed:</TD>
   <TD>€ 1,79</TD>
  </TR>
  <TR class=alternativerow>
   <TD class=f-Orange>Verbruik boven bundel:</TD>
   <TD class=f-Orange>€ 0,00</TD>
  </TR>
  <TR>
   <TD>Verbruik dat niet in de bundel zit*:</TD>
   <TD>€ 0,00</TD>
  </TR>
 </TBODY>
</TABLE>
</head>

My SAX handler so far:

// ===========================================================
    // Fields
    // ===========================================================

    private boolean in_TABLE = false;
    private boolean in_TBODY = false;
    private boolean in_TR = false;
    private boolean in_TD = false;

    private ParsedExampleDataSet myParsedExampleDataSet = new ParsedExampleDataSet();

    // ===========================================================
    // Getter & Setter
    // ===========================================================

    public ParsedExampleDataSet getParsedData() {
            return this.myParsedExampleDataSet;
    }

    // ===========================================================
    // Methods
    // ===========================================================
    @Override
    public void startDocument() throws SAXException {
            this.myParsedExampleDataSet = new ParsedExampleDataSet();
    }

    @Override
    public void endDocument() throws SAXException {
            // Nothing to do
    }

    /** Called on opening tags such as:
     * <tag>
     * The element name arrives without its attributes; those are passed in atts, e.g. for:
     * <tag attribute="attributeValue">*/
    @Override
    public void startElement(String namespaceURI, String localName,
                    String qName, Attributes atts) throws SAXException {
            if (localName.equalsIgnoreCase("TABLE")) {
                    // Attributes are not part of the name; check the class separately
                    this.in_TABLE = "personaltable".equals(atts.getValue("class"));
            }else if (localName.equalsIgnoreCase("TBODY")) {
                    this.in_TBODY = true;
            }else if (localName.equalsIgnoreCase("TR")) {
                    // Rows with and without class=alternativerow both hold values
                    this.in_TR = true;
            }else if (localName.equalsIgnoreCase("TD")) {
                    // The TD cells carry text, not an integer attribute
                    this.in_TD = true;
            }
    }

    /** Called on closing tags such as:
     * </tag> */
    @Override
    public void endElement(String namespaceURI, String localName, String qName)
                    throws SAXException {
            if (localName.equalsIgnoreCase("TABLE")) {
                    this.in_TABLE = false;
            }else if (localName.equalsIgnoreCase("TBODY")) {
                    this.in_TBODY = false;
            }else if (localName.equalsIgnoreCase("TR")) {
                    this.in_TR = false;
            }else if (localName.equalsIgnoreCase("TD")) {
                    this.in_TD = false;
            }
    }

    /** Called on the text of the following structure:
     * <tag>characters</tag> */
    @Override
    public void characters(char ch[], int start, int length) {
            if(this.in_TD){
                    myParsedExampleDataSet.setExtractedString(new String(ch, start, length));
            }
    }

Comments (1)

纵山崖 2024-12-18 10:44:04

To parse an HTML page, it is easier to use DOM than SAX.

You can also check this post, which already answers how to parse an HTML page.
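
As a rough illustration of the DOM approach, here is a minimal sketch using only the JDK's built-in XML DOM parser. It assumes the table has already been turned into well-formed XML (quoted attributes, every TD and TR closed, lowercase tag names), which the page as posted is not; a lenient HTML parser such as jsoup or TagSoup would be one way to get there. The class name `TableRows`, the method `extractRows`, and the sample markup are hypothetical and not part of the original question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects "label value" strings from two-cell table rows.
final class TableRows {

    // Assumes `xml` is well-formed and uses lowercase <tr>/<td> element names.
    static List<String> extractRows(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> rows = new ArrayList<>();
        NodeList trs = doc.getElementsByTagName("tr");
        for (int i = 0; i < trs.getLength(); i++) {
            NodeList tds = ((Element) trs.item(i)).getElementsByTagName("td");
            if (tds.getLength() != 2) {
                continue; // skip rows that are not a label/value pair
            }
            String label = tds.item(0).getTextContent().trim();
            String value = tds.item(1).getTextContent().trim();
            if (!label.isEmpty()) { // skip empty spacer rows
                rows.add(label + " " + value);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Cleaned-up excerpt of the table; \u20ac is the euro sign.
        String xml =
                "<table class=\"personaltable\"><tbody>"
              + "<tr class=\"alternativerow\"><td>Nieuw beltegoed:</td><td>\u20ac 2,50</td></tr>"
              + "<tr><td>Tegoed vorige periode:</td><td>\u20ac 3,62</td></tr>"
              + "<tr><td></td><td height=\"25\"></td></tr>"
              + "</tbody></table>";
        for (String row : extractRows(xml)) {
            System.out.println(row);
        }
    }
}
```

Each returned string pairs a label cell with its value cell, so the list could be handed to a ListView adapter as it is.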
