How to parse the cells of the 3rd column of a table?
I am trying to parse the cells of the 3rd column of a <table> using Jsoup.
Here is the HTML:
<b><table title="Avgångar:" class="tableMenuCell" cellspacing="0" cellpadding="4" border="0" id="GridViewForecasts" style="color:#333333;width:470px;border-collapse:collapse;">
<tr class="darkblue_pane" style="color:White;font-weight:bold;">
<th scope="col">Linje</th>
<th scope="col">Destination</th>
<th scope="col">Nästa tur (min)</th>
<th scope="col"> </th>
<th scope="col">Därefter</th>
<th scope="col"> </th>
</tr>
<tr class="white_pane" style="color:#333333;">
<td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
<td align="left">Hovshaga Kurortsv.</td><td align="right">55</td>
<td align="left"></td>
<td align="right">--</td>
<td align="left"></td>
</tr>
<tr class="lightblue_pane" style="color:#284775;">
<td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
<td align="left">Hovshaga via Resecentrum</td><td align="right">21</td>
<td align="left"></td><td align="right">--</td>
<td align="left"></td>
</tr>
<tr class="white_pane" style="color:#333333;">
<td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
<td align="left">Teleborg</td><td align="right">5</td>
<td align="left"></td><td align="right">45</td><td align="left"></td>
</tr>
</table></b>
Here is my code attempt, which throws a NullPointerException:
URL url = null;
try {
    url = new URL("http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)");
} catch (MalformedURLException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
System.out.println("1");
Document doc = null;
try {
    System.out.println("2");
    doc = Jsoup.parse(url, 3000);
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
System.out.println("3");
Element table = doc.select("table[title=Avgångar:]").first();
System.out.println("3");
Iterator<Element> it = table.select("td").iterator();
//we know the third td element is where we wanna start so we call .next twice
it.next();
it.next();
while(it.hasNext()){
    // do what ever you want with the td element here
    System.out.println("::::::::::"+it.next());
    //iterate three times to get to the next td you want. checking after the first
    // one to make sure
    // we're not at the end of the table.
    it.next();
    if(!it.hasNext()){
        break;
    }
    it.next();
    it.next();
}
It runs up to the second System.out.println("3"); and then gets stuck.
Comments (2)
This approach is quite a mess, and you didn't say on which line the NPE occurred, so it's hard to give a straight answer to your question.
Apart from that, I would suggest not doing it the hard and error-prone way. As that <table> already has an id attribute, which is supposed to be unique throughout the document, just use the ID selector #someid. Further, you can get the cells of the 3rd column using the index selector :eq(index) (note: it's zero-based!).
So a few simple lines should do it: select #GridViewForecasts td:eq(2) and print the text of each matching element.
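The code that originally accompanied this answer is not preserved here, so what follows is only a minimal sketch of the selector approach it describes, not the answerer's exact code. It assumes the jsoup library is on the classpath; the class name ThirdColumnBySelector, the method thirdColumn and the constant SAMPLE_HTML are made up for illustration. The HTML is a trimmed copy of the table from the question, parsed from a string so that the example does not depend on the live page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; assumes the jsoup library is on the classpath.
final class ThirdColumnBySelector {

    // Trimmed copy of the table from the question (styling attributes dropped).
    static final String SAMPLE_HTML =
          "<table id=\"GridViewForecasts\">"
        + "<tr><th>Linje</th><th>Destination</th><th>N\u00e4sta tur (min)</th>"
        + "<th> </th><th>D\u00e4refter</th><th> </th></tr>"
        + "<tr><td>1</td><td>Hovshaga Kurortsv.</td><td>55</td>"
        + "<td></td><td>--</td><td></td></tr>"
        + "<tr><td>1</td><td>Hovshaga via Resecentrum</td><td>21</td>"
        + "<td></td><td>--</td><td></td></tr>"
        + "<tr><td>1</td><td>Teleborg</td><td>5</td>"
        + "<td></td><td>45</td><td></td></tr>"
        + "</table>";

    // Returns the text of every third-column <td> in the table with id GridViewForecasts.
    static List<String> thirdColumn(String html) {
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        // #GridViewForecasts selects the table by id; td:eq(2) matches each <td>
        // whose zero-based index among its siblings is 2, i.e. the third cell of a row.
        // The header row contains only <th> elements, so it is not matched.
        for (Element cell : doc.select("#GridViewForecasts td:eq(2)")) {
            values.add(cell.text());
        }
        return values;
    }

    public static void main(String[] args) {
        // For the sample above this prints 55, 21 and 5, one per line.
        for (String value : thirdColumn(SAMPLE_HTML)) {
            System.out.println(value);
        }
    }
}
```

Against the live page, the document would presumably be fetched with Jsoup.connect(url).get() instead of being parsed from a string; the selector itself stays the same.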
That's it.
I strongly recommend investing some time in properly learning the CSS selector syntax, as Jsoup is built around it.
See also:
Selector API

I think the best solution is to use the get() method to pick a single element out of a collection of elements. Hope it helps.
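As a rough illustration of the get() idea, here is a sketch that walks the rows and takes the cell at index 2 from each one. It assumes the jsoup library is on the classpath; the class name ThirdColumnByGet, the method thirdColumn and the constant SAMPLE_HTML are hypothetical, and the HTML is a trimmed copy of the table from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; assumes the jsoup library is on the classpath.
final class ThirdColumnByGet {

    // Trimmed copy of the table from the question (styling attributes dropped).
    static final String SAMPLE_HTML =
          "<table id=\"GridViewForecasts\">"
        + "<tr><th>Linje</th><th>Destination</th><th>Next (min)</th></tr>"
        + "<tr><td>1</td><td>Hovshaga Kurortsv.</td><td>55</td>"
        + "<td></td><td>--</td><td></td></tr>"
        + "<tr><td>1</td><td>Hovshaga via Resecentrum</td><td>21</td>"
        + "<td></td><td>--</td><td></td></tr>"
        + "<tr><td>1</td><td>Teleborg</td><td>5</td>"
        + "<td></td><td>45</td><td></td></tr>"
        + "</table>";

    // Returns the text of the third <td> of every row that has at least three cells.
    static List<String> thirdColumn(String html) {
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        for (Element row : doc.select("#GridViewForecasts tr")) {
            Elements cells = row.select("td");
            // Elements is a list, so get(2) returns the third cell (zero-based).
            // The size check skips the header row, which has no <td> at all.
            if (cells.size() > 2) {
                values.add(cells.get(2).text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // For the sample above this prints 55, 21 and 5, one per line.
        for (String value : thirdColumn(SAMPLE_HTML)) {
            System.out.println(value);
        }
    }
}
```

The size check matters here: calling get(2) on a row with fewer than three cells would throw an IndexOutOfBoundsException.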