How to parse a page that contains multiple tables
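Any idea how to scrape a web page that contains multiple tables? I am able to connect to the web page.
Below is one table, but the same web page contains several tables.
I also can't figure out how to read the table...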
XML:
<p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p>
<div class="storyStats">
<table>
<thead>
<tr>
<th>RANK</th>
<th>CENTRES</th>
<th>TEAM</th>
<th>POS</th>
<th>GP</th>
<th>G</th>
<th>A</th>
<th>PTS</th>
<th>+/-</th>
<th>PIM</th>
<th>PPP</th>
</tr>
</thead>
<tbody>
<tr class="bg1">
<td>1.</td>
<td><a href="/nhl/teams/players/?name=steven+stamkos">Steven Stamkos</a></td>
<td>Tampa Bay</td>
<td>C</td>
<td align="right">81</td>
<td align="right">50</td>
<td align="right">51</td>
<td align="right">101</td>
<td align="right">-2</td>
<td align="right">56</td>
<td align="right">38</td>
</tr>
Iterator<Element> trSIter = doc.select("table").iterator();
while (trSIter.hasNext()) {
    Element trEl = trSIter.next().child(0);
    Elements tdEls = trEl.children();
    Iterator<Element> tdIter = tdEls.select("tr").iterator();
    System.out.println("><1><><" + tdIter);
    boolean firstRow = true;
    while (tdIter.hasNext()) {
        Element tr = (Element) tdIter.next();
        while (tdIter.hasNext()) {
            int tdCount = 1;
            Element tdEl = tdIter.next();
            //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();
            Elements tdsEls = tdEl.select("td");
            System.out.println("><2><><" + tdsEls);
            Iterator<Element> columnIt = tdsEls.iterator();
            while (columnIt.hasNext()) {
                Element column = columnIt.next();
                switch (tdCount++) {
                    case 1:
                        name = column.select("a").first().text();
                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;
Comments (2)
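With the code below, there seems to be no problem parsing the tables from the HTML.
The code shows 5 in the textView, which is the number of tables in that HTML under the class storyStats. If you need to go on to parse the contents of the tables, you can assign the tables to another Elements object and continue parsing from there.
Anderson's answer shows how to parse it for data. Hope that helps.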
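The idea in this answer, selecting only the tables under the storyStats class and counting them, can be sketched roughly as follows. This is a minimal illustration rather than the answerer's original snippet: the class name StoryStatsTables, the helper selectTables, and the inline HTML string are assumptions made for the example, and the count is printed to the console instead of being shown in a textView.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class StoryStatsTables {

    // Hypothetical helper: returns every <table> nested inside
    // an element that carries the class "storyStats".
    static Elements selectTables(Document doc) {
        return doc.select("div.storyStats table");
    }

    public static void main(String[] args) {
        // Assumed stand-in for the real page: two storyStats tables.
        String html =
              "<div class=\"storyStats\"><table><tbody>"
            + "<tr><td>1.</td></tr></tbody></table></div>"
            + "<div class=\"storyStats\"><table><tbody>"
            + "<tr><td>2.</td></tr></tbody></table></div>";

        Document doc = Jsoup.parse(html);
        Elements tables = selectTables(doc);

        // The number of matching tables; each one can then be
        // handed to further parsing code individually.
        System.out.println(tables.size());
    }
}
```

Because the returned Elements object is a list, each table can be processed on its own, for example with a for-each loop over tables.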
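This should get you started. Each table has a blank record that you will have to account for. You will also have to figure out which stats you want and where they are in the table. You get the stats with tds.get(). Let me know how it works for you.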
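A rough sketch of what this could look like, assuming the column order from the sample table in the question (RANK, name, TEAM, POS, GP, G, A, PTS, +/-, PIM, PPP). The class PlayerRowParser, the PlayerRow holder, and the choice of which stats to keep are hypothetical and only for illustration; rows without a full set of td cells (header rows and blank records) are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class PlayerRowParser {

    // Hypothetical holder for the few stats this sketch extracts.
    static final class PlayerRow {
        final String name;
        final String team;
        final int gamesPlayed;
        final int points;

        PlayerRow(String name, String team, int gamesPlayed, int points) {
            this.name = name;
            this.team = team;
            this.gamesPlayed = gamesPlayed;
            this.points = points;
        }
    }

    // Assumed column count, taken from the sample table header.
    private static final int EXPECTED_COLUMNS = 11;

    static List<PlayerRow> parse(String html) {
        List<PlayerRow> rows = new ArrayList<>();
        Document doc = Jsoup.parse(html);

        // Visit every table under storyStats, then every row in it.
        for (Element table : doc.select("div.storyStats table")) {
            for (Element tr : table.select("tr")) {
                Elements tds = tr.select("td");

                // Header rows hold only <th> cells and blank records
                // have too few (or empty) cells: skip both.
                if (tds.size() < EXPECTED_COLUMNS || tds.get(1).text().isEmpty()) {
                    continue;
                }

                rows.add(new PlayerRow(
                        tds.get(1).text(),                    // player name
                        tds.get(2).text(),                    // TEAM
                        Integer.parseInt(tds.get(4).text()),  // GP
                        Integer.parseInt(tds.get(7).text())   // PTS
                ));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String html =
              "<div class=\"storyStats\"><table>"
            + "<thead><tr><th>RANK</th><th>CENTRES</th></tr></thead><tbody>"
            + "<tr><td></td></tr>"
            + "<tr><td>1.</td><td><a href=\"#\">Steven Stamkos</a></td>"
            + "<td>Tampa Bay</td><td>C</td><td>81</td><td>50</td><td>51</td>"
            + "<td>101</td><td>-2</td><td>56</td><td>38</td></tr>"
            + "</tbody></table></div>";

        for (PlayerRow row : parse(html)) {
            System.out.println(row.name + " (" + row.team + "): "
                    + row.gamesPlayed + " GP, " + row.points + " PTS");
        }
    }
}
```

Reading cells by index with tds.get() avoids the counter-and-switch bookkeeping from the question, at the cost of depending on the column order staying the same across tables.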