Extracting a table from HTML into an HtmlTable in ASP.NET VB (HtmlAgilityPack)
I am trying to grab an HTML table from a remote page and display its contents in an HtmlTable on my site. I am using the HTML Agility Pack. Here is my code so far:
Imports HtmlAgilityPack

Partial Class ContentGrabExperiment
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        'fetch the remote html page
        Dim web As New HtmlWeb()
        Dim html As HtmlAgilityPack.HtmlDocument = web.Load("http://www.thesite.com/page.html")

        'Create table
        Dim outputTable As New HtmlTable
        Dim tableRow As New HtmlTableRow
        Dim tableCell As New HtmlTableCell

        'Target the <table> tag
        For Each table As HtmlNode In html.DocumentNode.SelectNodes("//table")
            'Target the <tr> tags within the table
            For Each row As HtmlNode In table.SelectNodes("//tr")
                'Target the <td> tags within the <tr> tags
                For Each cell As HtmlNode In row.SelectNodes("//td")
                    'Set the value to that of the <td>
                    tableCell.InnerText = cell.InnerHtml
                    'Add the cell to the row
                    tableRow.Cells.Add(tableCell)
                Next
                'Add row to the outputTable
                outputTable.Rows.Add(tableRow)
            Next
        Next

        'Add the table to the page
        PlaceHolderTable.Controls.Add(outputTable)
    End Sub
End Class
From this I was expecting to get the full table from the page, with its inner text, as an HtmlTable which I can then manipulate. What I get from this code is:
<table>
    <tr>
        <td>&nbsp;</td>
    </tr>
</table>
Please can someone point out where I am going wrong with my syntax. Any help much appreciated!
Comments (1)
1) You only have one HtmlTableRow and one HtmlTableCell. You need to create a new one for each row and each cell. You can reuse the variables, but you have to assign a New object to them on every iteration; otherwise you keep overwriting and re-adding the same single control.
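2) You might need to select ./tr and ./td to get only the rows and cells in the current table / row. An XPath expression that starts with // is evaluated from the document root even when it is called on a child node, so table.SelectNodes("//tr") returns every row in the page, not just the rows of that table.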
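Putting both points together, the loop from the question could be reworked roughly as follows. This is only a sketch under a few assumptions: the rows are direct children of each <table> in the remote markup (if the page wraps them in <tbody>, "./tr" will not match and ".//tr" would be needed instead), PlaceHolderTable is the placeholder control from the question's page, and the Nothing checks and the use of HtmlEntity.DeEntitize on the cell text are additions that the answer itself does not mention.

```vb
Imports System.Web.UI.HtmlControls
Imports HtmlAgilityPack

Partial Class ContentGrabExperiment
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        ' Fetch the remote HTML page (URL as given in the question)
        Dim web As New HtmlWeb()
        Dim html As HtmlAgilityPack.HtmlDocument = web.Load("http://www.thesite.com/page.html")

        Dim outputTable As New HtmlTable()

        ' SelectNodes returns Nothing when nothing matches, so check before looping
        Dim tables As HtmlNodeCollection = html.DocumentNode.SelectNodes("//table")
        If tables IsNot Nothing Then
            For Each table As HtmlNode In tables
                ' "./tr" is relative to the current table (assumes no <tbody> wrapper)
                Dim rows As HtmlNodeCollection = table.SelectNodes("./tr")
                If rows Is Nothing Then Continue For

                For Each row As HtmlNode In rows
                    ' A fresh row object for every source row
                    Dim tableRow As New HtmlTableRow()

                    ' "./td" is relative to the current row
                    Dim cells As HtmlNodeCollection = row.SelectNodes("./td")
                    If cells IsNot Nothing Then
                        For Each cell As HtmlNode In cells
                            ' A fresh cell object for every source cell
                            Dim tableCell As New HtmlTableCell()
                            ' Copy the text only; entities such as &nbsp; are decoded first
                            tableCell.InnerText = HtmlEntity.DeEntitize(cell.InnerText)
                            tableRow.Cells.Add(tableCell)
                        Next
                    End If

                    outputTable.Rows.Add(tableRow)
                Next
            Next
        End If

        ' Add the table to the page
        PlaceHolderTable.Controls.Add(outputTable)
    End Sub
End Class
```

If the markup inside each source cell should be kept rather than flattened to text, assigning cell.InnerHtml to tableCell.InnerHtml (instead of InnerText) is one option, at the cost of rendering remote HTML on your own page.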