Preventing errors with HTMLAgilityPack in VB.Net

Posted on 2024-10-09 09:05:25

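I'm using HTMLAgilityPack to parse HTML pages. However, at some point I try to parse the wrong kind of data (in this specific case, an image), which of course fails for obvious reasons.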

Private Sub parseHtml(ByVal content As String, ByVal url As String)
    Try
        Dim contentHash As String = hashGenerator.ComputeHash(content, "SHA1")
        Dim doc As HtmlDocument = New HtmlDocument()

        doc.Load(New StringReader(content))

        Dim root As HtmlNode = doc.DocumentNode
        Dim anchorTags As New List(Of String)

        For Each link As HtmlNode In root.SelectNodes("//a")
            cururl = link.OuterHtml
            If link.Attributes("href") Is Nothing Then Continue For
            If Uri.IsWellFormedUriString(link.Attributes("href").Value, UriKind.Absolute) Then
                urlQueue.Enqueue(link.Attributes("href").Value)
            Else
                Dim myUri As New Uri(url)
                urlQueue.Enqueue(myUri.Scheme & "://" & myUri.Host & link.Attributes("href").Value)
            End If
        Next
    Catch ex As Exception
        MsgBox(ex.Message, MsgBoxStyle.Critical, "Error (parseHtml(" & url & "))")
    End Try
End Sub

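The error I get is:

A first chance exception of type 'System.NullReferenceException' occurred in Webcrawler.exe. Object reference not set to an instance of an object.

The content I try to parse looks like this: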

(raw binary image data, which renders as unreadable characters)

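How can I check whether the content is "parsable" before trying to parse it, so that the error is prevented?

At the moment it is an image that triggers the error, but I think it could be anything that isn't (X)HTML.

Thanks in advance, great community :)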
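
One possible way to approach this, as a minimal sketch rather than a definitive fix: the parser itself usually accepts arbitrary text, so the NullReferenceException most likely comes from `SelectNodes`, which returns Nothing instead of an empty collection when no node matches (as happens with image bytes). The sketch guards against that and adds a check on the response's media type before parsing. `LooksLikeHtml` and `ExtractLinks` are hypothetical helper names, and the `contentType` parameter is an assumption: it stands for the Content-Type header of the HTTP response, which the original `parseHtml` does not receive.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module ParseSketch
    ' Hypothetical helper: True only for media types worth handing to the HTML parser.
    ' contentType is assumed to be the Content-Type header value of the response.
    Function LooksLikeHtml(ByVal contentType As String) As Boolean
        If String.IsNullOrEmpty(contentType) Then Return False
        Return contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase) OrElse _
               contentType.StartsWith("application/xhtml+xml", StringComparison.OrdinalIgnoreCase)
    End Function

    ' Hypothetical helper: returns the absolute link targets found in the content.
    Function ExtractLinks(ByVal content As String, ByVal url As String) As List(Of String)
        Dim links As New List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(content)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches,
        ' so check before iterating.
        Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return links

        Dim baseUri As New Uri(url)
        For Each link As HtmlNode In anchors
            Dim href As String = link.GetAttributeValue("href", String.Empty)
            Dim absolute As Uri = Nothing
            ' Resolves relative hrefs against the page URL; absolute hrefs pass through.
            If Uri.TryCreate(baseUri, href, absolute) Then links.Add(absolute.AbsoluteUri)
        Next
        Return links
    End Function
End Module
```

A caller would skip the page when `LooksLikeHtml` returns False and otherwise enqueue whatever `ExtractLinks` returns. Resolving with `Uri.TryCreate` instead of concatenating scheme and host also handles hrefs that do not start with a slash.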

Original text

I'm using the HTMLAgilityPack to parse HTML pages. However, at some point I try to parse the wrong kind of data (in this specific case, an image), which of course fails for obvious reasons.

Private Sub parseHtml(ByVal content As String, ByVal url As String)
    Try
        Dim contentHash As String = hashGenerator.ComputeHash(content, "SHA1")
        Dim doc As HtmlDocument = New HtmlDocument()

        doc.Load(New StringReader(content))

        Dim root As HtmlNode = doc.DocumentNode
        Dim anchorTags As New List(Of String)

        For Each link As HtmlNode In root.SelectNodes("//a")
            cururl = link.OuterHtml
            If link.Attributes("href") Is Nothing Then Continue For
            If Uri.IsWellFormedUriString(link.Attributes("href").Value, UriKind.Absolute) Then
                urlQueue.Enqueue(link.Attributes("href").Value)
            Else
                Dim myUri As New Uri(url)
                urlQueue.Enqueue(myUri.Scheme & "://" & myUri.Host & link.Attributes("href").Value)
            End If
        Next
    Catch ex As Exception
        MsgBox(ex.Message, MsgBoxStyle.Critical, "Error (parseHtml(" & url & "))")
    End Try
End Sub

The error I get is: