Surfing with the same CookieContainer

Posted 2024-07-14 16:42:42

How can you surf on a website assigning the same CookieContainer to each web request?

弱骨蛰伏 2024-07-21 16:42:42

This is a class I wrote a few years back. It's not quite complete and was written before I fully understood how everything works (it doesn't properly encode complex POST data, for example), but it works pretty well despite its flaws, and it demonstrates how you can keep your CookieContainer across requests. It's in VB.Net, but you can build it into a separate assembly or run it through a translator if you need to:

Imports System.Net
Imports System.Collections.Generic

Public Class WebScraper
    Public Sub New()
        SetUserAgent(UserAgent.IE6SP1) 'default agent
    End Sub

#Region "Cookies"
    Private Cookies As New CookieContainer()

    Public Sub AddCookie(ByVal Name As String, ByVal data As String, Optional ByVal path As String = "", Optional ByVal domain As String = "")
        Dim ck As New Cookie(Name, data, path, domain)
        Cookies.Add(ck) ' store the new cookie in the shared container
    End Sub
    Public Sub AddCookie(ByRef cookie As Cookie)
        Cookies.Add(cookie)
    End Sub

    Public Sub ResetSession()
        Cookies = New CookieContainer()
        'TODO: Add other session reset code here
    End Sub

    Public Function GetCookies(ByVal uri As System.Uri) As System.Net.CookieCollection
        Return Cookies.GetCookies(uri)
    End Function
    Public Function GetCookies(ByVal url As String) As Net.CookieCollection
        Dim url2 As Uri = Nothing
        If Uri.TryCreate(url, UriKind.Absolute, url2) Then
            Return Cookies.GetCookies(url2)
        Else
            Return Nothing
        End If
    End Function
#End Region

    Public Property TimeOut() As UInteger
        Get
            Return _TimeOut
        End Get
        Set(ByVal value As UInteger)
            _TimeOut = value
        End Set
    End Property
    Private _TimeOut As UInteger = 100000 ' 100000 ms matches the default HttpWebRequest uses if none is specified

    Public Property PageEncoding() As System.Text.Encoding
        Get
            Return _PageEncoding
        End Get
        Set(ByVal value As System.Text.Encoding)
            _PageEncoding = value
        End Set
    End Property
    Private _PageEncoding As System.Text.Encoding = System.Text.Encoding.UTF8

#Region "UserAgents"
    ' TODO: Update this for FF3, add GoogleBot
    ' TODO: Move to separate class with distinct sub-types (eg: UserAgents.IE.6XP or UserAgents.FF.2XP, classes that overload .ToString())
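    Public Enum UserAgent
        ' Members taken from the Case labels in SetUserAgent below
        FF2_Linux
        FF2_Mac
        FF2_Vista
        FF2_XP
        IE6SP1
        IE7_Vista
        IE7_XP
        Safari
    End Enum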

    Public Sub SetUserAgent(ByVal UserAgent As UserAgent)
        Select Case UserAgent
            Case WebScraper.UserAgent.FF2_Linux
                Agent = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20070713 Firefox/"
            Case WebScraper.UserAgent.FF2_Mac
                Agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20070713 Firefox/"
            Case WebScraper.UserAgent.FF2_Vista
                Agent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: Gecko/20070713 Firefox/"
            Case WebScraper.UserAgent.FF2_XP
                Agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070713 Firefox/"
            Case WebScraper.UserAgent.IE6SP1
                Agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
            Case WebScraper.UserAgent.IE7_Vista
                Agent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
            Case WebScraper.UserAgent.IE7_XP
                Agent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
            Case WebScraper.UserAgent.Safari
                Agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.12.1 (KHTML, like Gecko) Safari/522.12.1"
        End Select
    End Sub

    Public Sub SetUserAgent(ByVal UserAgent As String)
        Agent = UserAgent
    End Sub

    'defaults to IE6 SP1
    ' TODO: Choose a better default
    Private Agent As String
#End Region

#Region "Get Page"
    Public Function GetPage(ByVal URL As Uri, Optional ByVal PostData As String = "") As String
        Dim reader As IO.StreamReader = Nothing
        Try
            reader = New System.IO.StreamReader(SendRequest(URL, PostData).GetResponseStream, PageEncoding)
            GetPage = reader.ReadToEnd()
        Catch
            GetPage = "" ' any failure yields an empty page
        Finally
            If reader IsNot Nothing Then reader.Close()
        End Try
    End Function
    Public Function GetPage(ByVal URL As String, Optional ByVal PostData As String = "") As String
        Dim URL2 As Uri = Nothing
        If Uri.TryCreate(URL, UriKind.Absolute, URL2) Then
            Return GetPage(URL2, PostData)
        Else
            Return ""
        End If
    End Function
    Public Function GetPage(ByVal URL As String, ByRef PostData As IEnumerable(Of KeyValuePair(Of String, String))) As String
        Return GetPage(URL, PrepPostData(PostData))
    End Function
    Public Function GetPage(ByVal URL As Uri, ByRef PostData As IEnumerable(Of KeyValuePair(Of String, String))) As String
        Return GetPage(URL, PrepPostData(PostData))
    End Function
#End Region

#Region "Get Response"
    Public Function GetResponse(ByVal URL As Uri, Optional ByVal Postdata As String = "") As Object
        Dim x As HttpWebResponse = SendRequest(URL, Postdata)
        If x.ContentType.Contains("text") Then
            Dim result As String
            Dim reader As IO.StreamReader = Nothing
            Try
                reader = New System.IO.StreamReader(x.GetResponseStream, System.Text.Encoding.UTF8) ' TODO: figure out how to detect actual encoding
                result = reader.ReadToEnd()
            Catch
                result = ""
            Finally
                If reader IsNot Nothing Then reader.Close()
            End Try
            Return result
        ElseIf x.ContentType.Contains("image") Then
            Dim result As Drawing.Image
            Try
                result = System.Drawing.Image.FromStream(x.GetResponseStream)
            Catch
                result = Nothing
            End Try
            Return result
        Else
            ' Anything else is handed back as the raw response stream
            Return x.GetResponseStream
        End If
    End Function
    Public Function GetResponse(ByVal URL As Uri, ByRef PostData As IEnumerable(Of KeyValuePair(Of String, String))) As Object
        Return GetResponse(URL, PrepPostData(PostData))
    End Function
    Public Function GetResponse(ByVal URL As String, ByRef PostData As IEnumerable(Of KeyValuePair(Of String, String))) As Object
        Return GetResponse(URL, PrepPostData(PostData))
    End Function
    Public Function GetResponse(ByVal URL As String, Optional ByVal PostData As String = "") As Object
        Dim URL2 As Uri = Nothing
        If Uri.TryCreate(URL, UriKind.Absolute, URL2) Then
            Return GetResponse(URL2, PostData)
        Else
            Return Nothing
        End If
    End Function
#End Region

#Region "SaveResponseToFile"
    Function SaveResponseToFile(ByVal FullFileName As String, ByVal URL As Uri, Optional ByVal PostData As String = "") As Boolean
        Try
            Dim x As New IO.BinaryReader(SendRequest(URL, PostData).GetResponseStream)
            Dim y As New IO.FileStream(FullFileName, IO.FileMode.Create)
            Dim z As New IO.BinaryWriter(y)

            Try ' TODO: I can do better here
                While True
                    z.Write(x.ReadByte()) ' ReadByte throws EndOfStreamException at the end of the response
                End While
            Catch ' continue
            End Try

            z.Close() ' also closes the underlying FileStream
            x.Close()
        Catch
            Return False
        End Try
        Return True
    End Function
    Function SaveResponseToFile(ByVal FullFileName As String, ByVal URL As String, Optional ByVal PostData As String = "") As Boolean
        Dim URL2 As Uri = Nothing
        If Uri.TryCreate(URL, UriKind.Absolute, URL2) Then
            Return SaveResponseToFile(FullFileName, URL2, PostData)
        Else : Return False
        End If
    End Function
    Function SaveResponseToFile(ByVal FullFileName As String, ByVal URL As String, ByRef PostData As IEnumerable(Of KeyValuePair(Of String, String))) As Boolean
        Return SaveResponseToFile(FullFileName, URL, PrepPostData(PostData))
    End Function
    Function SaveResponseToFile(ByVal FullFileName As String, ByVal URL As Uri, ByRef PostData As IEnumerable(Of KeyValuePair(Of String, String))) As Boolean
        Return SaveResponseToFile(FullFileName, URL, PrepPostData(PostData))
    End Function
#End Region

#Region "Get Image"
    Public Function GetImage(ByVal URL As String) As System.Drawing.Image
        Try
            GetImage = System.Drawing.Image.FromStream(SendRequest(URL).GetResponseStream)
        Catch
            GetImage = Nothing
        End Try
    End Function
    Public Function GetImage(ByVal URL As Uri) As System.Drawing.Image
        Try
            GetImage = System.Drawing.Image.FromStream(SendRequest(URL).GetResponseStream)
        Catch
            GetImage = Nothing
        End Try
    End Function
#End Region

#Region "PostToURL"
    Public Sub PostToURL(ByVal URL As String, Optional ByVal PostData As String = "")
        SendRequest(URL, PostData)
    End Sub
    Public Sub PostToURL(ByVal URL As Uri, Optional ByVal PostData As String = "")
        SendRequest(URL, PostData)
    End Sub
    Public Sub PostToURL(ByVal URL As String, ByRef PostData As Dictionary(Of String, String))
        PostToURL(URL, PrepPostData(PostData))
    End Sub
    Public Sub PostToURL(ByVal URL As Uri, ByRef PostData As Dictionary(Of String, String))
        PostToURL(URL, PrepPostData(PostData))
    End Sub
#End Region

#Region "Private Methods"
    Private Function PrepPostData(ByRef PostData As IEnumerable(Of KeyValuePair(Of String, String))) As String
        PrepPostData = ""  ' TODO: properly encode post data
        For Each pair As KeyValuePair(Of String, String) In PostData
            PrepPostData += pair.Key & "=" & pair.Value & "&"
        Next pair
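        ' Drop the trailing "&"; the length check avoids an exception when PostData is empty
        If PrepPostData.Length > 0 Then PrepPostData = PrepPostData.Remove(PrepPostData.Length - 1)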
    End Function

    Private Function SendRequest(ByVal URL As String, Optional ByVal PostData As String = "") As HttpWebResponse
        Dim URL2 As Uri = Nothing
        If Uri.TryCreate(URL, UriKind.Absolute, URL2) Then
            Return SendRequest(URL2, PostData)
        Else
            Return Nothing
        End If
    End Function
    Private Function SendRequest(ByVal URL As Uri, Optional ByVal PostData As String = "") As HttpWebResponse
        Dim Request As HttpWebRequest = HttpWebRequest.Create(URL)

        Request.CookieContainer = Cookies
        Request.Timeout = TimeOut
        Request.UserAgent = Agent

        If PostData.Length > 0 Then
            Request.Method = "POST" ' TODO: allow explicitly setting METHOD and Content-type for request via properties
            Request.ContentType = "application/x-www-form-urlencoded"
            Dim sw As New IO.StreamWriter(Request.GetRequestStream())
            sw.Write(PostData) ' send the form data as the request body
            sw.Close()
        End If

        Return Request.GetResponse()
    End Function
#End Region
End Class

A lightly-updated C# version is now on GitHub, including a more-recent User Agent. It's also less likely to just swallow exceptions.
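
Stripped of everything else, the part of the class that keeps the session alive is the single line that assigns the same CookieContainer to every request. The following is only a minimal sketch of that idea, not part of the class above: the module name, the CreateRequest helper, and the example.com URLs are made up for illustration, and no request is actually sent.

```vbnet
Imports System
Imports System.Net

Module SharedCookieDemo
    ' One container for the whole browsing session (hypothetical name).
    Private ReadOnly Session As New CookieContainer()

    ' Hypothetical helper: builds a request that reads cookies from,
    ' and stores response cookies into, the shared container.
    Function CreateRequest(ByVal url As String) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.CookieContainer = Session
        Return request
    End Function

    Sub Main()
        ' Placeholder URLs; nothing is sent over the network here.
        Dim login As HttpWebRequest = CreateRequest("http://example.com/login")
        Dim profile As HttpWebRequest = CreateRequest("http://example.com/profile")

        ' Both requests point at the very same container object.
        Console.WriteLine(Object.ReferenceEquals(login.CookieContainer, profile.CookieContainer))
    End Sub
End Module
```

Any cookie a server sets in response to the first request is stored in the container and sent along with later requests to a matching domain and path.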

美羊羊 2024-07-21 16:42:42

CookieContainer is designed like a browser's cookie store. It can hold cookies for any number of sites, because it tracks each cookie's domain and path as well as its expiry. Since a single container can hold cookies for any domain, every web request can share the same container.

Be aware that CookieContainer has a bug in its .Add(Cookie) and .GetCookies(uri) methods.

See the details and fix here:

http://dot-net-expertise.blogspot.com/2009/10/cookiecontainer-domain-handling-bug-fix.html

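To illustrate the point about one container serving several sites, here is a small sketch that needs no network access. The module name, the BuildContainer helper, the cookie names, and the example.com / example.org hosts are all assumptions made for the demonstration. It uses the Add(Uri, Cookie) overload so that each cookie's domain is taken from the URI rather than set by hand.

```vbnet
Imports System
Imports System.Net

Module MultiDomainCookies
    ' Hypothetical helper: fills one container with cookies for two different hosts.
    Function BuildContainer() As CookieContainer
        Dim jar As New CookieContainer()
        jar.Add(New Uri("http://example.com/"), New Cookie("session", "abc"))
        jar.Add(New Uri("http://example.org/"), New Cookie("theme", "dark"))
        Return jar
    End Function

    Sub Main()
        Dim jar As CookieContainer = BuildContainer()

        ' Total number of cookies held, across all domains.
        Console.WriteLine(jar.Count)

        ' Only the cookies that match this URI are returned,
        ' which is what a request to that URI would send.
        For Each c As Cookie In jar.GetCookies(New Uri("http://example.com/"))
            Console.WriteLine(c.Name & "=" & c.Value)
        Next
    End Sub
End Module
```

Because lookups are filtered by URI, sharing the container between requests to different sites does not leak one site's cookies to another.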