Downloading a file over HTTPS using .NET (Part 2)

Posted on 2024-08-20 13:53:06

On a regular basis I have to do the following manually in a web browser:

  1. Go to an https website.
  2. Log in through a web form.
  3. Click a link to download a large file (135MB).

I would like to automate this process using .NET.

Some days ago I posted this question here. Thanks to a piece of code by Rubens Farias I am now able to perform the above steps 1 and 2. After step 2 I am able to read the HTML of the page that contains the URL to the file to be downloaded (using afterLoginPage = reader.ReadToEnd()). This page only shows up if the login is granted, so step 2 is verified to be successful.

My question now is how to perform step 3. I have tried a few things, but to no avail: access to the file was denied despite the successful login.

To clarify, I am posting the code below, without the actual login information and website addresses. At the end, the variable afterLoginPage contains the HTML of the post-login page, which includes the link to the file I'd like to download. This link also starts with https.

Dim httpsSite As String = "https://www.test.test/user/login"
' enter correct address
Dim formPage As String = ""
Dim afterLoginPage As String = ""

' Get postback data and cookies
Dim cookies As New CookieContainer()
Dim getRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
getRequest.CookieContainer = cookies
getRequest.Method = "GET"

Dim wp As WebProxy = New WebProxy("[our proxies IP address]", [our proxies port number])
wp.Credentials = CredentialCache.DefaultCredentials
getRequest.Proxy = wp

Dim form As HttpWebResponse = DirectCast(getRequest.GetResponse(), HttpWebResponse)
Using response As New StreamReader(form.GetResponseStream(), Encoding.UTF8)
    formPage = response.ReadToEnd()
End Using

Dim inputs As New Dictionary(Of String, String)()
inputs.Add("form_build_id", "[some code I'd like to keep secret]")
inputs.Add("form_id", "user_login")
For Each input As Match In Regex.Matches(formPage, "<input.*?name=""(?<name>.*?)"".*?(?:value=""(?<value>.*?)"".*?)? />", RegexOptions.IgnoreCase Or RegexOptions.ECMAScript)
    If input.Groups("name").Value <> "form_build_id" And _
       input.Groups("name").Value <> "form_id" Then
        inputs.Add(input.Groups("name").Value, input.Groups("value").Value)
    End If
Next

inputs("name") = "[our login name]"
inputs("pass") = "[our login password]"

Dim buffer As Byte() = Encoding.UTF8.GetBytes( _
    [String].Join("&", _
        Array.ConvertAll(Of KeyValuePair(Of String, String), String)(inputs.ToArray(), _
            Function(item As KeyValuePair(Of String, String)) (item.Key & "=") + System.Web.HttpUtility.UrlEncode(item.Value))))

Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
postRequest.CookieContainer = cookies
postRequest.Method = "POST"
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.Proxy = wp

' send username/password
Using stream As Stream = postRequest.GetRequestStream()
    stream.Write(buffer, 0, buffer.Length)
End Using

' get response from login page
Using reader As New StreamReader(postRequest.GetResponse().GetResponseStream(), Encoding.UTF8)
    afterLoginPage = reader.ReadToEnd()
End Using

Comments (4)

夏了南城 2024-08-27 13:53:06

As I said in the comments on that question, you just need to use the DownloadFile method:

using(WebClient client = new WebClient())
    client.DownloadFile(
        "http://www.google.com/", "google_homepage.html");

Just replace "http://www.google.com/" with your file address.

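Sorry, that is not enough here: a plain WebClient does not send the cookies from your login, so you need to go with HttpWebRequest and reuse the same CookieContainer: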

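// Replace with the address of the file to download.
string fileAddress = "http://www.google.com/";
HttpWebRequest client = (HttpWebRequest)WebRequest.Create(fileAddress);
// Reuse the CookieContainer from the login requests so the session cookie is sent.
client.CookieContainer = cookies;
int read = 0;
byte[] buffer = new byte[1024];
using (WebResponse response = client.GetResponse())
using (Stream stream = response.GetResponseStream())
using (FileStream download =
  new FileStream("google_homepage.html", FileMode.Create))
{
    // Copy the response to disk in chunks rather than reading it all into memory.
    while ((read = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        download.Write(buffer, 0, read);
    }
}
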
回眸一笑 2024-08-27 13:53:06

Are you passing the cookies along when downloading the file?
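
One way to check this is to ask the CookieContainer which cookies it would attach to a request for the file's URL. The following is only a minimal sketch: the helper name CookiesFor, the cookie SESS=abc, and the file URL are made up for illustration, and the container here stands in for the one filled by the login requests in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class CookieCheck
{
    // Hypothetical helper: lists the cookies the container would send
    // with a request to the given URL.
    public static List<string> CookiesFor(CookieContainer container, string url)
    {
        var result = new List<string>();
        foreach (Cookie cookie in container.GetCookies(new Uri(url)))
        {
            result.Add(cookie.Name + "=" + cookie.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Stand-in for the container used by the GET and POST login requests.
        var cookies = new CookieContainer();
        cookies.Add(new Uri("https://www.test.test/"), new Cookie("SESS", "abc"));

        // Assumed file URL; in practice it would be taken from afterLoginPage.
        foreach (string line in CookiesFor(cookies, "https://www.test.test/files/big.zip"))
        {
            Console.WriteLine(line);
        }
    }
}
```

If the list comes back empty for the real file URL, the download request carries no session cookie, which would be consistent with the access-denied response described in the question.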

暖树树初阳… 2024-08-27 13:53:06

You need to retain the session/authentication cookie that the login form sends back to you. Basically, take the cookies from the response to the authentication form and send them back when you perform step 3.

This is an easy way to extend the Web Client, which should give you much simpler code than the one above:

http://couldbedone.blogspot.com/2007/08/webclient-handling-cookies.html

Just:

  1. Create an instance of this CookieAwareWebClient
  2. Post to login form
  3. Download the file
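
The three steps above could look roughly like the sketch below. It is not the code from the linked post, only a common way to write such a class: CookieAwareWebClient overrides WebClient.GetWebRequest to attach one shared CookieContainer to every request. The helper LoginAndDownload and its parameters are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Sketch of a cookie-aware WebClient (not a framework class): every request
// it creates shares the same CookieContainer.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            // Cookies set by the login response are stored here and
            // sent again with the later download request.
            http.CookieContainer = cookies;
        }
        return request;
    }
}

public static class DownloadSketch
{
    // Hypothetical helper: posts the login form, then downloads the file
    // with the same client so the session cookie is reused.
    public static void LoginAndDownload(
        string loginUrl, NameValueCollection form, string fileUrl, string targetPath)
    {
        using (var client = new CookieAwareWebClient())
        {
            client.UploadValues(loginUrl, "POST", form);
            client.DownloadFile(fileUrl, targetPath);
        }
    }
}
```

The form collection would hold the same fields as the inputs dictionary in the question (name, pass, form_id and so on). If a proxy is required, as in the question, it can be assigned through the client's Proxy property before the first request.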
零度℉ 2024-08-27 13:53:06

Alternatively, you could automate Internet Explorer instead of trying to send web requests over HTTPS yourself.
The article "Web automation with Powershell" explains this using PowerShell, but you could also do it in C# by accessing Internet Explorer as a COM object.
This method works fairly well if you need just one file and are not worried about memory leaks.
