Screen-scraping secure pages of any site on https:// using ASP.NET in C#

Posted 2024-08-25 08:33:05 · 1211 characters · 7 views · 0 comments


I've scraped secure pages of sites served over http with the code below:

    using System.IO;
    using System.Net;

    string cookiedata = "fsfsfsdfsfsfsfsfsdf";
    NetworkCredential credential = new NetworkCredential("xxx", "xxx");

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://ysats.com");

    //set the user agent so it looks like IE to not raise suspicion
    request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
    request.Method = "POST";
    //set the cookie in the request header
    request.Headers.Add("Cookie", cookiedata);
    request.Credentials = credential;

    //get the response from the server; the using blocks close the
    //response, stream and reader even if an exception is thrown
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        string pagedata = reader.ReadToEnd();
        //now we can scrape the contents of the secure page as needed,
        //since the page contents are now stored in our pagedata string
        Response.Write(pagedata);
    }

But when I use this code to scrape any site on https://, I always get the login page instead of the secure page I actually need.

Please advise: what should I do to scrape a secure page of a site served over https?
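One common reason for landing on the login page is that the site authenticates through an HTML login form and a session cookie rather than HTTP authentication. `request.Credentials` is only used to answer HTTP authentication challenges (Basic, Digest, NTLM), not to fill in a login form, and a hard-coded `Cookie` header goes stale once the session it came from expires. The following is a minimal sketch under that assumption, not a tested fix for any particular site: it POSTs the login form, then fetches the protected page with the same `CookieContainer`, so the session cookie issued at login is sent back. The class `LoginScraper`, its methods, and any URLs or field names you pass in are hypothetical; the real login URL, field names, and any hidden fields (for example ASP.NET `__VIEWSTATE` or anti-forgery tokens, which may have to be read from the login page first) must come from the site's actual login form.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper (not from the original post): log in through an HTML form,
// then fetch a protected page while reusing the cookies the login response set.
public static class LoginScraper
{
    // Encodes form fields as application/x-www-form-urlencoded, e.g. "user=alice&pass=a+b".
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }

    // POSTs the login form. Because request.CookieContainer is set, any
    // Set-Cookie headers in the response are stored in `cookies`.
    public static void LogIn(string loginUrl, string formBody, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;

        byte[] bytes = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = bytes.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(bytes, 0, bytes.Length);
        }

        // The body of the login response is not needed, only its cookies.
        request.GetResponse().Close();
    }

    // GETs a page, sending whatever cookies the container holds for that URL.
    public static string Fetch(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage, again with hypothetical URLs and field names: create one `CookieContainer`, call `LoginScraper.LogIn("https://example.com/login", LoginScraper.BuildFormBody(fields), cookies)`, then `LoginScraper.Fetch("https://example.com/secure/page", cookies)` and write the returned HTML with `Response.Write` as in the original code. The key design choice is sharing a single `CookieContainer` across both requests instead of copying a cookie string by hand.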
