Screen scraping secure pages of any site on https:// with ASP.NET in C#

Posted on 2024-08-25 08:33:05

I've been able to scrape secure pages of sites served over http with the code below:

    string cookiedata = "fsfsfsdfsfsfsfsfsdf";
    NetworkCredential credential = new NetworkCredential("xxx", "xxx");

    HttpWebRequest request = HttpWebRequest.Create("https://ysats.com") as HttpWebRequest;

    //set the user agent so it looks like IE to not raise suspicion 
    request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
    request.Method = "POST";
    //set the cookie in the request header
    request.Headers.Add("Cookie", cookiedata);
    request.Credentials = credential;

    //get the response from the server; the using blocks dispose the
    //response, stream and reader even if reading throws
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        string pagedata = reader.ReadToEnd();
        //now we can scrape the contents of the secure page as needed
        //since the page contents are now stored in our pagedata string
        Response.Write(pagedata);
    }

But when I try to scrape a site on https:// with this code, I always get the login page back instead of the secure page I actually need.

Please advise what I should do to scrape a secure page of a site on https.

Comments (1)

风苍溪 2024-09-01 08:33:05

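You need to send a POST request with the login details for the website, then scrape the page you want after the login succeeds. You'd also have to make sure your WebClient (or HttpWebRequest) keeps the cookies from the login response, so that later requests carry the session.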
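As a rough illustration of the login-then-scrape flow, here is a minimal sketch that sticks with `HttpWebRequest` from the question and shares one `CookieContainer` between the login POST and the follow-up GET. The class and method names (`LoginScraper`, `BuildFormBody`, `Fetch`, `ScrapeAfterLogin`) are made up for this example, and the URLs and form field names you pass in are assumptions you would have to read off the real login form. It is not a drop-in solution for any particular site.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper, for illustration only.
public static class LoginScraper
{
    // Encodes form fields as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Sends one request using the shared cookie container and returns the body.
    // A null formBody means GET; otherwise the body is sent as a POST.
    public static string Fetch(string url, CookieContainer cookies, string formBody = null)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies; // same container on every request
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";

        if (formBody != null)
        {
            byte[] payload = Encoding.UTF8.GetBytes(formBody);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = payload.Length;
            using (Stream body = request.GetRequestStream())
            {
                body.Write(payload, 0, payload.Length);
            }
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Logs in first, then requests the protected page with the same cookies.
    public static string ScrapeAfterLogin(
        string loginUrl,
        string secureUrl,
        IEnumerable<KeyValuePair<string, string>> loginFields)
    {
        var cookies = new CookieContainer();
        Fetch(loginUrl, cookies, BuildFormBody(loginFields)); // session cookie lands in the container
        return Fetch(secureUrl, cookies);
    }
}
```

The important part is that both requests use the same `CookieContainer` instance; a hard-coded `Cookie` header like the one in the question will not match the session the server issues at login. Many login forms also expect hidden fields (anti-forgery tokens, for example), which would mean fetching the login page first and copying those values into `loginFields`.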

This will inevitably vary from site to site (what the fields are called, what information is required etc.) so you won't be able to develop a blanket solution, and you'd have to check if the login failed or you'd end up scraping the login page again.
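One way to check whether the login failed is to test if what came back still looks like the login page. The sketch below uses two heuristics that are my own assumptions rather than anything from the site in question: the final URL (available as `HttpWebResponse.ResponseUri` after redirects) has the same path as the login URL, or the HTML contains a password input. `LoginCheck` and `LooksLikeLoginPage` are hypothetical names, and a real site may need a different signal, such as a known error message.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LoginCheck
{
    // Returns true when the response appears to be the login page again.
    public static bool LooksLikeLoginPage(Uri finalUri, string html, Uri loginUri)
    {
        // Heuristic 1: we ended up on the login path after any redirects.
        bool redirectedToLogin = string.Equals(
            finalUri.AbsolutePath, loginUri.AbsolutePath,
            StringComparison.OrdinalIgnoreCase);

        // Heuristic 2: the page still contains a password input field.
        bool hasPasswordField = Regex.IsMatch(
            html ?? string.Empty,
            @"<input[^>]*type\s*=\s*[""']?password",
            RegexOptions.IgnoreCase);

        return redirectedToLogin || hasPasswordField;
    }
}
```

If this returns true for the page you scraped, treat the login as failed and stop rather than parsing the HTML as if it were the secure page.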

See also this duplicate question.
